llama.cpp: Inference of LLaMA model in pure C/C++, Georgi Gerganov and the llama.cpp community, 2023 - The open-source project that introduced the GGUF format and provides tools for model conversion and inference.
Hugging Face Transformers Documentation, Hugging Face team, 2023 - Official documentation for the transformers library, the de facto standard for loading and running models, including quantized ones.
TimDettmers/bitsandbytes: 8-bit and 4-bit quantization for PyTorch, Tim Dettmers and bitsandbytes contributors, 2023 - The GitHub repository for bitsandbytes, a library that enables efficient 8-bit and 4-bit quantization of PyTorch models and integrates with transformers.