QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, Advances in Neural Information Processing Systems (NeurIPS), 2023. DOI: 10.48550/arXiv.2305.14314 - Describes a method that uses 4-bit quantization for efficient finetuning of large language models, showcasing the practical utility of quantization.
bitsandbytes Library Documentation, Tim Dettmers and contributors, 2024 (bitsandbytes-foundation) - Official documentation for the Python library providing 8-bit and 4-bit quantization and optimizers for PyTorch, widely used for memory-efficient LLM deployment.
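To make the 4-bit quantization idea behind these references concrete, the following is a minimal sketch of blockwise absmax quantization in plain Python. This is a toy linear scheme for illustration only; the actual QLoRA method and the bitsandbytes library use the NF4 data type, double quantization, and fused CUDA kernels, none of which are reproduced here. All function names are hypothetical.

```python
def quantize_4bit_absmax(block):
    """Toy blockwise absmax quantization to signed 4-bit integers in [-7, 7].

    Illustrative only: real 4-bit schemes (e.g. NF4 in QLoRA/bitsandbytes)
    use a non-linear code optimized for normally distributed weights.
    """
    # Scale so the largest-magnitude value maps to +/-7.
    absmax = max(abs(x) for x in block) or 1.0
    scale = absmax / 7.0
    # Round each value to the nearest representable 4-bit level.
    q = [max(-7, min(7, round(x / scale))) for x in block]
    return q, scale


def dequantize_4bit(q, scale):
    """Recover approximate floats from 4-bit codes and the block scale."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.9]
codes, scale = quantize_4bit_absmax(weights)
approx = dequantize_4bit(codes, scale)
```

Storing one float scale per block plus 4 bits per weight is what yields the roughly 4x memory reduction over 16-bit weights that makes finetuning large models feasible on consumer GPUs.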