QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. arXiv preprint arXiv:2305.14314. DOI: 10.48550/arXiv.2305.14314 - The foundational paper introducing QLoRA, detailing 4-bit NormalFloat (NF4) quantization, Double Quantization, and Paged Optimizers for efficient LLM finetuning.
LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - Introduces the Low-Rank Adaptation (LoRA) method, which is the basis for QLoRA, demonstrating parameter-efficient adaptation of large language models.
bitsandbytes GitHub Repository, bitsandbytes-foundation, 2024 - The official software repository for bitsandbytes, a library providing optimized CUDA functions for 8-bit and 4-bit quantization, including NF4, Double Quantization, and Paged Optimizers used in QLoRA.
Parameter-Efficient Fine-tuning (PEFT) library documentation, Hugging Face, 2024 - Hugging Face's official documentation for the PEFT library, offering guides and API references for implementing parameter-efficient fine-tuning methods like QLoRA.
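Taken together, these references map onto a short configuration in practice. The sketch below is a minimal, illustrative example and is not taken from any of the cited sources: it assumes the transformers, peft, and bitsandbytes packages are installed, and the model id "meta-llama/Llama-2-7b-hf" is only a placeholder. It shows NF4 quantization with Double Quantization via bitsandbytes' BitsAndBytesConfig, and LoRA adapters attached through the PEFT library.

```python
# Minimal QLoRA-style setup sketch (assumes transformers, peft, and bitsandbytes are installed).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NormalFloat (NF4) quantization with Double Quantization, as described in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 data type for the frozen base weights
    bnb_4bit_use_double_quant=True,        # Double Quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # compute in bf16 while weights stay in 4-bit
)

# Placeholder model id; any causal LM on the Hugging Face Hub can be substituted.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # prepares the quantized model for training

# LoRA adapters: small trainable low-rank matrices on top of the frozen 4-bit base model.
lora_config = LoraConfig(
    r=16,            # low-rank dimension (illustrative value)
    lora_alpha=32,   # scaling factor (illustrative value)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Paged Optimizers are configured separately, e.g. by passing optim="paged_adamw_32bit" to transformers' TrainingArguments; the rank, alpha, and target modules above are illustrative defaults rather than values prescribed by the papers.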