QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. arXiv preprint arXiv:2305.14314. DOI: 10.48550/arXiv.2305.14314 - This foundational paper introduces QLoRA, detailing 4-bit NormalFloat (NF4) quantization, Double Quantization (DQ), and Paged Optimizers for memory-efficient fine-tuning of large language models.
LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - Presents the original Low-Rank Adaptation (LoRA) method, a parameter-efficient fine-tuning technique that serves as the basis for QLoRA's adapter-based training.
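To illustrate how the two references fit together, here is a minimal sketch of a QLoRA-style setup, assuming the Hugging Face transformers, peft, and bitsandbytes libraries: the base model is loaded with 4-bit NF4 quantization and Double Quantization, and trainable LoRA adapters are attached on top. The model id, rank, and other hyperparameters are placeholders for illustration, not values prescribed by the cited papers.

```python
# Sketch only: frozen 4-bit NF4 base model (QLoRA) + trainable low-rank adapters (LoRA).
# Model id and hyperparameters below are placeholders, not values from the papers.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat (NF4) quantization
    bnb_4bit_use_double_quant=True,        # Double Quantization (DQ) of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # compute dtype for the dequantized matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections, as in the LoRA paper
    task_type="CAUSAL_LM",
)

# Base weights stay frozen in 4 bits; only the low-rank adapter matrices are trained.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Paged Optimizers, the third QLoRA component, are exposed in this stack as an optimizer choice during training (e.g. a paged AdamW variant) rather than in the model-loading code above.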