LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - The original paper introducing Low-Rank Adaptation (LoRA), detailing its theoretical background and its mechanism for parameter-efficient fine-tuning.
Parameter-Efficient Fine-Tuning (PEFT) Library Documentation, Hugging Face, 2024 - The official documentation for the Hugging Face PEFT library, providing comprehensive guides and API references for implementing LoRA and other PEFT methods.
QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. arXiv preprint arXiv:2305.14314. DOI: 10.48550/arXiv.2305.14314 - Introduces QLoRA, an efficient fine-tuning method that combines 4-bit quantization with LoRA, relevant for understanding the memory-saving techniques discussed in the section.
Hugging Face Transformers Library Documentation, Hugging Face, 2024 - The official documentation for the Hugging Face Transformers library, offering essential information on loading and using pre-trained models and tokenizers, which are prerequisites for applying PEFT.