LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - The original research paper introducing Low-Rank Adaptation (LoRA), detailing its mathematical formulation and core mechanism for adapting large language models.
PeftModel.merge_and_unload, Hugging Face PEFT documentation, 2024 - Official documentation for the Hugging Face PEFT library, covering practical usage of the merge_and_unload method for folding LoRA weights into a base model.
QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer, 2023. arXiv preprint arXiv:2305.14314. DOI: 10.48550/arXiv.2305.14314 - Introduces QLoRA, a method for fine-tuning quantized models, which highlights the precision considerations relevant to merging LoRA weights with quantized base models.
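As a companion to the PEFT documentation entry above, here is a minimal sketch of how merge_and_unload is typically invoked; the model name and adapter path are placeholders, not values from the references.

```python
# Minimal sketch: merge a LoRA adapter into its base model with Hugging Face PEFT.
# "base-model-name" and "path/to/lora-adapter" are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-name")
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Folds the low-rank update (scaled B @ A) into the frozen base weights and
# returns a plain transformers model with the adapter layers removed.
merged = peft_model.merge_and_unload()
merged.save_pretrained("merged-model")
```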