LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - Explains the LoRA method, its architecture, and hyperparameter choices, important for diagnosing configuration and performance issues.
QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. arXiv preprint arXiv:2305.14314. DOI: 10.48550/arXiv.2305.14314 - Presents QLoRA, describing its 4-bit quantization, NF4 data type, and paged optimizers, all relevant for debugging QLoRA-specific memory and performance issues.
Parameter-Efficient Fine-tuning (PEFT) library, Hugging Face, 2024 - Provides official guides on PeftConfig, target_modules, LoRA/QLoRA integration, and best practices relevant for debugging configuration, integration, and memory issues.
Parameter-Efficient Transfer Learning for NLP, Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly, 2019. arXiv preprint arXiv:1902.00751. DOI: 10.48550/arXiv.1902.00751 - The original paper on adapter tuning, discussing adapter architecture and placement within models, which helps in understanding configuration issues for adapter implementations.