LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021, arXiv preprint arXiv:2106.09685, DOI: 10.48550/arXiv.2106.09685 - Introduces the LoRA method, explaining how it reduces trainable parameters and associated resource needs for fine-tuning.
QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023, arXiv preprint arXiv:2305.14314, DOI: 10.48550/arXiv.2305.14314 - Details 4-bit NormalFloat (NF4) quantization, double quantization, and paged optimizers, which reduce VRAM requirements for fine-tuning.
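A minimal sketch of the low-rank update described in the paper (h = W0·x + (α/r)·B·A·x, with the pretrained weight frozen, A Gaussian-initialized, and B zero-initialized). The class name and hyperparameter values here are illustrative, not from the paper's released code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: h = W0 x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                               # freeze pretrained W0
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # Gaussian init
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))        # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Only lora_A and lora_B are trained, so the number of trainable parameters
# drops from in_features * out_features to r * (in_features + out_features).
layer = LoRALinear(nn.Linear(768, 768), r=8)
```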
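A sketch of how these pieces are typically wired up with the transformers/bitsandbytes integration: NF4 quantization and double quantization via BitsAndBytesConfig, with paged optimizers selectable through the training arguments. The model id is a placeholder, and running this requires a CUDA GPU with bitsandbytes installed.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16  # compute dtype for dequantized matmuls
)

# Placeholder model id; substitute the checkpoint you actually fine-tune.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Paged optimizers are exposed through the Trainer, e.g.
# TrainingArguments(..., optim="paged_adamw_8bit").
```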
Parameter-Efficient Fine-Tuning (PEFT) library, Hugging Face, 2024 - Official guide for the peft library, offering practical information on implementation, API usage, and compatibility with Hugging Face models.
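A small usage example of the peft API for wrapping a Hugging Face model with LoRA adapters. The base checkpoint and target_modules are assumptions; target module names depend on the architecture (here "c_attn" matches GPT-2).

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                           # rank of the low-rank update matrices
    lora_alpha=16,                 # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],     # attention projection names vary by architecture
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts
```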
Accelerate training with 🤗 Accelerate, Hugging Face, 2024 - Official documentation for Hugging Face Accelerate, a library that assists with device placement, mixed precision, and distributed training for PyTorch.
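A minimal sketch of the Accelerate training pattern: prepare() handles device placement (and distributed wrapping when launched with multiple processes), and accelerator.backward() replaces loss.backward(). The toy model and data are stand-ins for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # e.g. Accelerator(mixed_precision="bf16") to enable mixed precision

# Tiny stand-in model and data, just to show the prepare()/backward() pattern.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    logits = model(inputs)  # tensors are already on the right device after prepare()
    loss = torch.nn.functional.cross_entropy(logits, labels)
    accelerator.backward(loss)  # replaces loss.backward(); handles gradient scaling when needed
    optimizer.step()
    optimizer.zero_grad()
```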