Full fine-tuning updates every parameter in a large language model (LLM), demanding substantial computational resources and memory. This often makes adapting the largest models impractical without specialized hardware. This chapter presents Parameter-Efficient Fine-tuning (PEFT) methods as a resource-conscious alternative.
PEFT techniques modify only a small fraction of the model's parameters, drastically reducing computational overhead and storage requirements while often achieving performance comparable to full fine-tuning on specific tasks.
You will study the principles and implementations of several prominent PEFT approaches, including Low-Rank Adaptation (LoRA), its quantized variant QLoRA, adapter modules, prompt tuning, and prefix tuning.
We will analyze the mechanics behind each technique, compare their respective advantages and trade-offs, and walk through practical implementations using common libraries. By the end of this chapter, you will understand how to select and apply appropriate PEFT methods to adapt LLMs efficiently.
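To preview the kind of savings involved, consider LoRA, covered in Section 4.2. Instead of updating a full weight matrix, LoRA learns a low-rank update expressed as the product of two small matrices. A minimal back-of-the-envelope sketch (the layer dimensions and rank below are illustrative choices, not tied to any particular model):

```python
# Illustrative parameter count for a LoRA update on one linear layer.
# LoRA replaces the full weight update dW (d_out x d_in) with the low-rank
# product B @ A, where B is (d_out x r) and A is (r x d_in), with r << d_out, d_in.
d_out, d_in, r = 4096, 4096, 8  # hypothetical layer size and adapter rank

full_params = d_out * d_in        # parameters updated by full fine-tuning
lora_params = r * (d_out + d_in)  # parameters updated by the LoRA adapter

print(f"full fine-tuning: {full_params:,} params")
print(f"LoRA (r={r}):     {lora_params:,} params "
      f"({lora_params / full_params:.2%} of full)")
```

For this single 4096x4096 layer, the adapter trains roughly 0.4% of the parameters that full fine-tuning would, which is the source of PEFT's memory and storage advantage.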
4.1 Rationale for Parameter Efficiency
4.2 Low-Rank Adaptation (LoRA)
4.3 Quantized Low-Rank Adaptation (QLoRA)
4.4 Adapter Modules
4.5 Prompt Tuning
4.6 Prefix Tuning
4.7 Comparison of PEFT Techniques
4.8 Implementation with Hugging Face PEFT Library
4.9 Hands-on Practical: Fine-tuning with LoRA
4.10 Hands-on Practical: Fine-tuning with QLoRA
© 2025 ApX Machine Learning