The previous chapter covered full parameter fine-tuning, a method that updates every weight in the model. While effective, this approach becomes computationally prohibitive as model sizes grow into the billions of parameters. The high demand for GPU memory and processing power makes full fine-tuning inaccessible for many practical applications.
This chapter introduces Parameter-Efficient Fine-Tuning (PEFT), a collection of techniques designed to adapt large models with significantly fewer computational resources. We will concentrate on Low-Rank Adaptation (LoRA), which freezes the original model weights and injects smaller, trainable matrices. Instead of updating the massive original weight matrix W, LoRA learns a low-rank update ΔW = BA, where B has shape d×r and A has shape r×k; for a small rank r, B and A together contain far fewer parameters than the d×k matrix W. You will learn to implement this using the Hugging Face PEFT library.
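As a preview of the workflow covered in Section 4.3, here is a minimal sketch of attaching LoRA adapters with the PEFT library. The model name, target module, and hyperparameter values are illustrative assumptions, not prescriptions from this chapter.

```python
# Minimal sketch: wrap a pretrained model with LoRA adapters via the
# Hugging Face PEFT library. Hyperparameters here are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM works

config = LoraConfig(
    r=8,                        # rank r of the update matrices B and A
    lora_alpha=16,              # scaling factor applied to the update BA
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# The base weights are frozen; only the small B and A matrices train.
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # reports the tiny trainable fraction
```

Printing the trainable parameters makes the savings concrete: typically well under 1% of the model's weights receive gradient updates.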
We will also examine how quantization further reduces the memory footprint, leading to methods like QLoRA. The chapter concludes with a comparative analysis of PEFT and full fine-tuning, clarifying the trade-offs in performance and resource usage. By completing this chapter, you will have the practical skills to apply modern, efficient tuning methods to very large models on consumer-grade hardware.
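To illustrate the QLoRA pattern ahead of Section 4.4, the rough sketch below loads a base model in 4-bit precision and then attaches LoRA adapters as before. It assumes a CUDA GPU with the bitsandbytes and accelerate packages installed; the model name and settings are illustrative choices.

```python
# Rough sketch of QLoRA: a frozen 4-bit base model plus trainable
# LoRA adapters. Settings here are illustrative, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype for the matmul itself
)

base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                    # a small model for demonstration
    quantization_config=bnb_config,
    device_map="auto",
)

# Standard preparation step for training on top of quantized weights.
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],    # OPT attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
```

The gradient updates flow only through the full-precision adapter matrices, which is what allows billion-parameter models to be tuned within a single consumer GPU's memory.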
4.1 Introduction to Parameter-Efficient Fine-Tuning
4.2 Low-Rank Adaptation (LoRA): Theory and Operation
4.3 Implementing LoRA with the PEFT Library
4.4 Quantization and Its Effect on Fine-Tuning (QLoRA)
4.5 Other PEFT Methods: A Brief Survey
4.6 Comparing PEFT and Full Fine-Tuning Trade-offs
4.7 Hands-on Practical: Fine-Tuning with LoRA