Adapting a pre-trained model requires selecting a fine-tuning strategy. This choice directly influences the computational resources needed, the time to train the model, and the characteristics of the final artifact produced. Fine-tuning approaches range across a spectrum defined by resource intensity and the number of parameters modified, with two primary methods representing the extremes.
Full parameter fine-tuning, often just called "fine-tuning," is the most direct method. In this approach, you load a pre-trained model and continue the training process on your custom dataset, updating every single weight and bias in the model. Think of it as taking the entire neural network and nudging all of its connections to better align with your specific task.
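To make this concrete, the sketch below shows the shape of a full fine-tuning loop in PyTorch with Hugging Face Transformers. The `"gpt2"` checkpoint stands in for whatever base model you are adapting, the `dataloader` is assumed to yield tokenized batches, and the learning rate is illustrative rather than prescriptive.

```python
import torch
from transformers import AutoModelForCausalLM

# "gpt2" stands in for whatever pre-trained checkpoint you are adapting.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()

# Every parameter is passed to the optimizer, so all weights receive updates.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for batch in dataloader:  # `dataloader` of tokenized batches (assumed to exist)
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()  # gradients flow through every weight and bias
    optimizer.step()
    optimizer.zero_grad()
```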
This strategy is powerful because it gives the model maximum flexibility to adapt to the new data. If your task's data distribution is significantly different from the pre-training data, allowing all parameters to change can lead to higher performance. However, this comes with substantial costs:
- High GPU memory: training must hold gradients and optimizer states in addition to the model itself, often several times the size of the bfloat16 weights (see the estimate below).
- High storage and serving cost: every task you fine-tune for produces a complete copy of the model.
- Slower, more expensive training: every parameter participates in each update step.
- Risk of catastrophic forgetting: rewriting all weights can erode general capabilities learned during pre-training.

Full fine-tuning is most suitable when you have access to considerable computational resources and your goal is to achieve the highest possible performance on a single, well-defined task.
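To put the memory cost in perspective, here is a rough back-of-the-envelope estimate for a 7B-parameter model trained with Adam. The byte counts are standard (2 bytes per bfloat16 value, 4 bytes per fp32 value), though real footprints also include activations and, in many mixed-precision setups, fp32 master weights.

```python
params = 7e9  # a 7B-parameter model

weights   = params * 2      # bfloat16 weights: 2 bytes per parameter
gradients = params * 2      # bfloat16 gradients: 2 bytes per parameter
opt_state = params * 4 * 2  # Adam's fp32 first and second moments: 8 bytes per parameter

print(f"~{(weights + gradients + opt_state) / 1e9:.0f} GB, before activations")  # ~84 GB
```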
Parameter-Efficient Fine-Tuning (PEFT) methods offer a practical alternative to the resource-heavy demands of full fine-tuning. The central idea behind PEFT is to freeze most of the pre-trained model's parameters and only train a very small number of new or existing parameters. This dramatically reduces the memory and computational footprint of the training process.
Instead of modifying the entire model, you inject small, trainable modules or "adapters" into the original architecture. Only these adapters, which might constitute less than 0.1% of the total parameter count, are updated during training. The original weights of the foundation model remain untouched.
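The sketch below illustrates this pattern with a generic bottleneck adapter in PyTorch. The stacked linear layers stand in for a real pre-trained network, and the attachment point for the adapters (shown here as a separate module list) varies by architecture; both are simplifications for illustration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden_size: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        # The residual keeps the frozen model's behavior as the starting point.
        return x + self.up(torch.relu(self.down(x)))

# Stand-in for a pre-trained model (a real one would come from a checkpoint).
base = nn.Sequential(*[nn.Linear(768, 768) for _ in range(12)])

# Freeze every original parameter; only the adapters will receive gradients.
for param in base.parameters():
    param.requires_grad = False

# One adapter per layer; in practice these sit inside each Transformer block.
adapters = nn.ModuleList(Adapter(768) for _ in range(12))

trainable = sum(p.numel() for p in adapters.parameters())
total = sum(p.numel() for p in base.parameters()) + trainable
# On this toy model the ratio looks large (~4%); against a multi-billion-parameter
# base, the same adapters amount to well under 1% of all parameters.
print(f"training {trainable:,} of {total:,} parameters ({trainable / total:.2%})")
```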
This approach provides several advantages:

- Far lower GPU memory use, since gradients and optimizer states exist only for the small set of trainable parameters.
- Cheap storage and sharing: each task requires only a few megabytes of adapter weights instead of a full model copy.
- Reduced risk of catastrophic forgetting, because the pre-trained weights are never modified.
- Simple multi-task deployment: one frozen base model can serve many tasks by swapping lightweight adapters.
The diagram below illustrates the fundamental difference between these two strategies.
A comparison of fine-tuning approaches. Full fine-tuning modifies all weights, resulting in a new, large model. PEFT modifies only small, added modules, keeping the base model frozen.
The choice between full fine-tuning and PEFT involves a set of trade-offs. The following table summarizes the main differences to help guide your decision.
| Feature | Full Fine-Tuning | Parameter-Efficient Fine-Tuning (PEFT) |
|---|---|---|
| Parameters Updated | All (100%) | A small subset (< 1%) |
| GPU Memory Requirement | Very High | Low |
| Storage Cost | High (a full copy of the model) | Low (only the small adapter weights) |
| Training Speed | Slower | Faster |
| Catastrophic Forgetting | Higher risk | Lower risk |
| Task Portability | One model per task | One base model, many lightweight adapters |
PEFT is not a single technique but a family of methods. The most popular among these is Low-Rank Adaptation (LoRA), which involves injecting trainable low-rank matrices into the Transformer layers. Other methods include Adapter Tuning, which adds new bottleneck layers, and Prefix-Tuning, which adds trainable prefixes to the input sequence. We will implement LoRA in detail in Chapter 4.
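As a preview of Chapter 4, the sketch below condenses the LoRA idea into a single wrapped layer, adding a trainable low-rank update on top of a frozen weight. The rank `r=8`, scaling factor `alpha=16`, and the 0.01 initialization scale are illustrative choices, not a definitive recipe.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weight (and bias)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # The low-rank path adds only r * (in + out) trainable parameters.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 12,288 vs. 590,592 frozen
```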
Ultimately, your selection of a fine-tuning strategy will depend on your project's constraints and objectives. If you have limited hardware and need to support multiple tasks, PEFT is an excellent choice. If you require maximum performance for a single application and have access to sufficient computational power, full fine-tuning may be the better path. The subsequent chapters in this course will provide you with the practical skills to implement both.