Low-Rank Adaptation (LoRA) can be applied to real models using the Hugging Face PEFT (Parameter-Efficient Fine-Tuning) library. This library provides a high-level API that simplifies the process of injecting LoRA adapters into existing transformer models with just a few lines of code. It handles the complex work of modifying the model's architecture and freezing the appropriate weights, allowing you to focus on configuring the training.
PEFT Library Workflow

The core idea behind using PEFT is to take a pre-trained base model, define a configuration for your LoRA adapters, and then combine them to create a new, trainable model. This new model keeps the original weights frozen and only trains the small, injected adapter layers.
The process involves three main steps:
1. Load the base model: Load a pre-trained transformer model, for example from the Hugging Face Hub.
2. Define the LoraConfig: Create a LoraConfig object to specify the hyperparameters for the adapter layers, such as their rank (r) and which parts of the base model to modify.
3. Create the PeftModel: Use the get_peft_model function to wrap the base model with the LoRA adapters according to your configuration.

Let's examine each step in detail.
LoraConfig

The LoraConfig class is the control center for your LoRA implementation. It allows you to define all the important parameters for the adapter layers.
Here are the most significant arguments you will use:
- r: This integer is the rank of the low-rank update matrices (A and B). It directly controls the number of trainable parameters. A smaller r results in fewer parameters and faster training but may capture less task-specific information. A larger r increases adapter capacity at the cost of more parameters. Common values for r range from 4 to 64.
- lora_alpha: The scaling factor for the LoRA activations, which behaves somewhat like a learning rate for the adapters. The LoRA update is scaled by lora_alpha / r, so adjusting this value changes the magnitude of the modifications made by the adapters. A common practice is to set lora_alpha to twice the value of r.
- target_modules: A list of strings specifying which modules in the base model architecture to apply LoRA to. For transformer models, this is typically the linear layers within the attention mechanism, such as the query, key, and value projections, for example ["q_proj", "v_proj"]. Identifying the correct module names requires inspecting the base model's architecture (see the inspection sketch after the example configuration below).
- lora_dropout: A float giving the dropout probability applied to the LoRA layers. This serves as a regularization technique to prevent overfitting of the adapter weights.
- task_type: Specifies the type of task you are fine-tuning for. For text generation models, you would set this to TaskType.CAUSAL_LM. This helps the PEFT library configure the model's forward pass correctly.

from peft import LoraConfig, TaskType
# Example LoRA configuration for a causal language model
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.1,
bias="none",
task_type=TaskType.CAUSAL_LM
)
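The right names for target_modules depend on the base model's architecture. A quick way to discover them is to list the model's submodules and collect the linear layers. The following is a minimal sketch; facebook/opt-125m is used only as a small example model, and the printed names will differ for other architectures.

import torch
from transformers import AutoModelForCausalLM

# Load a small example model to inspect its architecture
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Collect the short names of all linear layers; these are candidate target_modules
linear_layer_names = set()
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        # e.g. "model.decoder.layers.0.self_attn.q_proj" -> "q_proj"
        linear_layer_names.add(name.split(".")[-1])

print(sorted(linear_layer_names))
# For OPT this includes names such as 'q_proj', 'k_proj', 'v_proj', 'out_proj', 'fc1', 'fc2'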
PeftModel

Once you have your base model loaded and the LoraConfig defined, you can combine them using the get_peft_model function. This function takes the base model and the configuration as input and returns a PeftModel object. This returned model is ready for training, with all base model weights frozen and only the new LoRA adapter weights marked as trainable.
The following diagram illustrates this workflow.
The get_peft_model function takes a frozen base model and a LoraConfig to produce a PeftModel, where only the small, injected LoRA adapter matrices are trainable.
Let's walk through a complete example of setting up a model for LoRA fine-tuning. We will use the meta-llama/Llama-2-7b-chat-hf model as our base, but the same principles apply to any transformer model on the Hugging Face Hub.
First, ensure you have the necessary libraries installed:
pip install torch transformers peft accelerate bitsandbytes
Now, let's write the Python code to prepare our model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
# 1. Define the model ID and load the tokenizer
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 2. Configure quantization to load the model in 4-bit
# This is a memory-saving technique that we will explore more in the QLoRA section
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
)
# 3. Load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=quantization_config,
device_map="auto" # Automatically maps layers to available hardware
)
# 4. Prepare the model for k-bit training (optional but recommended)
base_model = prepare_model_for_kbit_training(base_model)
# 5. Define the LoRA configuration
lora_config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
# 6. Create the PeftModel
peft_model = get_peft_model(base_model, lora_config)
# 7. Print the percentage of trainable parameters
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || "
        f"trainable%: {100 * trainable_params / all_param:.2f}"
    )
print_trainable_parameters(peft_model)
When you run this code, the output of print_trainable_parameters will reveal the power of PEFT. For a 7-billion-parameter model, the output would look something like this:
trainable params: 8,388,608 || all params: 3,508,801,536 || trainable%: 0.24
This is the central benefit of LoRA. We have successfully prepared a 7B-parameter model for fine-tuning while only needing to train roughly 8 million adapter weights, a small fraction of one percent of the model. (The reported total of about 3.5 billion is lower than the true 7 billion because bitsandbytes stores the 4-bit quantized weights in a packed format, so each stored element counts for two original parameters.) The memory requirements for the optimizer states and gradients are drastically reduced, making it possible to run the training process on a single, consumer-grade GPU.
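Incidentally, the PeftModel returned by get_peft_model also provides a built-in helper that reports the same statistics, so the custom function above is optional:

peft_model.print_trainable_parameters()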
Once you have your PeftModel, you can use it with the standard Hugging Face Trainer API just as you would for full fine-tuning. The Trainer will automatically handle the gradient updates for only the trainable LoRA parameters. The small set of adapter weights can then be saved and loaded independently of the large base model, making it easy to manage multiple specialized versions of the same foundation model.
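As a sketch of that last point, and assuming the objects from the walkthrough above (peft_model, model_id, and quantization_config) are still available, the trained adapter can be saved to a directory of your choosing and later reattached to a freshly loaded copy of the base model. The directory name below is just an example.

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Save only the LoRA adapter weights and config (a few megabytes, not the full 7B model)
peft_model.save_pretrained("llama2-lora-adapter")

# Later, for example in a new session: reload the frozen base model...
fresh_base = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

# ...and attach the saved adapter on top of it
restored_model = PeftModel.from_pretrained(fresh_base, "llama2-lora-adapter")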