Let's translate the theoretical understanding of Low-Rank Adaptation (LoRA) into a practical implementation. This hands-on section guides you through adapting a pre-trained foundation model to a few-shot task using the peft library, which simplifies the application of various Parameter-Efficient Fine-Tuning techniques. We assume you have a working Python environment with PyTorch and the Hugging Face ecosystem (transformers, datasets, peft) installed. Access to a GPU is highly recommended for efficient training, even with parameter-efficient methods.
Our goal is to take a large, pre-trained model (frozen) and train only the lightweight LoRA adapters on a small dataset representing a new task.
First, ensure the necessary libraries are installed:
# pip install transformers datasets peft torch accelerate scikit-learn tqdm
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, TaskType
import os
# Configuration (replace with your specifics)
BASE_MODEL_NAME = "bert-base-uncased" # Example model
DATASET_NAME = "imdb" # Example dataset for classification
NUM_CLASSES = 2 # Example: positive/negative sentiment
FEW_SHOT_SAMPLES = 16 # K value for K-shot learning (per class)
OUTPUT_DIR = "./lora-bert-few-shot-adapter"
LEARNING_RATE = 1e-4
NUM_EPOCHS = 5
LORA_R = 8 # LoRA rank
LORA_ALPHA = 16 # LoRA scaling factor
LORA_DROPOUT = 0.1
# Specify target modules based on model architecture (e.g., for BERT)
LORA_TARGET_MODULES = ["query", "value"]
# Ensure device is set correctly (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Create output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)
We define constants for the base model, dataset, LoRA parameters, and training hyperparameters. Selecting appropriate LORA_TARGET_MODULES is important; for many Transformer models, applying LoRA to the query and value projection matrices within the self-attention mechanism is effective. You may need to inspect the model architecture (for example, with print(model)) to identify the correct module names; a short inspection sketch follows the model-loading code below.
We load the pre-trained model and its corresponding tokenizer. The model will serve as the base, with its original weights frozen during adaptation.
# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL_NAME,
    num_labels=NUM_CLASSES
)
# Freeze all parameters of the base model
for param in model.parameters():
    param.requires_grad = False
print(f"Loaded base model: {BASE_MODEL_NAME}")
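As promised above, here is a small sketch for locating candidate target modules. It lists the linear layers inside the attention blocks; the "attention" substring filter is an assumption that matches BERT-style module paths and may need adjusting for other architectures.
from torch import nn

# Print linear layers inside attention blocks; for BERT this surfaces names
# such as 'query', 'key', 'value', and the attention output projection
for name, module in model.named_modules():
    if isinstance(module, nn.Linear) and "attention" in name:
        print(name)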
For few-shot learning, we need a small support set for training the adapters. We'll simulate this by sampling a small number of examples from a standard dataset. In a real-world scenario, this would be your actual limited task-specific data.
# Load the dataset
dataset = load_dataset(DATASET_NAME)
# Create a small, balanced few-shot training subset
train_dataset_full = dataset['train'].shuffle(seed=42)
sampled_train_indices = []
for label in range(NUM_CLASSES):
    label_indices = [
        i for i, ex in enumerate(train_dataset_full)
        if ex['label'] == label
    ][:FEW_SHOT_SAMPLES]
    sampled_train_indices.extend(label_indices)
few_shot_train_dataset = train_dataset_full.select(sampled_train_indices).shuffle(seed=42)
# Use a portion of the original test set for evaluation
eval_dataset = dataset['test'].shuffle(seed=42).select(range(1000)) # Use a subset for faster eval
# Preprocessing function
def preprocess_function(examples):
    return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=128)
# Apply preprocessing
encoded_train_dataset = few_shot_train_dataset.map(preprocess_function, batched=True)
encoded_eval_dataset = eval_dataset.map(preprocess_function, batched=True)
# Format datasets for PyTorch
encoded_train_dataset.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
encoded_eval_dataset.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
print(f"Prepared few-shot dataset with {len(encoded_train_dataset)} training samples.")
print(f"Using {len(encoded_eval_dataset)} samples for evaluation.")
This code snippet samples FEW_SHOT_SAMPLES examples per class from the training set and prepares both training and evaluation datasets by tokenizing the text inputs.
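As a quick sanity check on the support set, you can count the labels to confirm the subset is balanced; this uses only the dataset object created above.
from collections import Counter

# Verify that each class contributes FEW_SHOT_SAMPLES examples
label_counts = Counter(few_shot_train_dataset['label'])
print(f"Support set label distribution: {dict(label_counts)}")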
Now, we define the LoRA configuration using LoraConfig and apply it to our frozen base model with get_peft_model. This function modifies the model architecture to include the low-rank adapters in the specified target modules.
# Define LoRA configuration
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=LORA_TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",  # Typically 'none', 'all', or 'lora_only'
    task_type=TaskType.SEQ_CLS  # Specific task type
)
# Apply LoRA to the model
lora_model = get_peft_model(model, lora_config)
# Print trainable parameters
lora_model.print_trainable_parameters()
# Move model to the appropriate device
lora_model.to(device)
The print_trainable_parameters() method highlights the efficiency of LoRA: the number of trainable parameters is a very small fraction of the total parameter count of the original foundation model. Note that because we set task_type=TaskType.SEQ_CLS, peft also keeps the newly initialized classification head trainable alongside the LoRA matrices, so the task-specific head can be learned.
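If you want to verify these numbers yourself, you can compute them directly from the requires_grad flags; this small sketch uses only the lora_model defined above.
# Count trainable vs. total parameters by inspecting requires_grad
trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")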
Figure: A simplified view of LoRA adaptation. The original weight matrix W (of shape d×k) is frozen. Trainable low-rank matrices B and A (with rank r ≪ min(d, k)) are added in parallel, and the final output combines the outputs from both branches.
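To make this picture concrete, here is a minimal PyTorch sketch of the LoRA forward computation for a single frozen linear layer. The dimensions are illustrative, and the alpha/r scaling follows the standard LoRA formulation; this is independent of peft's internals.
# Illustrative LoRA forward pass for one linear layer (stand-in values)
d, k, r, alpha = 768, 768, 8, 16              # dimensions mirroring LORA_R / LORA_ALPHA
W = torch.randn(d, k)                         # frozen pre-trained weight
A = torch.randn(r, k, requires_grad=True)     # low-rank factor A (random init, trainable)
B = torch.zeros(d, r, requires_grad=True)     # low-rank factor B (zero init, so B @ A starts at 0)
x = torch.randn(k)                            # an input activation

# Frozen path plus scaled low-rank update
h = W @ x + (alpha / r) * (B @ (A @ x))
print(h.shape)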
We set up a standard PyTorch training loop. The crucial difference from full fine-tuning is that the optimizer only needs to update the parameters of the LoRA adapters, which get_peft_model conveniently marks as trainable.
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup
from tqdm.auto import tqdm  # Progress bars that work in notebooks and terminals
# Create DataLoaders
train_dataloader = DataLoader(encoded_train_dataset, batch_size=8, shuffle=True)
eval_dataloader = DataLoader(encoded_eval_dataset, batch_size=16)
# Optimizer - only the LoRA (and classifier head) parameters have requires_grad=True,
# so only they receive updates
optimizer = AdamW(lora_model.parameters(), lr=LEARNING_RATE)
# Learning rate scheduler
num_training_steps = NUM_EPOCHS * len(train_dataloader)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps
)
print("Starting LoRA adapter training...")
for epoch in range(NUM_EPOCHS):
    lora_model.train()  # Set model to training mode
    total_loss = 0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}/{NUM_EPOCHS}", leave=False)
    for batch in progress_bar:
        # Move batch to device
        batch = {k: v.to(device) for k, v in batch.items()}
        # Forward pass
        outputs = lora_model(input_ids=batch['input_ids'],
                             attention_mask=batch['attention_mask'],
                             labels=batch['label'])
        # Calculate loss
        loss = outputs.loss
        total_loss += loss.item()
        # Backward pass
        loss.backward()
        # Optimizer step
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.set_postfix({'loss': loss.item()})
    avg_train_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch+1} Average Training Loss: {avg_train_loss:.4f}")
    # Optional: Evaluation step within the loop (see the evaluate function below)
    # evaluate(lora_model, eval_dataloader, device)
print("Training finished.")
# Save the trained LoRA adapter
lora_model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR) # Save tokenizer too for easy loading
print(f"LoRA adapter saved to {OUTPUT_DIR}")
This loop iterates through the small few-shot dataset for the specified number of epochs, computing the loss and updating only the LoRA weights (matrices A and B) together with the small classification head.
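If you want to see what was actually trained, you can list the adapter parameters by name. The exact naming (lora_A / lora_B and the adapter name "default") is peft-internal and can differ across versions, so treat this as an illustrative sketch.
# List the learned low-rank factors; names like '...lora_A.default.weight' may vary by peft version
for name, param in lora_model.named_parameters():
    if param.requires_grad and ("lora_A" in name or "lora_B" in name):
        print(name, tuple(param.shape))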
After training, we evaluate the performance of the model with the trained LoRA adapters on the held-out evaluation set.
from sklearn.metrics import accuracy_score
def evaluate(model, dataloader, device):
    model.eval()  # Set model to evaluation mode
    all_preds = []
    all_labels = []
    total_eval_loss = 0
    progress_bar = tqdm(dataloader, desc="Evaluating", leave=False)
    with torch.no_grad():  # Disable gradient calculations
        for batch in progress_bar:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(input_ids=batch['input_ids'],
                            attention_mask=batch['attention_mask'],
                            labels=batch['label'])
            loss = outputs.loss
            total_eval_loss += loss.item()
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=-1)
            all_preds.extend(predictions.cpu().numpy())
            all_labels.extend(batch['label'].cpu().numpy())
    avg_eval_loss = total_eval_loss / len(dataloader)
    accuracy = accuracy_score(all_labels, all_preds)
    print(f"Evaluation Loss: {avg_eval_loss:.4f}")
    print(f"Evaluation Accuracy: {accuracy:.4f}")
    return accuracy, avg_eval_loss
# Perform final evaluation
print("\nPerforming final evaluation...")
evaluate(lora_model, eval_dataloader, device)
This evaluation function calculates the loss and accuracy on the evaluation set, providing a measure of how well the adapter generalized from the few-shot training examples.
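To put the accuracy in context, you can also evaluate with the adapters switched off using peft's disable_adapter() context manager. Depending on the peft version this may also revert the classification head to its untrained initialization, so expect roughly chance-level performance; the point is simply to show the effect of the adaptation.
# Hedged comparison: evaluate with the LoRA adapters disabled
with lora_model.disable_adapter():
    evaluate(lora_model, eval_dataloader, device)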
You can easily load the base model with the trained LoRA adapter for inference later:
from peft import PeftModel, PeftConfig
# Load the configuration and base model
config = PeftConfig.from_pretrained(OUTPUT_DIR)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,  # Loads the original base model by name
    num_labels=NUM_CLASSES
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Attach the trained LoRA adapter to the base model (the adapter weights are not merged)
loaded_lora_model = PeftModel.from_pretrained(base_model, OUTPUT_DIR)
loaded_lora_model.to(device)
loaded_lora_model.eval()
print("Loaded adapted model successfully.")
# Example Inference
text = "This movie was fantastic, great acting and plot!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
with torch.no_grad():
    outputs = loaded_lora_model(**inputs)
logits = outputs.logits
predicted_class_id = torch.argmax(logits, dim=-1).item()
print(f"Input text: '{text}'")
print(f"Predicted class ID: {predicted_class_id}") # Map ID to label name if needed
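For IMDb's binary labels you can map the predicted ID back to a readable name; the mapping below assumes the standard IMDb convention (0 = negative, 1 = positive).
# Assumed label mapping for the IMDb dataset (0 = negative, 1 = positive)
id2label = {0: "negative", 1: "positive"}
print(f"Predicted sentiment: {id2label[predicted_class_id]}")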
This practical exercise demonstrates the core workflow of adapting a foundation model using LoRA: load, freeze, configure PEFT, train adapters on few-shot data, and evaluate.
Pay attention to the key LoRA hyperparameters (r, lora_alpha, and target_modules). r is a critical parameter: a higher rank can capture more complex adaptations but increases the number of trainable parameters. lora_alpha acts as a scaling factor for the LoRA updates and is often set to r or 2*r. Experimentation is needed to find good values; a small sketch of the parameter-count side of this tradeoff is shown below.

This hands-on example provides a starting point. You can extend it by experimenting with different foundation models (Vision Transformers, other LLMs), exploring other PEFT techniques available in the peft library (such as Prefix Tuning or Adapters), and applying it to more complex few-shot datasets and tasks.
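As a rough illustration of the rank/parameter tradeoff mentioned above, the sketch below instantiates fresh adapters at several ranks and prints their trainable-parameter counts; a full sweep would also retrain and evaluate each configuration on the few-shot data.
# Illustrative only: how the LoRA rank affects the trainable-parameter count
for rank in [4, 8, 16]:
    cfg = LoraConfig(r=rank, lora_alpha=2 * rank, target_modules=LORA_TARGET_MODULES,
                     lora_dropout=LORA_DROPOUT, bias="none", task_type=TaskType.SEQ_CLS)
    fresh_base = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL_NAME, num_labels=NUM_CLASSES)
    get_peft_model(fresh_base, cfg).print_trainable_parameters()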