Let's translate the theoretical understanding of Low-Rank Adaptation (LoRA) into a practical implementation. This hands-on section guides you through adapting a pre-trained foundation model to a few-shot task using the peft library, which simplifies the application of various Parameter-Efficient Fine-Tuning techniques. We assume you have a working Python environment with PyTorch and the Hugging Face ecosystem (transformers, datasets, peft) installed. Access to a GPU is highly recommended for efficient training, even with parameter-efficient methods.
Our goal is to take a large, pre-trained model (frozen) and train only the lightweight LoRA adapters on a small dataset representing a new task.
First, ensure the necessary libraries are installed:
# pip install transformers datasets peft torch accelerate scikit-learn tqdm
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, TaskType
import os
# Configuration (replace with your specifics)
BASE_MODEL_NAME = "bert-base-uncased" # Example model
DATASET_NAME = "imdb" # Example dataset for classification
NUM_CLASSES = 2 # Example: positive/negative sentiment
FEW_SHOT_SAMPLES = 16 # K value for K-shot learning (per class)
OUTPUT_DIR = "./lora-bert-few-shot-adapter"
LEARNING_RATE = 1e-4
NUM_EPOCHS = 5
LORA_R = 8 # LoRA rank
LORA_ALPHA = 16 # LoRA scaling factor
LORA_DROPOUT = 0.1
# Specify target modules based on model architecture (e.g., for BERT)
LORA_TARGET_MODULES = ["query", "value"]
# Ensure device is set correctly (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# Create output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)
We define constants for the base model, dataset, LoRA parameters, and training hyperparameters. Selecting appropriate LORA_TARGET_MODULES is important; for many Transformer models, applying LoRA to the query and value projection matrices within the self-attention mechanism is effective. You may need to inspect the model architecture (for example, with print(model)) to identify the correct module names; a short inspection sketch follows the model-loading code below.
We load the pre-trained model and its corresponding tokenizer. The model will serve as the base, with its original weights frozen during adaptation.
# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL_NAME,
    num_labels=NUM_CLASSES
)
# Freeze all parameters of the base model
for param in model.parameters():
    param.requires_grad = False
print(f"Loaded base model: {BASE_MODEL_NAME}")
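As promised above, here is a small sketch for locating candidate target modules. It lists the linear layers inside the attention blocks; the "attention" substring filter is an assumption that matches BERT-style module paths and may need adjusting for other architectures.
from torch import nn

# Print linear layers inside attention blocks; for BERT this surfaces names
# such as 'query', 'key', 'value', and the attention output projection
for name, module in model.named_modules():
    if isinstance(module, nn.Linear) and "attention" in name:
        print(name)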
For few-shot learning, we need a small support set for training the adapters. We'll simulate this by sampling a small number of examples from a standard dataset. In a real-world scenario, this would be your actual limited task-specific data.
# Load the dataset
dataset = load_dataset(DATASET_NAME)
# Create a small, balanced few-shot training subset
train_dataset_full = dataset['train'].shuffle(seed=42)
sampled_train_indices = []
for label in range(NUM_CLASSES):
    label_indices = [
        i for i, ex in enumerate(train_dataset_full)
        if ex['label'] == label
    ][:FEW_SHOT_SAMPLES]
    sampled_train_indices.extend(label_indices)
few_shot_train_dataset = train_dataset_full.select(sampled_train_indices).shuffle(seed=42)
# Use a portion of the original test set for evaluation
eval_dataset = dataset['test'].shuffle(seed=42).select(range(1000)) # Use a subset for faster eval
# Preprocessing function
def preprocess_function(examples):
    return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=128)
# Apply preprocessing
encoded_train_dataset = few_shot_train_dataset.map(preprocess_function, batched=True)
encoded_eval_dataset = eval_dataset.map(preprocess_function, batched=True)
# Format datasets for PyTorch
encoded_train_dataset.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
encoded_eval_dataset.set_format("torch", columns=['input_ids', 'attention_mask', 'label'])
print(f"Prepared few-shot dataset with {len(encoded_train_dataset)} training samples.")
print(f"Using {len(encoded_eval_dataset)} samples for evaluation.")
This code snippet samples FEW_SHOT_SAMPLES examples per class from the training set and prepares both training and evaluation datasets by tokenizing the text inputs.
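As a quick sanity check on the support set, you can count the labels to confirm the subset is balanced; this uses only the dataset object created above.
from collections import Counter

# Verify that each class contributes FEW_SHOT_SAMPLES examples
label_counts = Counter(few_shot_train_dataset['label'])
print(f"Support set label distribution: {dict(label_counts)}")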
Now, we define the LoRA configuration using LoraConfig and apply it to our frozen base model with get_peft_model. This function modifies the model architecture to include the low-rank adapters in the specified target modules.
# Define LoRA configuration
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=LORA_TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",  # Typically 'none', 'all', or 'lora_only'
    task_type=TaskType.SEQ_CLS  # Specific task type
)
# Apply LoRA to the model
lora_model = get_peft_model(model, lora_config)
# Print trainable parameters
lora_model.print_trainable_parameters()
# Move model to the appropriate device
lora_model.to(device)
The print_trainable_parameters() method highlights the efficiency of LoRA: the number of trainable parameters is a very small fraction of the total parameter count of the original foundation model. Note that because we set task_type=TaskType.SEQ_CLS, peft also keeps the newly initialized classification head trainable alongside the LoRA matrices, so the task-specific head can be learned.
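If you want to verify these numbers yourself, you can compute them directly from the requires_grad flags; this small sketch uses only the lora_model defined above.
# Count trainable vs. total parameters by inspecting requires_grad
trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")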
Figure: A simplified view of LoRA adaptation. The original weight matrix W (of shape d×k) is frozen. Trainable low-rank matrices B and A (with rank r ≪ min(d, k)) are added in parallel, and the final output combines the outputs from both branches.
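To make this picture concrete, here is a minimal PyTorch sketch of the LoRA forward computation for a single frozen linear layer. The dimensions are illustrative, and the alpha/r scaling follows the standard LoRA formulation; this is independent of peft's internals.
# Illustrative LoRA forward pass for one linear layer (stand-in values)
d, k, r, alpha = 768, 768, 8, 16              # dimensions mirroring LORA_R / LORA_ALPHA
W = torch.randn(d, k)                         # frozen pre-trained weight
A = torch.randn(r, k, requires_grad=True)     # low-rank factor A (random init, trainable)
B = torch.zeros(d, r, requires_grad=True)     # low-rank factor B (zero init, so B @ A starts at 0)
x = torch.randn(k)                            # an input activation

# Frozen path plus scaled low-rank update
h = W @ x + (alpha / r) * (B @ (A @ x))
print(h.shape)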
We set up a standard PyTorch training loop. The crucial difference from full fine-tuning is that the optimizer only needs to update the parameters of the LoRA adapters, which get_peft_model conveniently marks as trainable.
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup
from tqdm.auto import tqdm  # Progress bars that work in notebooks and terminals
# Create DataLoaders
train_dataloader = DataLoader(encoded_train_dataset, batch_size=8, shuffle=True)
eval_dataloader = DataLoader(encoded_eval_dataset, batch_size=16)
# Optimizer - only the LoRA (and classifier head) parameters have requires_grad=True,
# so only they receive updates
optimizer = AdamW(lora_model.parameters(), lr=LEARNING_RATE)
# Learning rate scheduler
num_training_steps = NUM_EPOCHS * len(train_dataloader)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps
)
print("Starting LoRA adapter training...")
for epoch in range(NUM_EPOCHS):
    lora_model.train()  # Set model to training mode
    total_loss = 0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}/{NUM_EPOCHS}", leave=False)
    for batch in progress_bar:
        # Move batch to device
        batch = {k: v.to(device) for k, v in batch.items()}
        # Forward pass
        outputs = lora_model(input_ids=batch['input_ids'],
                             attention_mask=batch['attention_mask'],
                             labels=batch['label'])
        # Calculate loss
        loss = outputs.loss
        total_loss += loss.item()
        # Backward pass
        loss.backward()
        # Optimizer step
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.set_postfix({'loss': loss.item()})
    avg_train_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch+1} Average Training Loss: {avg_train_loss:.4f}")
    # Optional: Evaluation step within the loop (see the evaluate function below)
    # evaluate(lora_model, eval_dataloader, device)
print("Training finished.")
# Save the trained LoRA adapter
lora_model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR) # Save tokenizer too for easy loading
print(f"LoRA adapter saved to {OUTPUT_DIR}")
This loop iterates through the small few-shot dataset for the specified number of epochs, computing the loss and updating only the LoRA weights (matrices A and B) together with the small classification head.
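If you want to see what was actually trained, you can list the adapter parameters by name. The exact naming (lora_A / lora_B and the adapter name "default") is peft-internal and can differ across versions, so treat this as an illustrative sketch.
# List the learned low-rank factors; names like '...lora_A.default.weight' may vary by peft version
for name, param in lora_model.named_parameters():
    if param.requires_grad and ("lora_A" in name or "lora_B" in name):
        print(name, tuple(param.shape))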
After training, we evaluate the performance of the model with the trained LoRA adapters on the held-out evaluation set.
from sklearn.metrics import accuracy_score
def evaluate(model, dataloader, device):
    model.eval()  # Set model to evaluation mode
    all_preds = []
    all_labels = []
    total_eval_loss = 0
    progress_bar = tqdm(dataloader, desc="Evaluating", leave=False)
    with torch.no_grad():  # Disable gradient calculations
        for batch in progress_bar:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(input_ids=batch['input_ids'],
                            attention_mask=batch['attention_mask'],
                            labels=batch['label'])
            loss = outputs.loss
            total_eval_loss += loss.item()
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=-1)
            all_preds.extend(predictions.cpu().numpy())
            all_labels.extend(batch['label'].cpu().numpy())
    avg_eval_loss = total_eval_loss / len(dataloader)
    accuracy = accuracy_score(all_labels, all_preds)
    print(f"Evaluation Loss: {avg_eval_loss:.4f}")
    print(f"Evaluation Accuracy: {accuracy:.4f}")
    return accuracy, avg_eval_loss
# Perform final evaluation
print("\nPerforming final evaluation...")
evaluate(lora_model, eval_dataloader, device)
This evaluation function calculates the loss and accuracy on the evaluation set, providing a measure of how well the adapter generalized from the few-shot training examples.
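To put the accuracy in context, you can also evaluate with the adapters switched off using peft's disable_adapter() context manager. Depending on the peft version this may also revert the classification head to its untrained initialization, so expect roughly chance-level performance; the point is simply to show the effect of the adaptation.
# Hedged comparison: evaluate with the LoRA adapters disabled
with lora_model.disable_adapter():
    evaluate(lora_model, eval_dataloader, device)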
You can easily load the base model with the trained LoRA adapter for inference later:
from peft import PeftModel, PeftConfig
# Load the configuration and base model
config = PeftConfig.from_pretrained(OUTPUT_DIR)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,  # Loads the original base model by name
    num_labels=NUM_CLASSES
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Attach the trained LoRA adapter to the base model (the adapter weights are not merged)
loaded_lora_model = PeftModel.from_pretrained(base_model, OUTPUT_DIR)
loaded_lora_model.to(device)
loaded_lora_model.eval()
print("Loaded adapted model successfully.")
# Example Inference
text = "This movie was fantastic, great acting and plot!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
with torch.no_grad():
    outputs = loaded_lora_model(**inputs)
logits = outputs.logits
predicted_class_id = torch.argmax(logits, dim=-1).item()
print(f"Input text: '{text}'")
print(f"Predicted class ID: {predicted_class_id}") # Map ID to label name if needed
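For IMDb's binary labels you can map the predicted ID back to a readable name; the mapping below assumes the standard IMDb convention (0 = negative, 1 = positive).
# Assumed label mapping for the IMDb dataset (0 = negative, 1 = positive)
id2label = {0: "negative", 1: "positive"}
print(f"Predicted sentiment: {id2label[predicted_class_id]}")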
This practical exercise demonstrates the core workflow of adapting a foundation model using LoRA: load, freeze, configure PEFT, train adapters on few-shot data, and evaluate.
Pay attention to the key LoRA hyperparameters (r, lora_alpha, and target_modules). r is a critical parameter: a higher rank can capture more complex adaptations but increases the number of trainable parameters. lora_alpha acts as a scaling factor for the LoRA updates and is often set to r or 2*r. Experimentation is needed to find good values; a small sketch of the parameter-count side of this tradeoff is shown below.

This hands-on example provides a starting point. You can extend it by experimenting with different foundation models (Vision Transformers, other LLMs), exploring other PEFT techniques available in the peft library (such as Prefix Tuning or Adapters), and applying it to more complex few-shot datasets and tasks.
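As a rough illustration of the rank/parameter tradeoff mentioned above, the sketch below instantiates fresh adapters at several ranks and prints their trainable-parameter counts; a full sweep would also retrain and evaluate each configuration on the few-shot data.
# Illustrative only: how the LoRA rank affects the trainable-parameter count
for rank in [4, 8, 16]:
    cfg = LoraConfig(r=rank, lora_alpha=2 * rank, target_modules=LORA_TARGET_MODULES,
                     lora_dropout=LORA_DROPOUT, bias="none", task_type=TaskType.SEQ_CLS)
    fresh_base = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL_NAME, num_labels=NUM_CLASSES)
    get_peft_model(fresh_base, cfg).print_trainable_parameters()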