Building upon the concepts discussed earlier in this chapter regarding multi-task training and adapter management, this practical section provides a hands-on example of fine-tuning a single large language model using multiple LoRA adapters concurrently. This approach is particularly useful when you need to adapt a base model to several distinct tasks or datasets without duplicating the large base model weights, saving significant memory and storage resources.
We will simulate a scenario where we adapt a pre-trained model for two hypothetical tasks: Task A (e.g., text summarization) and Task B (e.g., sentiment analysis). While we won't implement the full data loading and preprocessing for specific datasets here, we'll focus on the core mechanics of configuring, adding, training, and saving multiple LoRA adapters using the Hugging Face transformers and peft libraries.
Ensure you have the necessary libraries installed:
pip install transformers datasets accelerate peft bitsandbytes torch
We assume you are familiar with loading models and tokenizers from Hugging Face, preparing datasets, and the basics of PyTorch training loops or the Trainer API.
First, let's import the required modules and load our base pre-trained model. For demonstration, we'll use a smaller model, but the principles apply directly to larger LLMs. We'll also load it in 8-bit to simulate a resource-constrained environment often encountered when applying PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
from datasets import Dataset  # Using dummy datasets for illustration

# Load base model and tokenizer
model_name = "gpt2"  # Replace with your target LLM, e.g., "meta-llama/Llama-2-7b-hf"

# Load in 8-bit for memory efficiency demonstration
# (newer transformers versions prefer quantization_config=BitsAndBytesConfig(load_in_8bit=True))
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,  # Use 8-bit loading
    device_map="auto",  # Automatically distribute across available GPUs/CPU
)

# Prepare the quantized model for training (casts layer norms, enables input gradients)
model = prepare_model_for_kbit_training(model)

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token if missing
Now, we define separate LoraConfig objects for each task (adapter). This allows us to specify different ranks, target modules, alpha values, or other hyperparameters tailored to each task if needed. Note that the unique adapter names are assigned later, when the configurations are attached to the model.
# Configuration for Task A (e.g., Summarization)
lora_config_task_a = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,  # Rank for Task A
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # Example: Target only attention projection layers for Task A
    bias="none",
)
# Configuration for Task B (e.g., Sentiment Analysis)
lora_config_task_b = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=4,  # Lower rank for Task B
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["c_proj"],  # Example: Target different layers for Task B
    bias="none",
)
Notice how we use different ranks (r) and target modules for demonstration. LoraConfig itself does not take an adapter name; the unique name is supplied when the adapter is attached to the model, and that name is essential for managing multiple adapters on the same base model.
We wrap the base model with get_peft_model, which creates a PeftModel and registers the first adapter under its name. Subsequent calls to the PeftModel's add_adapter method attach additional adapters to the same base model structure.
# Wrap the base model and register the first adapter (Task A)
model = get_peft_model(model, lora_config_task_a, adapter_name="adapter_task_a")
print("Adapter 'adapter_task_a' added.")

# Attach the second adapter (Task B) to the same PeftModel
model.add_adapter("adapter_task_b", lora_config_task_b)
print("Adapter 'adapter_task_b' added.")

# You can verify the adapters attached
print("Active adapter:", model.active_adapter)
print("PEFT config:", model.peft_config)  # Shows configurations for all attached adapters
At this point, the model object contains the original pre-trained weights (frozen) and the newly initialized, trainable LoRA matrices for both adapter_task_a and adapter_task_b.
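As a quick sanity check, PeftModel's print_trainable_parameters summarizes how few weights are actually trainable, and the parameter names reveal the per-adapter naming scheme. A minimal sketch:

# Trainable vs. total parameter counts (only LoRA weights should be trainable)
model.print_trainable_parameters()

# LoRA parameter names embed the adapter name,
# e.g. '...attn.c_attn.lora_A.adapter_task_a.weight'
lora_param_names = [n for n, _ in model.named_parameters() if "lora" in n]
print(lora_param_names[:4])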
Training with multiple adapters requires careful handling of data and the training loop. The key idea is to activate the correct adapter before processing a batch of data associated with its corresponding task.
Let's create simple dummy datasets representing our two tasks. In a real scenario, you would load and preprocess your actual task-specific datasets.
# Dummy data function
def create_dummy_dataset(text_prefix, num_samples=100):
    texts = [f"{text_prefix}: Sample text number {i} for training." for i in range(num_samples)]
    # Tokenize - ensure consistent processing (padding, truncation)
    tokenized = tokenizer(texts, padding="max_length", truncation=True, max_length=128)
    # Convert to Dataset object
    return Dataset.from_dict(dict(tokenized))
# Create datasets for each task
dataset_task_a = create_dummy_dataset("Summarize", 200)
dataset_task_b = create_dummy_dataset("Analyze Sentiment", 150)
# We need a way to identify which dataset a batch belongs to.
# One common approach is to interleave the datasets or use a custom sampler.
# For simplicity with the standard Trainer, we'll combine them and add a task identifier,
# although a custom training loop offers more control.
# Add adapter identifiers (simplistic approach for demonstration)
dataset_task_a = dataset_task_a.map(lambda example: {'adapter_name': "adapter_task_a"})
dataset_task_b = dataset_task_b.map(lambda example: {'adapter_name': "adapter_task_b"})
# Combine datasets (naive interleaving) - Requires careful shuffling/sampling in practice
from datasets import concatenate_datasets
combined_dataset = concatenate_datasets([dataset_task_a, dataset_task_b]).shuffle(seed=42)
# We need a custom data collator or modification to handle the 'adapter_name' column
# Or, more commonly, use a custom training loop.
# For this example, let's stick to the Trainer but acknowledge its limitations here.
# The standard DataCollatorForLanguageModeling won't use 'adapter_name'.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
print(f"Combined dataset size: {len(combined_dataset)}")
print("Example entry:", combined_dataset[0])
The standard Hugging Face Trainer doesn't natively support dynamically switching adapters based on batch metadata. You typically need one of the following:

1. A custom training loop: call model.set_adapter(adapter_name) before the forward pass, compute the loss, and perform backpropagation. Only the active adapter's weights will receive gradients.
2. A subclassed Trainer: override methods like compute_loss or training_step to include the model.set_adapter() call based on information you inject into the batches (like the adapter_name column we added, though passing it through the default collator needs care). A sketch of this approach follows the loop snippet below.

Let's outline the logic of the first option as a custom training loop:
# --- Custom training loop sketch ---
# Assumes 'dataloader' yields single-task batches, each carrying that task's
# adapter name (e.g., via a task-grouped sampler and a collator that passes
# the 'adapter_name' field through).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),  # Optimizer targets only PEFT params
    lr=5e-5,
)
model.train()

for batch in dataloader:
    adapter_name_for_batch = batch.pop("adapter_name")[0]  # One task per batch
    inputs = {k: v.to(model.device) for k, v in batch.items()}  # Move data to device

    # *** Key step: set the active adapter ***
    model.set_adapter(adapter_name_for_batch)

    # Forward pass - only the active adapter is used
    outputs = model(**inputs)
    loss = outputs.loss

    # Backward pass - gradients flow only to the active LoRA weights
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # No need to disable the adapter here; the next iteration's
    # set_adapter call replaces it.
# --- End sketch ---
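For the second option, here is a minimal sketch of a subclassed Trainer. It assumes, as above, that a custom collator forwards a single adapter_name string per batch; this is an illustration, not a drop-in implementation:

class MultiAdapterTrainer(Trainer):
    """Sketch: switch the active LoRA adapter per batch before computing loss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # 'adapter_name' is our injected field, not a model input, so pop it first
        adapter_name = inputs.pop("adapter_name")
        model.set_adapter(adapter_name)
        outputs = model(**inputs)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss

You would then instantiate MultiAdapterTrainer with the combined dataset and a collator that preserves the adapter_name field.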
Important Consideration: When training multiple adapters, ensure balanced sampling from each task's dataset to prevent one adapter from dominating the training process or overfitting/underfitting relative to the others. This might involve weighted sampling or carefully structured epochs.
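One lightweight way to get balanced sampling is interleave_datasets from the datasets library, which draws from each dataset with given probabilities. A sketch using the dummy datasets above:

from datasets import interleave_datasets

# Sample roughly 50/50 from each task; adjust probabilities to re-weight tasks.
# "all_exhausted" oversamples the smaller dataset until the larger one is used up.
balanced_dataset = interleave_datasets(
    [dataset_task_a, dataset_task_b],
    probabilities=[0.5, 0.5],
    seed=42,
    stopping_strategy="all_exhausted",
)

The sequential demonstration below does not use this interleaved dataset, but it is the natural input for the custom loop or subclassed Trainer described above.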
For this example, we'll proceed with the Trainer but train the adapters sequentially for simplicity. This isn't true simultaneous multi-adapter training, but it demonstrates adapter switching and saving while avoiding the complexity of modifying the Trainer or writing a full custom loop for this hands-on.
# Define base training arguments
training_args = TrainingArguments(
    output_dir="./multi_adapter_output",
    num_train_epochs=1,  # Short training for demonstration
    per_device_train_batch_size=4,
    logging_steps=50,
    save_strategy="epoch",  # Save at the end of each epoch
    learning_rate=3e-4,
    weight_decay=0.01,
    report_to="none",  # Disable external logging for simplicity
    remove_unused_columns=False,  # Keep 'adapter_name' if using a modified Trainer later
)
# --- Training Phase 1: Train only adapter_task_a ---
print("\n--- Training Adapter A ---")
model.set_adapter("adapter_task_a")  # Activate Task A adapter

# Ensure only Task A's LoRA parameters are trainable
for name, param in model.named_parameters():
    if 'lora' in name:
        param.requires_grad = "adapter_task_a" in name
    else:
        param.requires_grad = False

# Trainer for the Task A dataset.
# We drop the 'adapter_name' column here because the standard collator
# cannot handle string columns.
trainer_a = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_task_a.remove_columns("adapter_name"),  # Use Task A data
    data_collator=data_collator,
)
trainer_a.train()
print("Finished training Adapter A.")
# Save Task A adapter specifically; selected_adapters writes each adapter
# into a subdirectory named after it
model.save_pretrained("./multi_adapter_output", selected_adapters=["adapter_task_a"])
print("Adapter A saved to ./multi_adapter_output/adapter_task_a")
# --- Training Phase 2: Train only adapter_task_b ---
print("\n--- Training Adapter B ---")
model.set_adapter("adapter_task_b")  # Activate Task B adapter

# Ensure only Task B's LoRA parameters are trainable
for name, param in model.named_parameters():
    if 'lora' in name:
        param.requires_grad = "adapter_task_b" in name
    else:
        param.requires_grad = False

# A new Trainer instance gives Task B a fresh optimizer and scheduler
training_args.output_dir = "./multi_adapter_output_b"  # Separate checkpoint dir
trainer_b = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_task_b.remove_columns("adapter_name"),  # Use Task B data
    data_collator=data_collator,
)
trainer_b.train()
print("Finished training Adapter B.")
# Save Task B adapter specifically
model.save_pretrained("./multi_adapter_output", selected_adapters=["adapter_task_b"])
print("Adapter B saved to ./multi_adapter_output/adapter_task_b")
Note: Re-running Trainer.train() multiple times like this might not be ideal, especially regarding optimizer state and learning rate scheduling. A custom loop or modified Trainer provides better control for true interleaved multi-task training.
As shown above, you can save specific adapters using the selected_adapters argument of model.save_pretrained(). Each adapter is saved in its own subdirectory containing the LoRA weights (adapter_model.safetensors, or adapter_model.bin with older peft versions) and configuration (adapter_config.json).
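With the save calls above, the output directory should look roughly like this (Trainer checkpoints such as checkpoint-*/ may also appear alongside these):

multi_adapter_output/
├── adapter_task_a/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
└── adapter_task_b/
    ├── adapter_config.json
    └── adapter_model.safetensors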
To load these adapters later for inference:
from peft import PeftModel

# Load the base model again (if not already in memory)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

# Load the first adapter
model_with_adapter_a = PeftModel.from_pretrained(
    base_model,
    "./multi_adapter_output/adapter_task_a",  # Path to saved adapter A
    adapter_name="adapter_task_a",  # Use the same name
)
print("Adapter A loaded.")
# Load the second adapter onto the SAME base model instance
# Important: Load subsequent adapters onto the PeftModel object
model_with_adapter_a.load_adapter(
    "./multi_adapter_output/adapter_task_b",  # Path to saved adapter B
    adapter_name="adapter_task_b",  # Use the same name
)
print("Adapter B loaded onto the same model.")
# Verify loaded adapters
print("Loaded adapters:", model_with_adapter_a.peft_config.keys())
During inference, you can dynamically switch between the loaded adapters using set_adapter.
# Generate text using Adapter A
model_with_adapter_a.set_adapter("adapter_task_a")
print("\n--- Generating with Adapter A ---")
prompt_a = "Summarize this document: ..."  # Example prompt for Task A
inputs_a = tokenizer(prompt_a, return_tensors="pt").to(model_with_adapter_a.device)

# Ensure model is in eval mode for generation
model_with_adapter_a.eval()
with torch.no_grad():
    outputs_a = model_with_adapter_a.generate(**inputs_a, max_new_tokens=50)
print("Adapter A Output:", tokenizer.decode(outputs_a[0], skip_special_tokens=True))
# Generate text using Adapter B
model_with_adapter_a.set_adapter("adapter_task_b")
print("\n--- Generating with Adapter B ---")
prompt_b = "Analyze the sentiment: This movie was fantastic!"  # Example prompt for Task B
inputs_b = tokenizer(prompt_b, return_tensors="pt").to(model_with_adapter_a.device)
model_with_adapter_a.eval()
with torch.no_grad():
    outputs_b = model_with_adapter_a.generate(**inputs_b, max_new_tokens=20)
print("Adapter B Output:", tokenizer.decode(outputs_b[0], skip_special_tokens=True))
# Optionally run the plain base model; note that PeftModel's disable_adapter
# is a context manager, not a toggle:
# with model_with_adapter_a.disable_adapter():
#     outputs = model_with_adapter_a.generate(**inputs_a, max_new_tokens=50)
Keep in mind that this hands-on trained the adapters sequentially with the standard Trainer. For genuine concurrent training where adapter weights are updated in an interleaved manner within the same training run, implementing a custom PyTorch loop or modifying the Trainer to handle adapter switching per batch is necessary.

This practical exercise demonstrated the mechanics of managing multiple LoRA adapters. Adapting this framework to real-world datasets and implementing a robust interleaved training strategy are the next steps for applying this powerful technique effectively.