Fine-tuning a single large language model with multiple LoRA adapters allows you to adapt one base model to several distinct tasks or datasets without duplicating the large base model weights. This approach offers significant memory and storage savings, providing an efficient strategy for multi-task training and adapter management.

We will simulate a scenario where we adapt a pre-trained model for two tasks: Task A (e.g., text summarization) and Task B (e.g., sentiment analysis). While we won't implement full data loading and preprocessing for specific datasets here, we'll focus on the core mechanics of configuring, adding, training, and saving multiple LoRA adapters using the Hugging Face transformers and peft libraries.

## Prerequisites

Ensure you have the necessary libraries installed:

```bash
pip install transformers datasets accelerate peft bitsandbytes torch
```

We assume you are familiar with loading models and tokenizers from Hugging Face, preparing datasets, and the basics of PyTorch training loops or the Trainer API.

## 1. Setup and Model Loading

First, let's import the required modules and load our base pre-trained model. For demonstration, we'll use a smaller model, but the principles apply directly to larger LLMs. We'll also load it in 8-bit to simulate a resource-constrained environment often encountered when applying PEFT, and prepare the quantized model for adapter training.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import (
    LoraConfig,
    get_peft_model,
    TaskType,
    PeftModel,
    prepare_model_for_kbit_training,
)
from datasets import Dataset  # Used below for the dummy datasets

# Load base model and tokenizer
model_name = "gpt2"  # Replace with your target LLM, e.g., "meta-llama/Llama-2-7b-hf"

# Load in 8-bit for memory efficiency. Newer transformers versions prefer
# quantization_config=BitsAndBytesConfig(load_in_8bit=True) over this flag.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,  # 8-bit quantized loading via bitsandbytes
    device_map="auto",  # Automatically distribute across available GPUs/CPU
)

# Recommended when training adapters on top of a quantized base model:
# casts norm/output layers for stability and enables input gradients.
model = prepare_model_for_kbit_training(model)

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token if missing
```

## 2. Defining Multiple LoRA Configurations

Now, we define a separate LoraConfig object for each task (adapter). This lets us specify different ranks, target modules, alpha values, or other hyperparameters tailored to each task if needed. Note that the adapter name is not a field of LoraConfig; we assign a unique name to each adapter when attaching it in the next step.

```python
# Configuration for Task A (e.g., summarization)
lora_config_task_a = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # Rank for Task A
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # Example: GPT-2's fused attention projection
    bias="none",
)

# Configuration for Task B (e.g., sentiment analysis)
lora_config_task_b = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=4,                        # Lower rank for Task B
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["c_proj"],  # Example: target different layers for Task B
    bias="none",
)
```

Notice how we use different ranks (r) and target modules for demonstration. Unique adapter names, assigned when the adapters are attached, are essential for managing multiple adapters on the same base model.
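Valid `target_modules` names vary by architecture: GPT-2 exposes `c_attn` and `c_proj`, while Llama-style models use names like `q_proj` and `v_proj`. If you are unsure which names your model exposes, a quick inspection sketch like the following can help (plain PyTorch introspection, not a peft API; the filter is deliberately loose, so it also lists embeddings and norms):

```python
# Collect the distinct leaf-module names that hold weights; these are the
# usual candidates for LoRA's target_modules. Exact names depend on the
# architecture, so treat this list as a starting point, not ground truth.
candidates = sorted({
    name.rsplit(".", 1)[-1]
    for name, module in model.named_modules()
    if hasattr(module, "weight") and "." in name
})
print("Candidate target_modules names:", candidates)
```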
## 3. Adding Adapters to the Model

We wrap the base model with get_peft_model to attach the first adapter, which returns a PeftModel. Subsequent calls to add_adapter attach additional adapters to the same base model structure.

```python
# Attach the first adapter (Task A) by wrapping the base model
model = get_peft_model(model, lora_config_task_a, adapter_name="adapter_task_a")
print("Adapter 'adapter_task_a' added.")

# Attach the second adapter (Task B) to the same PeftModel.
# Note the argument order for PeftModel.add_adapter: name first, then config.
model.add_adapter("adapter_task_b", lora_config_task_b)
print("Adapter 'adapter_task_b' added.")

# Verify the attached adapters
print("Active adapter:", model.active_adapter)
print("PEFT configs:", model.peft_config.keys())  # Configurations for all attached adapters
```

At this point, the model object contains the original pre-trained weights (frozen) and the newly initialized, trainable LoRA matrices for both adapter_task_a and adapter_task_b.

## 4. Preparing Data and Training

Training with multiple adapters requires careful handling of data and the training loop. The main idea is to activate the correct adapter before processing a batch of data associated with its corresponding task.

### 4.1 Dummy Data Preparation (Illustrative)

Let's create simple dummy datasets representing our two tasks. In a real scenario, you would load and preprocess your actual task-specific datasets.

```python
# Dummy data function
def create_dummy_dataset(text_prefix, num_samples=100):
    texts = [f"{text_prefix}: Sample text number {i} for training." for i in range(num_samples)]
    # Tokenize with consistent padding and truncation
    tokenized = tokenizer(texts, padding="max_length", truncation=True, max_length=128)
    return Dataset.from_dict(dict(tokenized))

# Create datasets for each task
dataset_task_a = create_dummy_dataset("Summarize", 200)
dataset_task_b = create_dummy_dataset("Analyze Sentiment", 150)

# We need a way to identify which dataset a batch belongs to. A common
# approach is to interleave the datasets or use a custom sampler. For
# simplicity with the standard Trainer, we combine them and add a task
# identifier, although a custom training loop offers more control.

# Add adapter identifiers (simplistic approach for demonstration)
dataset_task_a = dataset_task_a.map(lambda example: {"adapter_name": "adapter_task_a"})
dataset_task_b = dataset_task_b.map(lambda example: {"adapter_name": "adapter_task_b"})

# Combine datasets (naive interleaving) - requires careful shuffling/sampling in practice
from datasets import concatenate_datasets
combined_dataset = concatenate_datasets([dataset_task_a, dataset_task_b]).shuffle(seed=42)

# A custom data collator (or a custom training loop) is needed to handle the
# 'adapter_name' column: the standard DataCollatorForLanguageModeling cannot
# tensorize a string-valued column.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

print(f"Combined dataset size: {len(combined_dataset)}")
print("Example entry:", combined_dataset[0])
```
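To carry the task identifier through collation, one option is a thin wrapper around the standard collator. Here is a minimal sketch, assuming each batch contains samples from a single task (for example, produced by a task-grouped sampler):

```python
# Minimal collator sketch: strips the string-valued 'adapter_name' column,
# delegates tensor collation to the standard LM collator defined above, then
# re-attaches the adapter name so the training loop can route the batch.
def collate_with_adapter(features):
    adapter_names = {feature.pop("adapter_name") for feature in features}
    assert len(adapter_names) == 1, "Batches must be homogeneous per task."
    batch = data_collator(features)       # reuse the standard LM collator
    batch["adapter_name"] = adapter_names.pop()
    return batch
```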
### 4.2 Custom Trainer or Training Loop Logic

The standard Hugging Face Trainer doesn't natively support dynamically switching adapters based on batch metadata. You typically need one of the following (a subclass sketch appears at the end of this subsection):

1. **A custom training loop.** Iterate through batches, identify the task/adapter for each batch, call model.set_adapter(adapter_name) before the forward pass, compute the loss, and backpropagate. Only the active adapter's weights receive gradients.
2. **A modified Trainer.** Subclass Trainer and override a method such as compute_loss or training_step to include the model.set_adapter() call, based on information you inject into the batches (like the adapter_name column we added, passed through a custom collator such as the one sketched above).

Here is the core logic of a custom training loop, assuming a dataloader that yields homogeneous per-task batches carrying an 'adapter_name' entry:

```python
# --- Custom training loop outline ---
# Assumes 'dataloader' yields batches of tensors plus an 'adapter_name' string.

# Optimize only the LoRA parameters of both adapters. The inactive adapter's
# grads stay None, so AdamW skips them at step time.
optimizer = torch.optim.AdamW(
    (p for n, p in model.named_parameters() if "lora_" in n), lr=5e-5
)

model.train()
for batch in dataloader:
    adapter_name_for_batch = batch.pop("adapter_name")          # Extract adapter name
    inputs = {k: v.to(model.device) for k, v in batch.items()}  # Move tensors to device

    # Critical step: set the active adapter for this batch
    model.set_adapter(adapter_name_for_batch)

    # Forward pass - only the active adapter contributes
    outputs = model(**inputs)
    loss = outputs.loss

    # Backward pass - gradients flow only to the active LoRA weights
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

**Important consideration:** when training multiple adapters, ensure balanced sampling from each task's dataset to prevent one adapter from dominating the training process, or from over- or under-fitting relative to the others. This might involve weighted sampling or carefully structured epochs.
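To make the second option concrete, here is a minimal subclass sketch. It assumes batches carry an 'adapter_name' entry (e.g., via the collator sketched earlier) and that remove_unused_columns=False is set; note that the exact compute_loss signature varies slightly across transformers versions.

```python
# Sketch of a Trainer subclass that routes each batch to its adapter.
class MultiAdapterTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        adapter_name = inputs.pop("adapter_name", None)
        if adapter_name is not None:
            model.set_adapter(adapter_name)  # activate this batch's adapter
        outputs = model(**inputs)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss
```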
### 4.3 Training with a Simplified Trainer (Demonstration Only)

For this example, we'll proceed with the stock Trainer and train the adapters sequentially. This isn't true simultaneous multi-adapter training, but it demonstrates adapter switching and saving while avoiding the complexity of modifying the Trainer or writing a full custom loop for this hands-on.

```python
# Define base training arguments
training_args = TrainingArguments(
    output_dir="./multi_adapter_output",
    num_train_epochs=1,            # Short training for demonstration
    per_device_train_batch_size=4,
    logging_steps=50,
    save_strategy="epoch",         # Save at the end of each epoch
    learning_rate=3e-4,
    weight_decay=0.01,
    report_to="none",              # Disable external logging for simplicity
    remove_unused_columns=False,   # Keep 'adapter_name' if using a modified Trainer later
)

# --- Training phase 1: train only adapter_task_a ---
print("\n--- Training Adapter A ---")
model.set_adapter("adapter_task_a")  # Activate the Task A adapter

# Defensive: ensure only Task A's LoRA parameters are trainable
for name, param in model.named_parameters():
    if "lora_" in name:
        param.requires_grad = "adapter_task_a" in name
    else:
        param.requires_grad = False

trainer_a = Trainer(
    model=model,
    args=training_args,
    # Drop the string-valued 'adapter_name' column, which the standard
    # collator cannot handle.
    train_dataset=dataset_task_a.remove_columns(["adapter_name"]),
    data_collator=data_collator,
)
trainer_a.train()
print("Finished training Adapter A.")

# Save the Task A adapter. With selected_adapters, each named adapter is
# written to its own subdirectory under the given path.
model.save_pretrained("./multi_adapter_output", selected_adapters=["adapter_task_a"])
print("Adapter A saved to ./multi_adapter_output/adapter_task_a")

# --- Training phase 2: train only adapter_task_b ---
print("\n--- Training Adapter B ---")
model.set_adapter("adapter_task_b")  # Activate the Task B adapter

# Defensive: ensure only Task B's LoRA parameters are trainable
for name, param in model.named_parameters():
    if "lora_" in name:
        param.requires_grad = "adapter_task_b" in name
    else:
        param.requires_grad = False

# Use a fresh Trainer instance so optimizer state doesn't leak across phases
training_args.output_dir = "./multi_adapter_output_b"  # Separate logs/checkpoints
trainer_b = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset_task_b.remove_columns(["adapter_name"]),
    data_collator=data_collator,
)
trainer_b.train()
print("Finished training Adapter B.")

# Save the Task B adapter specifically
model.save_pretrained("./multi_adapter_output", selected_adapters=["adapter_task_b"])
print("Adapter B saved to ./multi_adapter_output/adapter_task_b")
```

Note: rerunning Trainer.train() multiple times like this is not ideal, especially with respect to optimizer state and learning-rate scheduling. A custom loop or modified Trainer provides better control for true interleaved multi-task training.
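As the note above suggests, true interleaved training also needs a sampling strategy across tasks. A minimal round-robin sketch might look like this (illustrative only; `dl_task_a` and `dl_task_b` are hypothetical per-task dataloaders, and real code would handle unequal lengths and weighting):

```python
# Round-robin interleaving of two task-specific dataloaders.
# zip() stops at the shorter loader; weighted or exhaustive sampling
# strategies are left as design choices.
def interleave_batches(loader_a, loader_b):
    for batch_a, batch_b in zip(loader_a, loader_b):
        yield "adapter_task_a", batch_a
        yield "adapter_task_b", batch_b

# Usage inside the custom loop from Section 4.2:
# for adapter_name, batch in interleave_batches(dl_task_a, dl_task_b):
#     model.set_adapter(adapter_name)
#     ...forward / backward / step as shown above...
```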
## 5. Saving and Loading Multiple Adapters

As shown above, you can save specific adapters using the selected_adapters argument of model.save_pretrained(). Each adapter is saved in its own subdirectory containing the LoRA weights (adapter_model.safetensors, or adapter_model.bin in older peft versions) and configuration (adapter_config.json). To load these adapters later for inference:

```python
from peft import PeftModel

# Load the base model again (if not already in memory)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

# Load the first adapter, wrapping the base model in a PeftModel
model_with_adapter_a = PeftModel.from_pretrained(
    base_model,
    "./multi_adapter_output/adapter_task_a",  # Path to saved adapter A
    adapter_name="adapter_task_a",            # Reuse the same name
)
print("Adapter A loaded.")

# Load the second adapter onto the SAME base model instance.
# Important: load subsequent adapters onto the PeftModel object.
model_with_adapter_a.load_adapter(
    "./multi_adapter_output/adapter_task_b",  # Path to saved adapter B
    adapter_name="adapter_task_b",
)
print("Adapter B loaded onto the same model.")

# Verify the loaded adapters
print("Loaded adapters:", model_with_adapter_a.peft_config.keys())
```

## 6. Inference with Specific Adapters

During inference, you can dynamically switch between the loaded adapters using set_adapter.

```python
# Generate text using Adapter A
model_with_adapter_a.set_adapter("adapter_task_a")
print("\n--- Generating with Adapter A ---")
prompt_a = "Summarize this document: ..."  # Example prompt for Task A
inputs_a = tokenizer(prompt_a, return_tensors="pt").to(model_with_adapter_a.device)

model_with_adapter_a.eval()  # Ensure eval mode for generation
with torch.no_grad():
    outputs_a = model_with_adapter_a.generate(**inputs_a, max_new_tokens=50)
print("Adapter A Output:", tokenizer.decode(outputs_a[0], skip_special_tokens=True))

# Generate text using Adapter B
model_with_adapter_a.set_adapter("adapter_task_b")
print("\n--- Generating with Adapter B ---")
prompt_b = "Analyze the sentiment: This movie was fantastic!"  # Example prompt for Task B
inputs_b = tokenizer(prompt_b, return_tensors="pt").to(model_with_adapter_a.device)

model_with_adapter_a.eval()
with torch.no_grad():
    outputs_b = model_with_adapter_a.generate(**inputs_b, max_new_tokens=20)
print("Adapter B Output:", tokenizer.decode(outputs_b[0], skip_special_tokens=True))

# Optionally run with adapters disabled to use the plain base model.
# Note: disable_adapter() is a context manager.
# with model_with_adapter_a.disable_adapter():
#     outputs = model_with_adapter_a.generate(**inputs_b, max_new_tokens=20)
```

## Discussion

- **True simultaneous training:** This hands-on used sequential training for simplicity with the standard Trainer. For genuine concurrent training, where adapter weights are updated in an interleaved manner within the same run, implement a custom PyTorch loop or modify the Trainer to handle adapter switching per batch.
- **Resource efficiency:** The primary benefit is memory. You hold only one copy of the large base model weights, plus the small LoRA adapter weights for each task. This contrasts sharply with maintaining multiple fully fine-tuned model copies.
- **Adapter interference:** Training multiple adapters on the same base model might lead to subtle interactions or interference, especially if target modules overlap significantly or the tasks conflict. Careful monitoring and per-adapter hyperparameter tuning may be required.
- **Task scheduling/sampling:** In a custom loop, how you sample batches from the different task datasets (e.g., round-robin, or weighted sampling based on dataset size or task importance) becomes an important design choice influencing convergence and the final performance balance.
- **Deployment:** Loading multiple adapters onto a single base model instance is highly efficient for deployment. One deployed model can serve requests for multiple tasks simply by activating the appropriate adapter before processing each request (a dispatch sketch follows below).

This practical exercise demonstrated the mechanics of managing multiple LoRA adapters. Adapting this framework to your own datasets and implementing an interleaved training strategy are the next steps for applying the technique effectively.
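As a closing illustration of the deployment pattern discussed above, a per-request dispatch might look like this. This is a minimal sketch reusing the objects loaded in Section 5; the task-to-adapter mapping and the helper name are illustrative, not a library API.

```python
# Hypothetical dispatch helper: one loaded model serves both tasks by
# activating the matching adapter before generation.
TASK_TO_ADAPTER = {
    "summarization": "adapter_task_a",
    "sentiment": "adapter_task_b",
}

def generate_for_task(task: str, prompt: str, max_new_tokens: int = 50) -> str:
    model_with_adapter_a.set_adapter(TASK_TO_ADAPTER[task])
    inputs = tokenizer(prompt, return_tensors="pt").to(model_with_adapter_a.device)
    with torch.no_grad():
        output = model_with_adapter_a.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_for_task("sentiment", "Analyze the sentiment: The plot dragged on."))
```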