This practical guide demonstrates how to fine-tune a pre-trained Transformer model using adapter modules. It uses the adapter-transformers library, an extension of Hugging Face's transformers designed specifically to facilitate adapters and other PEFT methods. Our goal is to adapt bert-base-uncased for a sentiment classification task (the GLUE SST-2 dataset) by training only lightweight adapter modules inserted into the model. This approach keeps the overwhelming majority of the original model parameters frozen, significantly reducing computational and storage requirements compared to full fine-tuning.

## Setting Up the Environment

First, ensure you have the necessary libraries installed. We'll need adapter-transformers (which includes transformers and torch) and datasets for data handling.

```bash
pip install -U adapter-transformers datasets
```

## Loading the Model and Data

We start by loading the pre-trained model and tokenizer, just as you would with the standard transformers library. Because adapter-transformers is a drop-in replacement that installs under the `transformers` namespace, the adapter-specific classes are imported from `transformers` as well. We also load the Stanford Sentiment Treebank (SST-2) dataset.

```python
from transformers import AutoTokenizer, AutoAdapterModel  # AutoAdapterModel is provided by adapter-transformers
from datasets import load_dataset

# Load tokenizer and model
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoAdapterModel.from_pretrained(model_name)  # Important: use AutoAdapterModel, not AutoModelForSequenceClassification

# Load dataset
dataset = load_dataset("glue", "sst2")

# Preprocess data
def encode_batch(batch):
    """Tokenizes the sentences."""
    return tokenizer(batch["sentence"], max_length=80, truncation=True, padding="max_length")

dataset = dataset.map(encode_batch, batched=True)
dataset = dataset.rename_column("label", "labels")  # Rename label column for Trainer compatibility
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

print("Dataset sample:", dataset["train"][0])
```

Using AutoAdapterModel instead of AutoModelForSequenceClassification is important, as it provides the methods needed for managing adapters.

## Adding and Configuring Adapters

Now we add adapter modules to the loaded BERT model. The adapter-transformers library makes this straightforward. We'll add bottleneck adapters (the classic adapter type) to each layer of the Transformer.

```python
from transformers import AdapterConfig

# Configure the adapter
# Pfeiffer config: bottleneck adapter with reduction factor 16
adapter_config = AdapterConfig.load("pfeiffer", reduction_factor=16)

# Add the adapter to the model under a unique name, e.g., "sentiment_adapter"
adapter_name = "sentiment_adapter"
model.add_adapter(adapter_name, config=adapter_config)

# Add a classification head for our task, associated with this adapter
num_labels = dataset["train"].features["labels"].num_classes
model.add_classification_head(
    adapter_name,
    num_labels=num_labels,
    id2label={0: "NEGATIVE", 1: "POSITIVE"}  # Optional label mapping
)

# Activate the adapter for training
model.train_adapter(adapter_name)

# Verify which parameters are trainable
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total Parameters: {total_params}")
print(f"Trainable Parameters (Adapter + Head): {trainable_params}")
print(f"Trainable %: {100 * trainable_params / total_params:.4f}%")
```

Observe the output: trainable_params is only a small fraction of total_params.

The add_adapter method inserts the adapter modules into each Transformer layer (for the Pfeiffer configuration used here, a single bottleneck after the feed-forward sublayer; the Houlsby variant adds one after the attention sublayer as well), and add_classification_head adds a new final layer for our specific task, associated with the adapter. Crucially, train_adapter(adapter_name) freezes the entire pre-trained BERT model and unfreezes only the parameters belonging to the specified adapter (sentiment_adapter) and its associated classification head. This selective freezing and unfreezing is the core mechanism enabling parameter-efficient training.

AdapterConfig.load("pfeiffer", reduction_factor=16) specifies the architecture: "pfeiffer" refers to a standard bottleneck adapter configuration with a particular placement, normalization, and activation choice. The reduction_factor=16 means the bottleneck dimension is $d_{model} / 16$, where $d_{model}$ is the hidden dimension of the BERT model (768 for bert-base-uncased). Adjusting this factor directly controls the trade-off between parameter count and potential performance.
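As a quick sanity check on the printout above, you can estimate the adapter parameter count by hand. The sketch below is a back-of-the-envelope estimate only: it assumes the textbook bottleneck design (one adapter per layer, each with a down-projection and an up-projection plus biases) and ignores the classification head and any adapter layer norms, so the exact figure printed above will be somewhat higher.

```python
# Rough estimate of bottleneck adapter parameters (assumption: one adapter per
# Transformer layer, each with a down- and up-projection including biases).
d_model = 768          # hidden size of bert-base-uncased
reduction_factor = 16
num_layers = 12        # Transformer layers in bert-base-uncased

bottleneck = d_model // reduction_factor                 # 48
params_per_adapter = (d_model * bottleneck + bottleneck) \
                   + (bottleneck * d_model + d_model)    # down-proj + up-proj
adapter_params = params_per_adapter * num_layers

print(f"Bottleneck dimension: {bottleneck}")
print(f"~{adapter_params:,} adapter parameters "
      f"(~{100 * adapter_params / 110_000_000:.2f}% of BERT-base's ~110M)")
```

The classification head contributes roughly a few hundred thousand additional trainable parameters on top of this, which is why the percentage printed by the previous snippet is a bit higher, but the overall picture holds: only a small fraction of the model is being trained.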
## Setting Up the Trainer

We use the standard Hugging Face Trainer for the training process. The setup is similar to full fine-tuning, but because train_adapter has already frozen the base model, the Trainer only updates the adapter and head parameters.

```python
import numpy as np
from transformers import TrainingArguments, Trainer, EvalPrediction
from datasets import load_metric

# Define training arguments
# Note: smaller batch size & fewer epochs, suitable for a demonstration
training_args = TrainingArguments(
    output_dir="./adapter_sst2_output",
    learning_rate=1e-4,                  # Adapters often benefit from a slightly higher LR
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    logging_steps=100,
    evaluation_strategy="epoch",
    save_strategy="epoch",               # Save a checkpoint each epoch
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    remove_unused_columns=False,         # Important when training adapter models with Trainer
)

# Define the evaluation metric
metric = load_metric("glue", "sst2")

def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    return metric.compute(predictions=preds, references=p.label_ids)

# Instantiate the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```

Note that the learning rate (1e-4) is somewhat higher than typical for full fine-tuning (~2e-5), as adapters often converge better with a larger step size. remove_unused_columns=False is often needed when working with adapter models inside the Trainer.

## Training the Adapter

Now we can start training. During this phase, gradients are only computed and applied to the adapter and classification head weights.

```python
# Start training
train_result = trainer.train()

# Log training metrics
metrics = train_result.metrics
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)

# Evaluate the best model
eval_metrics = trainer.evaluate(eval_dataset=dataset["validation"])
trainer.log_metrics("eval", eval_metrics)
trainer.save_metrics("eval", eval_metrics)
```

Monitor the training progress and the evaluation metric (accuracy in this case) printed during and after training. You should see the model learn the sentiment classification task effectively, despite updating only a small subset of parameters.
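If you want to convince yourself that the frozen backbone really is untouched by training, a simple optional check (plain PyTorch layered on the code above, not part of the adapter-transformers API) is to snapshot a frozen parameter before calling trainer.train() and compare it afterwards:

```python
import torch

# Before training: snapshot any frozen (base-model) parameter.
frozen_name, frozen_param = next(
    (name, p) for name, p in model.named_parameters() if not p.requires_grad
)
snapshot = frozen_param.detach().clone()

# ... trainer.train() runs here, exactly as in the previous snippet ...

# After training: the frozen parameter should be bit-for-bit identical.
print(f"'{frozen_name}' unchanged:", torch.equal(snapshot, frozen_param.detach()))
```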
## Saving the Trained Adapter

A significant advantage of Adapter Tuning is the ability to save the adapter weights independently; the base model remains unchanged.

```python
# Define the path to save the adapter
output_adapter_dir = "./saved_adapters/sst2_adapter"

# Save the adapter weights
model.save_adapter(output_adapter_dir, adapter_name)

# The classification head is typically saved along with the adapter;
# it can also be saved separately if needed, e.g.:
# model.save_head(output_adapter_dir, adapter_name)

print(f"Adapter '{adapter_name}' saved to {output_adapter_dir}")
```

Navigate to output_adapter_dir. You will find configuration files (adapter_config.json) and a weight file (e.g., pytorch_adapter.bin). Notice how small these files are compared to the full model checkpoint (hundreds of megabytes for BERT-base). This demonstrates the storage efficiency of adapters.

## Loading and Using the Adapter for Inference

To use the trained adapter, you load the original base model and then load the specific adapter weights on top of it.

```python
import torch
from transformers import AutoAdapterModel, AutoTokenizer, TextClassificationPipeline

# Load the base model again (imagine this is a fresh session)
inference_model = AutoAdapterModel.from_pretrained(model_name)
inference_tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the adapter weights from the saved directory
loaded_adapter_name = inference_model.load_adapter(output_adapter_dir)  # Returns the name it was saved under

# IMPORTANT: set the active adapter for inference
inference_model.set_active_adapters(loaded_adapter_name)

# The classification head is usually restored together with the adapter.
# If you saved the head separately, check the adapter-transformers
# documentation for the corresponding head-loading method.

# Perform inference using a pipeline
classifier = TextClassificationPipeline(
    model=inference_model,
    tokenizer=inference_tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

# Example sentences
sentences = [
    "This movie was absolutely fantastic!",
    "I was completely bored throughout the entire film.",
    "The acting was decent, but the plot was predictable."
]

results = classifier(sentences)
for sentence, result in zip(sentences, results):
    print(f"Sentence: {sentence}")
    print(f"Predicted Label: {result['label']}, Score: {result['score']:.4f}\n")
```

This demonstrates the modularity of adapters: the large base model can be loaded once, and different lightweight adapters can be loaded on top of it to switch between tasks without needing multiple copies of the full model (a short sketch of this pattern follows below). set_active_adapters tells the model which adapter(s) to use in the forward pass during inference.

This practical exercise illustrates the core workflow of Adapter Tuning: adding adapter modules, freezing the base model, training only the adapters, saving them separately, and loading them for efficient inference. You've successfully fine-tuned a powerful pre-trained model for a specific task while modifying only a tiny fraction of its parameters, showcasing the efficiency and modularity benefits of this PEFT technique.
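To make the task-switching pattern concrete, here is a minimal sketch. The second adapter directory is hypothetical, standing in for any other adapter you might have trained and saved the same way; only the active adapter changes between calls, while the BERT backbone stays loaded once.

```python
# Task-switching sketch. The "topic_adapter" path below is hypothetical: it
# represents any second adapter trained and saved with the same workflow.
sentiment = loaded_adapter_name  # the SST-2 adapter loaded above
# topic = inference_model.load_adapter("./saved_adapters/topic_adapter")

# Route the forward pass through the sentiment adapter...
inference_model.set_active_adapters(sentiment)
# ... run sentiment predictions ...

# ...then switch tasks by swapping the active adapter; BERT itself is not reloaded.
# inference_model.set_active_adapters(topic)

# Deactivate all adapters to fall back to the plain pre-trained model.
inference_model.set_active_adapters(None)
```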