In this section, we perform full parameter fine-tuning on a small-scale generative model, adapting a pre-trained model to a specific question-answering style. The entire workflow is covered, from loading the data to generating text with the newly specialized model.
We will use the Qwen/Qwen2.5-0.5B model, a smaller and more manageable member of the Qwen family, which makes it practical to run this example on a single GPU such as those available in Google Colab or Kaggle Kernels. The dataset will be a small subset of SQuAD (the Stanford Question Answering Dataset), which contains questions and answers grounded in reading passages.
Before we write the code, let's outline the steps we will take. The entire process follows a structured pipeline, which is a common pattern in machine learning projects.
The process begins with data and model preparation, moves to the training cycle where the model's weights are updated, and concludes with saving the model for future use.
First, ensure you have the necessary libraries installed. The transformers library provides the models and the Trainer API, datasets helps manage our data, and accelerate handles device placement and speeds up PyTorch training.
pip install datasets tokenizers huggingface-hub transformers accelerate evaluate torch vllm
We will also log in to the Hugging Face Hub if we want to save our model checkpoints online. This step is optional but good practice.
from huggingface_hub import notebook_login
notebook_login()
We will use a small portion of the squad dataset to keep the training time short. Our goal is to format the question-and-answer pairs into a single string that the model will learn to generate.
Let's load the dataset and create a simple formatting function. We'll format each example as question: [QUESTION] answer: [ANSWER]. This structure teaches the model to produce an answer when it sees the "question:" prefix.
from datasets import load_dataset
# Load a small part of the training set
train_dataset = load_dataset("squad", split="train[:5000]")
# Split the 5000 examples into training and validation sets
train_test_split = train_dataset.train_test_split(test_size=0.1)
train_dataset = train_test_split["train"]
eval_dataset = train_test_split["test"]
# Define the formatting function
def format_qa(example):
    # SQuAD stores answers as a list; take the first answer span
    question = example["question"]
    answer = example["answers"]["text"][0]
    return f"question: {question} answer: {answer}"
Data Formatting is Task Definition
The way you structure your data is fundamental. By formatting our data as question: ... answer: ..., we are implicitly teaching the model a specific task: given a string that starts with question:, complete it with a relevant answer:.
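To confirm the formatting behaves as expected, you can print a single formatted example before tokenizing. This is a quick sanity check; the exact text depends on which SQuAD rows ended up in your split.
# Sanity check: inspect one formatted training example
print(format_qa(train_dataset[0]))
# Expected shape: "question: <original question> answer: <first answer span>"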
Now, we need to load the tokenizer and apply it to our formatted dataset. The tokenizer converts our text strings into the integer IDs that the model understands. We will also set a pad_token to handle inputs of varying lengths.
from transformers import AutoTokenizer
# Load the tokenizer for Qwen/Qwen2.5-0.5B
model_name = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token # Set padding token
def tokenize_function(examples):
    formatted_examples = [
        format_qa({"question": q, "answers": a})
        for q, a in zip(examples["question"], examples["answers"])
    ]
    tokenized = tokenizer(formatted_examples, truncation=True, padding="max_length", max_length=128)
    tokenized["labels"] = tokenized["input_ids"].copy()  # For causal LM training, labels mirror the inputs
    return tokenized
# Apply tokenization to the datasets
tokenized_train_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=train_dataset.column_names)
tokenized_eval_dataset = eval_dataset.map(tokenize_function, batched=True, remove_columns=eval_dataset.column_names)
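Because we copied input_ids directly into labels, the padded positions also contribute to the loss. That is acceptable for a first pass, but a common refinement is to mask padded positions in the labels with -100, the index the loss function ignores. A minimal sketch of that variant, using the attention mask to find padding, might look like this (tokenize_function_masked is a hypothetical helper name):
# Optional refinement: ignore padded positions in the loss
def tokenize_function_masked(examples):
    formatted_examples = [
        format_qa({"question": q, "answers": a})
        for q, a in zip(examples["question"], examples["answers"])
    ]
    tokenized = tokenizer(formatted_examples, truncation=True, padding="max_length", max_length=128)
    # Keep real tokens as labels; set padded positions (attention_mask == 0) to -100
    tokenized["labels"] = [
        [tok if mask == 1 else -100 for tok, mask in zip(ids, attn)]
        for ids, attn in zip(tokenized["input_ids"], tokenized["attention_mask"])
    ]
    return tokenized
# To use it, pass tokenize_function_masked to .map() in place of tokenize_function above.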
With our data ready, we can load the Qwen/Qwen2.5-0.5B model. We use AutoModelForCausalLM because our task is text generation, also known as causal language modeling.
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer
# Load the pre-trained model
model = AutoModelForCausalLM.from_pretrained(model_name)
Next, we define the TrainingArguments. This object contains all the hyperparameters for the training run, such as the learning rate, number of epochs, and batch size. These settings directly control the gradient descent update process discussed at the start of the chapter.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-squad-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_steps=100,
    load_best_model_at_end=True,
    push_to_hub=False,  # Set to True if you are logged in and want to push
)
We are now ready to bring everything together in the Trainer object. It requires the model, training arguments, datasets, and tokenizer.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_eval_dataset,
    tokenizer=tokenizer,
)
# Start fine-tuning
trainer.train()
When you run trainer.train(), you will see a progress bar that reports the training loss and other metrics. This output is your primary tool for monitoring the process, as covered in the "Monitoring Training" section. A steadily decreasing loss indicates that the model is learning from your data. With 4,500 training examples and a per-device batch size of 8, each epoch takes roughly 563 optimizer steps, which is why the first validation loss appears at step 563 in the sample output below.
/opt/conda/lib/python3.10/site-packages/transformers/trainer.py:...
***** Running training *****
Num examples = 4500
Num Epochs = 3
Instantaneous batch size per device = 8
...
Step | Training Loss | Validation Loss
100 | 1.503200 | N/A
200 | 1.354100 | N/A
...
563 | 1.298700 | 1.251142
...
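If you prefer to inspect these numbers programmatically rather than reading the progress table, the Trainer records every logged metric in trainer.state.log_history. A short sketch:
# Print the recorded training and evaluation losses after training
for entry in trainer.state.log_history:
    if "loss" in entry:
        print(f"step {entry['step']}: train loss {entry['loss']:.4f}")
    elif "eval_loss" in entry:
        print(f"step {entry['step']}: eval loss {entry['eval_loss']:.4f}")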
After training completes, the Trainer will have automatically evaluated the model on the validation set at the end of each epoch. You can also trigger a final evaluation manually.
import math
eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
Perplexity is a common metric for language models: it is the exponential of the average cross-entropy loss and measures how well the model predicts a sample of text, which is why we apply math.exp to eval_loss above. A lower perplexity score indicates better performance.
The Trainer saves the best model checkpoint in the output_dir specified in TrainingArguments. You can also save it manually to a different location.
# Save the model and tokenizer
trainer.save_model("my_finetuned_qwen2.5-0.5b")
tokenizer.save_pretrained("my_finetuned_qwen2.5-0.5b")
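If you logged in to the Hugging Face Hub earlier, you can also upload the checkpoint so it is available from anywhere. This is optional and assumes a valid access token; by default the repository name is derived from output_dir.
# Optional: push the fine-tuned model and tokenizer to the Hugging Face Hub
trainer.push_to_hub()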
The final step is to use your model for inference. Let's see if it has learned to answer questions in the desired format. We can use the pipeline utility for a straightforward test.
from transformers import pipeline
# Load the fine-tuned model for inference
finetuned_model = AutoModelForCausalLM.from_pretrained("my_finetuned_qwen2.5-0.5b")
finetuned_tokenizer = AutoTokenizer.from_pretrained("my_finetuned_qwen2.5-0.5b")
# Create a text generation pipeline
generator = pipeline("text-generation", model=finetuned_model, tokenizer=finetuned_tokenizer)
# A new question in the format our model expects
prompt = "question: What is the main purpose of the immune system?"
# Generate an answer
result = generator(prompt, max_length=100, num_return_sequences=1)
print(result[0]['generated_text'])
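Generation behavior can be tuned through the standard generate keyword arguments that the pipeline passes through. For example, capping the number of newly generated tokens and disabling sampling tends to produce shorter, more deterministic answers; the values below are illustrative.
# A more controlled generation call: limit new tokens and use greedy decoding
result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])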
You should see an output that starts with your prompt and continues with a generated answer, following the style of the SQuAD dataset. The model has successfully adapted its behavior from a general-purpose text generator to a more specialized question-answerer.
This hands-on exercise demonstrates the end-to-end process of full parameter fine-tuning. While effective, updating every single weight is computationally expensive. In the next chapter, we will look at more efficient techniques that can achieve comparable results with a fraction of the computational cost.