Fine-tuning a model is not just about feeding it data; it's about guiding its learning process with precision. The hyperparameters you select are the controls that steer the optimization, directly influencing how the model's weights, $\theta$, are updated at each step. Choosing the right values is often the difference between a high-performing specialized model and one that fails to converge or generalizes poorly. These settings determine the speed, stability, and ultimate success of the training run.
In the Hugging Face transformers library, these settings are conveniently bundled into the TrainingArguments class. Let's examine the most impactful arguments you will need to configure for full parameter fine-tuning.
The learning rate, represented as $\eta$ in the gradient descent update $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}(\theta)$, is perhaps the most significant hyperparameter. It dictates the size of the steps the model takes to minimize the loss function.
For large language models, a small learning rate is almost always the correct choice. Because pre-trained models are already highly optimized, aggressive updates can disrupt the valuable knowledge stored in their weights. A common starting point for full fine-tuning is a learning rate between $1 \times 10^{-5}$ and $5 \times 10^{-5}$.
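To make the role of $\eta$ concrete, here is a minimal PyTorch sketch of a single gradient descent step on a toy one-parameter loss. Everything here is illustrative and separate from the Trainer:

import torch

# Toy loss L(theta) = theta^2, minimized at theta = 0
theta = torch.tensor([1.0], requires_grad=True)
loss = (theta ** 2).sum()
loss.backward()  # populates theta.grad with dL/dtheta = 2 * theta

eta = 2e-5  # a typical full fine-tuning learning rate
with torch.no_grad():
    theta -= eta * theta.grad  # theta <- theta - eta * grad

A larger $\eta$ takes bigger steps along the negative gradient; too large and the update overshoots the minimum, too small and progress is slow.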
An illustration of how different learning rates affect the path to the loss minimum. An optimal rate converges efficiently, while a low rate is slow and a high rate can be unstable.
Instead of using a fixed learning rate, it is standard practice to use a learning rate scheduler, which adjusts the value of $\eta$ during training. A widely used strategy is a linear warmup followed by a decay.
During the warmup phase (controlled by warmup_steps), the learning rate gradually increases from 0 to its target value. This prevents large, disruptive updates at the start of training, when the model is first adapting to the new data, which helps stabilize the process.

A typical learning rate schedule with a linear warmup for the first 1,000 steps, followed by a linear decay for the remainder of training.
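The Trainer builds this schedule for you from warmup_steps, but the same schedule is available directly if you write your own training loop. A minimal sketch, assuming a `model` is already defined and a total of 10,000 training steps (both illustrative):

import torch
from transformers import get_linear_schedule_with_warmup

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # assumes `model` is defined
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,     # ramp from 0 up to 2e-5
    num_training_steps=10_000,  # then decay linearly back toward 0
)
# Inside the training loop, call scheduler.step() after each optimizer.step()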
The batch size (per_device_train_batch_size) defines how many data samples are processed before the model's weights are updated. This parameter has direct implications for both memory usage and training dynamics.
Full fine-tuning is memory-intensive, often forcing you to use a small batch size (e.g., 1, 2, or 4). If your GPU memory cannot accommodate the desired batch size, you can use gradient accumulation. By setting gradient_accumulation_steps to a value like 4 or 8, you instruct the trainer to compute gradients for several smaller batches and only perform the weight update after accumulating the gradients. This effectively simulates a larger batch size without the corresponding memory overhead. The effective batch size becomes per_device_train_batch_size * gradient_accumulation_steps.
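The mechanism behind gradient accumulation is straightforward: scale each mini-batch loss down by the number of accumulation steps, let the gradients sum across backward passes, and only step the optimizer at the end of each window. A minimal sketch of the idea (the model, optimizer, and dataloader names are illustrative, not the Trainer's internals verbatim):

accumulation_steps = 8

optimizer.zero_grad()
for step, batch in enumerate(train_dataloader):
    # Scale the loss so the accumulated gradient matches a true larger batch
    loss = model(**batch).loss / accumulation_steps
    loss.backward()  # gradients sum into .grad across iterations

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one weight update per 8 mini-batches
        optimizer.zero_grad()  # reset for the next accumulation window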
An epoch (num_train_epochs) is one complete pass through the entire training dataset. The number of epochs controls the total amount of training the model receives.
For instruction fine-tuning on high-quality datasets, it is common to train for only a few epochs, typically between 1 and 3. The goal is to adapt the model, not to teach it from scratch. You will learn to diagnose overfitting in the next section by monitoring validation loss.
Weight decay (weight_decay) is a regularization technique that helps prevent overfitting by penalizing large weight values. This encourages the model to rely on smaller, more distributed weights, which tends to improve its generalization capabilities. With the AdamW optimizer the Trainer uses by default, the decay is decoupled: it is applied directly to the weights at each update step rather than added to the loss. A common value for weight decay is 0.01.
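In update form, with decay coefficient $\lambda$ (the weight_decay value) and learning rate $\eta$, each AdamW step includes the term

$$\theta \leftarrow \theta - \eta \lambda \theta$$

in addition to the usual gradient-based update, pulling every weight slightly toward zero at every step.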
Here is how you would configure these hyperparameters in a Python script using the TrainingArguments class from the transformers library. This object serves as a central configuration hub for the Trainer.
from transformers import TrainingArguments
# Configuration for a full fine-tuning run
training_args = TrainingArguments(
    # Output directory to save model checkpoints
    output_dir="./results",

    # --- Core Training Hyperparameters ---
    # The number of complete passes through the training data
    num_train_epochs=3,
    # Batch size per GPU for training
    per_device_train_batch_size=2,
    # Accumulate gradients over 8 steps to simulate a larger batch size
    gradient_accumulation_steps=8,

    # --- Optimizer and Scheduler Hyperparameters ---
    # The initial learning rate for the AdamW optimizer
    learning_rate=2e-5,
    # Regularization to prevent overfitting
    weight_decay=0.01,
    # Number of steps for the linear warmup phase
    warmup_steps=500,

    # --- Logging and Saving ---
    # How often to save the model checkpoint
    save_strategy="epoch",
    # How often to log training metrics
    logging_steps=50,
)
# This `training_args` object would then be passed to the Trainer
# along with the model, dataset, and tokenizer.
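For completeness, that hand-off looks like the following. This is a minimal sketch that assumes model, train_dataset, and eval_dataset have already been created elsewhere:

from transformers import Trainer

trainer = Trainer(
    model=model,                 # the pre-trained model being fine-tuned
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,   # lets you monitor validation loss
)
trainer.train()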
Finding the optimal set of hyperparameters is an iterative process. A good practice is to start with values that have been reported to work well for your chosen model architecture and task. From there, you can experiment with one parameter at a time, using a validation set to measure the impact of your changes. This methodical approach is fundamental to successfully adapting a pre-trained model to your specific needs.