Building on the full-parameter and parameter-efficient fine-tuning methods covered earlier, this chapter turns to more complex adaptation scenarios. You will learn strategies for training a single model on several objectives at once (multi-task fine-tuning) and for updating a model sequentially while preserving previously acquired knowledge (sequential adaptation), as in the brief sketch below.
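As a rough illustration of the multi-task setup covered in Section 5.1, the following minimal sketch mixes examples from several task datasets into shared training batches by sampling tasks proportionally. The function name, dataset layout, and weighting scheme are illustrative assumptions, not part of any specific library or of this chapter's reference implementation.

```python
import random

def multitask_batches(task_datasets, batch_size, weights=None, steps=1000, seed=0):
    """Yield mixed-task batches by sampling a task for each slot in the batch.

    task_datasets: dict mapping task name -> list of examples (placeholders here;
    in practice these would be tokenized fine-tuning examples).
    weights: optional per-task sampling weights; defaults to dataset sizes.
    """
    rng = random.Random(seed)
    names = list(task_datasets)
    weights = weights or [len(task_datasets[n]) for n in names]  # size-proportional mix
    for _ in range(steps):
        batch = []
        for task in rng.choices(names, weights=weights, k=batch_size):
            batch.append((task, rng.choice(task_datasets[task])))
        yield batch

# Toy example: two "tasks" interleaved into the same training stream.
data = {"summarization": ["doc1", "doc2"], "qa": ["q1", "q2", "q3"]}
print(next(multitask_batches(data, batch_size=4, steps=1)))
```

Mixing tasks within each batch, rather than training on one task after another, is one simple way to keep gradient updates balanced across objectives.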
We will cover methods that counter catastrophic forgetting, helping a model maintain performance on previously learned tasks as new data or objectives are introduced. This chapter also introduces Reinforcement Learning from Human Feedback (RLHF) and explains its main components: training a reward model, $r_\phi(x, y)$, to capture human preferences, and using a policy optimization algorithm such as Proximal Policy Optimization (PPO) to refine the language model, $\pi_\theta(y \mid x)$, against those preferences. Together, these techniques provide advanced tools for tailoring LLM behavior to specific requirements.
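To make the two RLHF quantities above concrete before Sections 5.5 and 5.6, here is a minimal numeric sketch of the pairwise loss commonly used to train a reward model $r_\phi$ and of one term of PPO's clipped surrogate objective for $\pi_\theta$. The function names and the toy numbers are assumptions for illustration only.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) loss for the reward model: it is small
    when the chosen response scores higher than the rejected one."""
    return -math.log(sigmoid(r_chosen - r_rejected))

def ppo_clipped_term(logp_new: float, logp_old: float, advantage: float,
                     clip_eps: float = 0.2) -> float:
    """One token-level term of PPO's clipped surrogate objective: the probability
    ratio between the updated and old policy is clipped to limit the update size."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)

# Toy numbers: the reward model scores the chosen response 1.3 vs 0.2 for the
# rejected one, and PPO caps how far the new policy can move from the old one.
print(preference_loss(1.3, 0.2))                    # ~0.29
print(ppo_clipped_term(-1.0, -1.2, advantage=0.5))  # ratio ~1.22 clipped to 1.2 -> 0.6
```

The clipping step is what keeps each policy update close to the model that generated the sampled responses, a point revisited in Section 5.6.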
5.1 Multi-Task Fine-tuning
5.2 Sequential Adaptation and Continual Learning
5.3 Mitigating Catastrophic Forgetting
5.4 Introduction to Reinforcement Learning from Human Feedback (RLHF)
5.5 Reward Model Training
5.6 Policy Optimization with PPO
5.7 Challenges in Advanced Adaptation