Training Language Models to Follow Instructions with Human Feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022. arXiv preprint arXiv:2203.02155. DOI: 10.48550/arXiv.2203.02155 - This foundational paper introduces Reinforcement Learning from Human Feedback (RLHF) for instruction-following language models, discussing challenges in reward modeling, policy optimization, and the practical implementation of aligning LLMs with human preferences.
Overcoming catastrophic forgetting in neural networks, James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell, 2017. Proceedings of the National Academy of Sciences, Vol. 114 (National Academy of Sciences). DOI: 10.1073/pnas.1611835114 - This paper introduces Elastic Weight Consolidation (EWC), a method for mitigating catastrophic forgetting, directly addressing a key challenge in sequential adaptation.