Training Language Models to Follow Instructions with Human Feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022, arXiv preprint arXiv:2203.02155. DOI: 10.48550/arXiv.2203.02155 - This foundational paper describes RLHF for instruction-following language models. It discusses reward modeling, policy optimization, and the practical challenges of aligning LLMs with human preferences.
Overcoming catastrophic forgetting in neural networks, James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell, 2017, Proceedings of the National Academy of Sciences, Vol. 114 (National Academy of Sciences). DOI: 10.1073/pnas.1611835114 - This paper introduces Elastic Weight Consolidation (EWC), a method for mitigating catastrophic forgetting. It directly addresses a key challenge in sequential adaptation.