Training Language Models to Follow Instructions with Human Feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022, arXiv. DOI: 10.48550/arXiv.2203.02155 - This paper presents reinforcement learning from human feedback (RLHF), a foundational method for aligning large language models with human preferences, improving factual consistency and reducing undesirable outputs.