Training Language Models to Follow Instructions with Human Feedback, Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, et al., 2022. Advances in Neural Information Processing Systems, Vol. 35 - Gives a detailed account of the reinforcement learning from human feedback (RLHF) pipeline, covering supervised fine-tuning, reward model training, and PPO-based policy optimization, used to align language models with human preferences.
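
As a rough illustration of two quantities the annotation refers to, the sketch below shows a pairwise reward-model loss and a KL-penalized reward of the kind used during the PPO stage. It is a minimal PyTorch sketch, not the paper's implementation; the function names and the `beta` value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss for reward-model training:
    -E[log sigmoid(r(x, y_chosen) - r(x, y_rejected))]."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


def kl_penalized_reward(reward: torch.Tensor,
                        policy_logprob: torch.Tensor,
                        sft_logprob: torch.Tensor,
                        beta: float = 0.02) -> torch.Tensor:
    """Per-sample reward for the PPO stage: the learned reward minus a KL-style
    penalty that keeps the policy close to the supervised (SFT) model.
    `beta` is an illustrative coefficient, not a value from the paper."""
    return reward - beta * (policy_logprob - sft_logprob)


# Tiny usage example with dummy scores for a batch of 4 comparisons.
chosen = torch.tensor([1.2, 0.3, 0.8, 2.0])
rejected = torch.tensor([0.4, 0.5, -0.1, 1.0])
print(reward_model_loss(chosen, rejected))
```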