Training language models to follow instructions with human feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022, Advances in Neural Information Processing Systems (NeurIPS) 35, DOI: 10.48550/arXiv.2203.02155 - This paper introduces the RLHF paradigm and describes the initial supervised fine-tuning (SFT) stage, which uses human-written demonstrations to align the language model.