Training language models to follow instructions with human feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022Advances in Neural Information Processing Systems (NeurIPS)DOI: 10.48550/arXiv.2203.02155 - 介绍了InstructGPT模型,该模型利用人工反馈进行微调,以遵循指令并生成有用、无害且诚实的回复,展示了人工评估在大型语言模型对齐中的有效性。