Training language models to follow instructions with human feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022. arXiv preprint. DOI: 10.48550/arXiv.2203.02155 - Describes InstructGPT, a foundational work that significantly improved the ability of large language models to follow instructions and produce output in a desired format, which makes possible the structured prompting techniques discussed in this text.