Training language models to follow instructions with human feedback, Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022, Advances in Neural Information Processing Systems, Vol. 35 (Neural Information Processing Systems) - Introduces reinforcement learning from human feedback (RLHF) for aligning language models with user preferences, the core mechanism for generator adaptation.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi, 2023, arXiv preprint arXiv:2310.11511, DOI: 10.48550/arXiv.2310.11511 - Presents a RAG framework in which the LLM critiques its own generations and retrieves additional documents for self-correction, improving generation quality in line with the idea of internal system feedback.