Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, Ben Mann, Jared Kaplan, 2022arXiv preprintDOI: 10.48550/arXiv.2204.05862 - 描述了一种使用人类反馈来训练大型语言模型的方法,旨在提升模型的有用性和无害性,展示了反馈循环在LLM对齐中的直接应用。