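Deep Reinforcement Learning from Human Preferences, Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, Dario Amodei, 2017. Advances in Neural Information Processing Systems 30, Vol. 30 (Curran Associates, Inc.) - This seminal paper introduced a method for learning a reward function from human feedback, specifically from pairwise comparisons, which later became a core technique for aligning language models.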