RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash, 2024. Proceedings of the 41st International Conference on Machine Learning, Vol. 235 (PMLR). DOI: 10.48550/arXiv.2309.00267 - Focuses on reinforcement learning from AI feedback (RLAIF) as a way to scale preference-based learning, detailing how AI preferences are generated and how they are used to train reward models that improve large language model alignment.