Learning to summarize with human feedback, Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano, 2020. Advances in Neural Information Processing Systems (NeurIPS 2020), Vol. 33 - One of the earliest applications of RLHF; it describes in detail how human preference data was collected for the summarization task and influenced later RLHF data-collection methods.