Learning to summarize with human feedback, Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano, 2020, Advances in Neural Information Processing Systems (NeurIPS 2020), Vol. 33 - Presents one of the earliest applications of RLHF, with a detailed account of collecting human preference data for summarization that influenced subsequent RLHF data collection methods.
Helpful, Harmless, and Honest: Developing Safe and Reliable AI Assistants, Jared Kaplan, Sam Bowman, et al. (Anthropic Team), 2022 (Anthropic) - Offers practical insights into the iterative process of gathering human feedback and refining guidelines to achieve helpful, harmless, and honest AI assistant behavior.