Learning to summarize with human feedback, Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano, NeurIPS 2020. DOI: 10.48550/arXiv.2009.01325 - This pioneering work introduced the core methodology of Reinforcement Learning from Human Feedback (RLHF) for training models on sequence generation tasks, specifically summarization. It details the initial approach to collecting human preference data through pairwise comparisons to train a reward model.
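
For intuition, the sketch below illustrates the pairwise-comparison objective this entry describes: the reward model is trained so that the human-preferred summary in each pair scores higher, via a Bradley-Terry-style loss. This is a minimal illustration, not the paper's code; the `RewardModel` here is a toy MLP over fixed-size feature vectors, whereas the paper fine-tunes a language model to produce the scalar reward.

```python
# Minimal sketch of training a reward model from pairwise human preferences
# (Bradley-Terry-style loss, as described in Stiennon et al., 2020).
# Assumption: summaries are represented by fixed-size feature vectors and the
# reward model is a small MLP; in the paper it is a fine-tuned language model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # One scalar reward per summary representation.
        return self.net(features).squeeze(-1)


def preference_loss(model: RewardModel,
                    chosen: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    # Maximize the log-probability that the human-preferred summary gets the
    # higher reward:  loss = -log sigmoid(r(chosen) - r(rejected))
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardModel()
    chosen, rejected = torch.randn(8, 128), torch.randn(8, 128)  # one batch of comparisons
    loss = preference_loss(model, chosen, rejected)
    loss.backward()
    print(f"pairwise preference loss: {loss.item():.4f}")
```

The trained reward model then serves as the optimization target for the RL step (PPO in the paper), in place of a hand-written metric.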
Inter-Annotator Agreement, Eduard Hovy, Sabine L. L. Lohmann, 2022. Encyclopedia of Language and Linguistics (Springer, Cham). DOI: 10.1007/978-3-030-80275-9_38 - This reference provides a focused explanation of Inter-Annotator Agreement (IAA) methods and their importance for assessing the reliability and consistency of human annotations, a critical aspect of quality control in preference data collection. It is a chapter in the book 'Data Science and Human-in-the-Loop'.
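
As a small illustration of one widely used IAA statistic (not code from the cited chapter), the sketch below computes Cohen's kappa between two annotators who each provided pairwise preference labels for the same items; the annotator labels are made up for the example.

```python
# Cohen's kappa: chance-corrected agreement between two annotators
# labeling the same items (here, which of two summaries they preferred).
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b) and labels_a, "need aligned, non-empty label lists"
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(labels_a) | set(labels_b))
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels: which candidate summary ("A" or "B") each annotator preferred.
    ann_1 = ["A", "A", "B", "A", "B", "B", "A", "A"]
    ann_2 = ["A", "B", "B", "A", "B", "A", "A", "A"]
    print(f"Cohen's kappa: {cohens_kappa(ann_1, ann_2):.3f}")
```

Low agreement on such preference labels signals noisy or ambiguous comparisons, which directly degrades the quality of the reward model trained on them.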