Training language models to follow instructions with human feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.2203.02155 - Introduces InstructGPT, a model fine-tuned with human feedback to follow instructions and generate helpful, harmless, and honest responses, demonstrating the effectiveness of human evaluation in LLM alignment.
Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, Rion Snow, Brendan O'Connor, Daniel Jurafsky, Andrew Ng, 2008. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics). DOI: 10.3115/1613715.1613751 - A foundational paper on effective crowdsourcing of linguistic annotations, covering quality control, rater agreement, and aggregation techniques for non-expert raters.