LIMA: Less Is More for Alignment. Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy. 2023. DOI: 10.48550/arXiv.2305.11206 - Demonstrates that high-quality, carefully curated (even if synthetic) instruction-response pairs are more effective for fine-tuning than large volumes of lower-quality data.
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe. 2022. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.2203.02155 - Details the InstructGPT approach, highlighting the iterative refinement of model behavior through human feedback, which implicitly defines the attributes of desired synthetic instruction-following data for alignment.