Training Language Models to Follow Instructions with Human Feedback, Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, 2022, Advances in Neural Information Processing Systems (NeurIPS), DOI: 10.48550/arXiv.2203.02155 - This seminal paper introduces the InstructGPT models and details the supervised fine-tuning (SFT) stage as the initial step in aligning large language models with human preferences.
Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020, arXiv, DOI: 10.48550/arXiv.2005.14165 - This foundational paper introduces GPT-3 and details the next-token-prediction pretraining approach for large language models, which laid the groundwork for subsequent fine-tuning techniques.
Speech and Language Processing, Daniel Jurafsky, James H. Martin, 2025 - This comprehensive textbook covers the foundations of natural language processing, including explanations of the supervised learning techniques relevant to fine-tuning language models. Refers to the publicly available draft of the 3rd edition.