Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020Advances in Neural Information Processing SystemsDOI: 10.48550/arXiv.2005.14165 - 本文介绍了GPT-3,并展示了大型语言模型的小样本和零样本学习能力,为上下文学习奠定了基础。
Finetuned Language Models Are Zero-Shot Learners, Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le, 2021International Conference on Learning RepresentationsDOI: 10.48550/arXiv.2109.01652 - 介绍了指令微调(FLAN),一种通过在以自然语言指令形式表达的任务集上微调语言模型的方法,以提高其零样本泛化能力。