Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.1706.03762 - Describes the Transformer architecture, the foundation of modern large language models, and explains the mechanisms behind its pattern-matching capabilities.
Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020. Advances in Neural Information Processing Systems (NeurIPS). DOI: 10.48550/arXiv.2005.14165 - Introduces GPT-3, demonstrating how scaling up parameters and training data enables large language models to perform a wide variety of tasks with minimal task-specific data.
On the Opportunities and Risks of Foundation Models, Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Dilara Bakar, Percy Liang, et al., 2021. arXiv (Stanford Institute for Human-Centered Artificial Intelligence (HAI)). DOI: 10.48550/arXiv.2108.07258 - Introduces the concept of foundation models, of which large language models are a prominent type, and discusses their shared capabilities and impact across a wide range of applications.