On the Opportunities and Risks of Foundation Models, Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, et al., 2021, arXiv preprint arXiv:2108.07258 (Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)) - Defines foundation models, which serve as the base for many specialized LLMs, and discusses their characteristics and societal impact.
Language Models are Few-Shot Learners, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, 2020, Advances in Neural Information Processing Systems (NeurIPS 2020), DOI: 10.48550/arXiv.2005.14165 - Introduces GPT-3, a prominent example of a general-purpose LLM, demonstrating its ability to perform various tasks with minimal examples.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Association for Computational Linguistics), DOI: 10.18653/v1/N19-1423 - Introduces BERT and details the pre-training and fine-tuning paradigm, which is central to creating specialized LLMs from foundation models.