A Survey of Large Language Models, Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen, 2023. arXiv preprint arXiv:2303.18223. DOI: 10.48550/arXiv.2303.18223 - Provides a comprehensive review of large language models, including their development, architectures, training methodologies (pre-training on vast data), and adaptation techniques, directly supporting the understanding of foundational models.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019. Proceedings of NAACL-HLT 2019. DOI: 10.48550/arXiv.1810.04805 - This paper introduced the BERT model, which established the highly influential paradigm of pre-training a large general-purpose language model on extensive data and then fine-tuning it for various downstream tasks, a core concept for foundational models (a minimal fine-tuning sketch follows this list).
CS224N: Natural Language Processing with Deep Learning, Diyi Yang, Tatsunori Hashimoto, 2023 (Stanford University) - This university course provides in-depth lectures and materials on natural language processing with deep learning, covering large language model architectures, pre-training, and fine-tuning, all of which are fundamental to understanding foundational models.
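The BERT entry above describes the pre-train/fine-tune paradigm. The following is a minimal sketch of that workflow, assuming the Hugging Face transformers and PyTorch libraries (neither is prescribed by the cited references) and a toy binary sentence-classification task with hypothetical data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a general-purpose pre-trained checkpoint and attach a new,
# randomly initialized classification head for the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy labeled batch (hypothetical data, for illustration only).
texts = ["the movie was great", "the movie was terrible"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # loss computed against the new task head
outputs.loss.backward()                  # gradients flow through the pre-trained encoder
optimizer.step()                         # one fine-tuning update of all weights
```

In practice this single step would be repeated over a task-specific dataset for a few epochs; the key idea from the BERT paper is that one pre-trained encoder can serve many downstream tasks with only a small task head and brief fine-tuning.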