BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.48550/arXiv.1810.04805 - This foundational paper introduced BERT, a model that popularized the pre-train-and-fine-tune paradigm and demonstrated the efficiency and performance gains of transfer learning in NLP.
A Survey of Large Language Models, Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen, 2023. DOI: 10.48550/arXiv.2303.18223 - A recent and comprehensive survey of large language models, covering their pre-training, fine-tuning, and the underlying principles of transfer learning.