Universal Language Model Fine-tuning for Text Classification, Jeremy Howard and Sebastian Ruder, 2018. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics. DOI: 10.18653/v1/P18-1031 - This paper introduced an effective method for fine-tuning pre-trained language models for downstream NLP tasks, demonstrating the efficiency and performance gains of transfer learning.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics. DOI: 10.18653/v1/N19-1423 - A landmark paper that presented a powerful pre-training technique for language models and a successful fine-tuning strategy, becoming a standard for many subsequent LLMs.
Transfer Learning in Natural Language Processing, Sebastian Ruder, Matthew E. Peters, Swabha Swayamdipta, and Thomas Wolf, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, Association for Computational Linguistics. DOI: 10.18653/v1/N19-5004 - This tutorial provides a comprehensive overview of transfer learning techniques applied to natural language processing, serving as an excellent resource for understanding its principles and applications in LLMs.
Exploring the Limits of Language Modeling, Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu, 2016. arXiv preprint arXiv:1602.02410. DOI: 10.48550/arXiv.1602.02410 - This paper demonstrates the effectiveness of large-scale language model pre-training on vast datasets, showcasing the potential for models to learn general language understanding before being adapted to specialized tasks.