Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, the foundation of modern large language models, including those adapted through fine-tuning.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.48550/arXiv.1810.04805 - Details the pre-training and fine-tuning paradigm for deep bidirectional Transformers, establishing a key method for adapting large language models.
Natural Language Processing with Transformers, Lewis Tunstall, Leandro von Werra, Thomas Wolf, 2022. O'Reilly Media. - A practical guide to using and fine-tuning Transformer models for a range of natural language processing tasks, with concrete examples and best practices.
Transfer Learning in Natural Language Processing: A Survey, Sebastian Ruder, Iain Stewart, Jeremy Howard, 2019. Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). DOI: 10.18653/v1/D19-1587 - Provides a comprehensive overview of transfer learning techniques in natural language processing, including the pre-training and fine-tuning approach discussed here.