BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Association for Computational Linguistics). DOI: 10.18653/v1/N19-1423 - This paper introduces BERT, a transformer-based model that significantly advanced the pre-training paradigm for natural language understanding tasks. It describes how a model acquires general language knowledge by pre-training on large amounts of unlabeled text and can then be fine-tuned for downstream tasks.
Natural Language Processing with Transformers: Building Language Applications with Hugging Face, Lewis Tunstall, Leandro von Werra, and Thomas Wolf, 2022 (O'Reilly Media) - This book offers practical guidance on using the Hugging Face Transformers library, which provides access to a wide range of pre-trained models. It covers how to apply these models to common NLP tasks such as text classification, named entity recognition, question answering, and summarization, demonstrating their accessibility and utility.
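As a brief illustration of the accessibility these references describe, the following sketch loads a pre-trained BERT checkpoint through the Hugging Face Transformers library and queries it with its masked-language-modeling objective. The checkpoint name and example sentence are illustrative assumptions, not taken from either reference.

```python
# A minimal sketch: load a pre-trained BERT model via the Transformers library
# and ask it to fill in a masked token (the objective used during pre-training).
from transformers import pipeline

# "bert-base-uncased" is an assumed, commonly available checkpoint.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The [MASK] token marks the position the model should predict.
predictions = unmasker("The capital of France is [MASK].")
for p in predictions:
    print(f"{p['token_str']}: {p['score']:.3f}")
```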