Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.48550/arXiv.1810.04805 - Introduces BERT, a foundational pre-trained Transformer that revolutionized natural language processing and demonstrated the power of the pre-training and fine-tuning paradigm, which lies at the heart of working with model hubs.