Universal Language Model Fine-tuning for Text Classification, Jeremy Howard and Sebastian Ruder, 2018. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics. DOI: 10.18653/v1/P18-1031 - Proposes ULMFiT, a three-stage transfer learning method for NLP tasks that uses a pretrained language model and demonstrates performance gains with limited task-specific data.
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, 2017. arXiv preprint arXiv:1706.03762. DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, which advanced sequence modeling and became the foundation of modern large language models.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics. DOI: 10.18653/v1/N19-1423 - Introduces BERT, a model that applies the pretrain-then-finetune approach to deep bidirectional Transformers via masked language modeling for a wide range of natural language processing tasks.