Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. 2017. Advances in Neural Information Processing Systems (NIPS 2017). DOI: 10.48550/arXiv.1706.03762. Introduces the Transformer architecture, the foundation of modern large language models, including fine-tuned models.
Transfer Learning in Natural Language Processing: A Survey. Sebastian Ruder, Iain Stewart, Jeremy Howard. 2019. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). DOI: 10.18653/v1/D19-1587. Provides a comprehensive overview of transfer learning techniques in natural language processing, including the pre-training and fine-tuning methods discussed in this article.