Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.)DOI: 10.5591/978-1-57766-508-2.6 - 介绍了Transformer架构,这是大多数现代大型语言模型的基础组成部分。