Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Vol. 30 (Curran Associates, Inc.). DOI: 10.55982/annips.2017.387 - Introduces the Transformer architecture, which relies entirely on self-attention and dispenses with recurrence, marking a major advance in sequence modeling.