Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NeurIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Introduced the Transformer architecture, which relies entirely on self-attention (multi-head attention) instead of recurrence or convolution, and transformed sequence modeling.