Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (Curran Associates, Inc.). DOI: 10.5591/978-1-57766-068-1.5998 - This paper introduces the Transformer architecture along with the self-attention and multi-head attention mechanisms, laying the foundation for subsequent sequence-processing models.