Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, Vol. 30 (NeurIPS), DOI: 10.5555/3295222.3295349 - This foundational paper introduces the Transformer architecture. It describes in detail the encoder and decoder stacks, multi-head attention, residual connections, and layer normalization.