Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture and details the components of the decoder and their functions, including masked self-attention and encoder-decoder attention.