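Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, NeurIPS, DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture and details its encoder-decoder stacks, multi-head attention mechanism, residual connections, and layer normalization.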