Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - The seminal paper introducing the Transformer architecture and the Scaled Dot-Product Attention mechanism.
The Annotated Transformer, Alexander Rush, 2018 - A widely cited PyTorch implementation and line-by-line explanation of the Transformer model, including Scaled Dot-Product Attention.
Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola, 2021 (Cambridge University Press) - An authoritative interactive deep learning book with comprehensive coverage of attention mechanisms and the Transformer architecture, accompanied by executable code.
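All three references center on Scaled Dot-Product Attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V from the Vaswani et al. paper. A minimal NumPy sketch of that formula (function name and shapes are illustrative, not from any of the listed sources):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled Dot-Product Attention (Vaswani et al., 2017):
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) scaled similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                            # (n_q, d_v) weighted sum of values

# Example: 2 queries attending over 3 key/value pairs
Q = np.random.randn(2, 4)
K = np.random.randn(3, 4)
V = np.random.randn(3, 5)
out = scaled_dot_product_attention(Q, K, V)  # shape (2, 5)
```

The 1/√d_k scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.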