Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017Advances in Neural Information Processing SystemsDOI: 10.48550/arXiv.1706.03762 - 这篇论文介绍了Transformer架构和缩放点积注意力机制,是本文讨论内容的基础。
Dive into Deep Learning, Aston Zhang, Zack C. Lipton, Mu Li, Alex J. Smola, 2024 (Cambridge University Press) - 一本开源互动式深度学习书籍,详细解释了注意力评分函数和点积注意力的数学公式。