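Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - The seminal paper that introduced the Transformer architecture and the multi-head attention mechanism. It explains the design rationale and how multi-head attention processes information from different representation subspaces.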
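Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, and Alex Smola, 2024 (Cambridge University Press) - An open-source educational resource with a clear, in-depth explanation of multi-head attention, emphasizing how it lets a model attend to different aspects of the input simultaneously.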