Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - The original paper introducing the Transformer architecture and multi-head attention, detailing the mechanism and its advantages.
The Annotated Transformer, Alexander Rush, Vincent Nguyen, Guillaume Klein, 2018 - A comprehensive line-by-line explanation and PyTorch implementation of the Transformer model, including a clear exposition of multi-head attention.
Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola, 2024 (Cambridge University Press) - An open-source interactive textbook providing detailed explanations and code examples for deep learning concepts, including a dedicated chapter on multi-head attention.