Implementing Scaled Dot-Product Attention
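
A minimal sketch of the computation this section is about. It follows the definition in Vaswani et al. (2017), Attention(Q, K, V) = softmax(QK^T / √d_k) V. It is written in dependency-free Python using lists of lists so that it runs anywhere. The helper names (`matmul`, `transpose`, `softmax`) and the boolean mask convention (True means "may attend") are illustrative assumptions, not the API of any particular library. Real code would normally use a tensor library such as PyTorch.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the row maximum before exponentiating.

    Entries equal to -inf (masked positions) get weight 0. At least one entry
    must be finite, otherwise the result is NaN.
    """
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    q: Matrix, k: Matrix, v: Matrix, mask: Optional[List[List[bool]]] = None
) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V.

    Shapes: q is (n_q x d_k), k is (n_k x d_k), v is (n_k x d_v).
    mask (assumed convention): (n_q x n_k) booleans, True = attend, False = block.
    """
    d_k = len(k[0])
    # Raw similarity of every query with every key: (n_q x n_k).
    scores = matmul(q, transpose(k))
    # Scale by 1/sqrt(d_k) so score magnitudes do not grow with the key dimension.
    scale = 1.0 / math.sqrt(d_k)
    scores = [[s * scale for s in row] for row in scores]
    # Blocked positions get -inf so that softmax assigns them zero weight.
    if mask is not None:
        scores = [
            [s if keep else float("-inf") for s, keep in zip(row, mrow)]
            for row, mrow in zip(scores, mask)
        ]
    # Each query's weights over the keys sum to 1.
    weights = [softmax(row) for row in scores]
    # Output is the weight-averaged value vectors: (n_q x d_v).
    return matmul(weights, v), weights


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0], [20.0]]
    causal = [[True, False], [True, True]]  # query i may only see keys 0..i
    out, w = scaled_dot_product_attention(q, k, v, mask=causal)
    print("weights:", w)
    print("output:", out)
```

Dividing by √d_k matters because, for large d_k, dot products grow in magnitude and push the softmax into regions with very small gradients. This is the motivation given in the original paper.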

References

Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017), DOI: 10.48550/arXiv.1706.03762 - The seminal paper that introduced the Transformer architecture and the Scaled Dot-Product Attention mechanism.
The Annotated Transformer, Alexander Rush, 2018 - A widely cited PyTorch implementation and detailed walkthrough of the Transformer model, including Scaled Dot-Product Attention.
Dive into Deep Learning, Aston Zhang, Zack C. Lipton, Mu Li, Alex Smola, 2021 (Cambridge University Press) - An authoritative, interactive deep learning textbook that covers attention mechanisms and the Transformer architecture in depth, with executable code.
MultiheadAttention, PyTorch Authors, 2024 (PyTorch Foundation) - Official documentation for PyTorch's MultiheadAttention module, which uses Scaled Dot-Product Attention internally; it describes the module's practical use and its parameters.