Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.5555/3295222.3295349 - This foundational paper introduces the Transformer architecture and the self-attention mechanism, and details how queries, keys, and values are derived from a single input sequence.
Dive into Deep Learning, Aston Zhang, Zack C. Lipton, Mu Li, Alex J. Smola, 2023 (Cambridge University Press) - An accessible textbook treatment of attention mechanisms, including self-attention and its mathematical formulation.