Self-Attention: Queries, Keys, Values from the Same Source
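In self-attention, the queries, keys, and values are all linear projections of the same input sequence X: Q = XW_Q, K = XW_K, V = XW_V, combined as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The following is a minimal single-head NumPy sketch of that computation; the function name `self_attention` and the weight shapes are illustrative choices, not code from either reference below.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are all derived from the same input X
    via separate learned projections.
    X: (seq_len, d_model); W_q, W_k: (d_model, d_k); W_v: (d_model, d_v)
    """
    Q = X @ W_q                      # queries from the input sequence
    K = X @ W_k                      # keys from the same sequence
    V = X @ W_v                      # values from the same sequence
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
    # Softmax over the key axis turns scores into attention weights per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output is a weighted sum of values

# Toy usage with random weights (dimensions are illustrative).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
out = self_attention(
    X,
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
)
print(out.shape)  # (5, 8): one attended vector per input position
```

Because Q, K, and V share a single source, every position in the sequence can attend to every other position, which is the defining property of self-attention as opposed to encoder-decoder attention.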
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.5555/3295222.3295349. This foundational paper introduced the Transformer architecture and the self-attention mechanism, detailing how queries, keys, and values are derived from a single input sequence.
Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, 2023 (Cambridge University Press). This textbook provides a detailed, accessible explanation of attention mechanisms, including self-attention and its mathematical formulation.