Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems (NeurIPS), DOI: 10.48550/arXiv.1706.03762 - The foundational paper introducing the Transformer architecture and the Query, Key, Value (QKV) attention mechanism (see the minimal sketch after this list).
Speech and Language Processing (3rd Edition Draft), Daniel Jurafsky and James H. Martin, 2025, Pearson - An authoritative textbook giving a comprehensive treatment of attention mechanisms and Transformers, including the QKV abstraction; the chapters on attention and Transformers are most relevant.
Stanford CS224N: Natural Language Processing with Deep Learning, Diyi Yang, Tatsunori Hashimoto, 2023, Stanford University - Lecture videos, slides, and readings on attention mechanisms and the Transformer architecture, explaining the QKV framework from an educational perspective.
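
For orientation, here is a minimal NumPy sketch of the scaled dot-product attention that these sources describe, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V from Vaswani et al. (2017). The function name, variable names, and shapes are illustrative assumptions, not taken from any of the cited works' code.

```python
# Illustrative sketch of scaled dot-product (QKV) attention as defined in
# "Attention Is All You Need": Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.
# Names and shapes are assumptions chosen for this example.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output of shape (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention distribution over keys
    return weights @ V                   # weighted sum of value vectors

# Example: 3 queries attending over 4 key/value pairs of dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```

The full Transformer applies this per attention head with learned projections of the inputs into Q, K, and V; the cited paper and textbook chapters cover those details.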