Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.48550/arXiv.1706.03762 - The foundational paper that introduced the Transformer architecture, detailing self-attention, positional encoding, and the encoder-decoder design.
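The self-attention mechanism this paper introduces can be illustrated with a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The function name `attention` and the plain-list matrix representation are illustrative choices, not code from the paper:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K, V are matrices given as lists of row vectors; each output
    row is a weight­ed average of the rows of V, with weights derived
    from the query's similarity to each key.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot each query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With two identical keys the attention weights are uniform, so the output row is simply the average of the two value rows.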
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025 (Stanford University) - A comprehensive and authoritative textbook on natural language processing, providing detailed explanations of the Transformer architecture and its components.