Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. DOI: 10.48550/arXiv.1706.03762 - The foundational paper introducing the Transformer architecture, including detailed descriptions of the decoder's masked self-attention, cross-attention, and feed-forward networks.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025, Stanford University - An authoritative textbook with dedicated chapters explaining the Transformer architecture, covering the decoder layer and its attention mechanisms.