Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduced the Transformer architecture and its attention mechanisms, including the detailed design and mathematical formulation of encoder-decoder cross-attention.
Speech and Language Processing (3rd ed. draft), Daniel Jurafsky and James H. Martin, 2025 - This comprehensive textbook offers a clear explanation of the Transformer architecture, with a dedicated section on encoder-decoder attention and its role in sequence-to-sequence models.
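The encoder-decoder cross-attention that both references describe can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the papers' full multi-head implementation: queries are projected from decoder states while keys and values are projected from encoder states, so each decoder position attends over the whole source sequence. All names (`cross_attention`, `W_q`, `W_k`, `W_v`) and the toy shapes are assumptions chosen for the example.

```python
import numpy as np

def cross_attention(dec_states, enc_states, W_q, W_k, W_v):
    """Single-head encoder-decoder cross-attention sketch.

    Queries come from the decoder; keys and values come from the
    encoder, so each decoder position attends over the full source.
    """
    Q = dec_states @ W_q                      # (T_dec, d_k)
    K = enc_states @ W_k                      # (T_enc, d_k)
    V = enc_states @ W_v                      # (T_enc, d_v)
    # Scaled dot-product scores: one row per decoder position
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T_dec, T_enc)
    # Row-wise softmax (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # (T_dec, d_v)

# Toy example: 3 decoder positions, 5 encoder positions, model dim 4
rng = np.random.default_rng(0)
dec = rng.normal(size=(3, 4))
enc = rng.normal(size=(5, 4))
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = cross_attention(dec, enc, W_q, W_k, W_v)
print(out.shape)  # (3, 4): one d_v-dimensional output per decoder position
```

The key design point, emphasized in both sources, is the asymmetry of inputs: unlike self-attention, the query side and the key/value side come from different sequences, which is how the decoder conditions on the encoded input.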