Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems, Vol. 30. DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the scaled dot-product attention mechanism, detailing its components and initial complexity considerations.
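(As a quick reminder of what that entry introduces, and summarizing the cited paper rather than quoting it: scaled dot-product attention computes $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^{\top}/\sqrt{d_k}\right)V$, and for sequence length $n$ and dimension $d$ the $QK^{\top}$ product gives the $O(n^2 \cdot d)$ per-layer cost that the paper's complexity discussion refers to.)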
Dive into Deep Learning, Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola, 2023, Cambridge University Press - Offers a comprehensive and accessible explanation of deep learning concepts, including detailed analysis of attention mechanisms and their computational properties.