Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017 (arXiv) DOI: 10.48550/arXiv.1706.03762 - The foundational paper that introduced the Transformer architecture, multi-head attention, and the encoder-decoder structure, which are central to this section.
Speech and Language Processing (3rd Edition Draft), Daniel Jurafsky and James H. Martin, 2023 (Pearson) - A comprehensive textbook chapter offering detailed explanations of the Transformer architecture, attention mechanisms, and their applications in natural language processing.
Custom layers and models, fchollet, 2023 (tensorflow.org) - Official Keras guide on creating custom layers and models by subclassing tf.keras.layers.Layer, which is essential for implementing custom architectures such as the Transformer blocks discussed in this section.