Introduction to Attention: Focusing on Relevant Information
Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1409.0473. This seminal paper introduced the attention mechanism for sequence-to-sequence models, showing how a decoder can dynamically focus on the relevant parts of the input sequence at each decoding step.
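To make the idea concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention: a decoder state is scored against each encoder state, the scores are normalized into weights, and the weighted sum becomes a context vector. The parameter shapes and random weights are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(query, keys, W_q, W_k, v):
    """Additive attention sketch (Bahdanau et al., simplified).

    query: decoder state, shape (d_q,)
    keys:  encoder states, shape (T, d_k)
    W_q, W_k, v: learned projections; random here for illustration.
    """
    # Score each encoder position against the current decoder state.
    scores = np.array([v @ np.tanh(W_q @ query + W_k @ k) for k in keys])
    weights = softmax(scores)   # normalize scores into a distribution
    context = weights @ keys    # weighted sum of encoder states
    return weights, context

# Toy usage with random parameters (d_q = d_k = 4, hidden = 8, T = 5).
rng = np.random.default_rng(0)
query = rng.normal(size=4)
keys = rng.normal(size=(5, 4))
W_q, W_k = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
v = rng.normal(size=8)
weights, context = additive_attention(query, keys, W_q, W_k, v)
print(weights.round(3), weights.sum())  # weights sum to 1
```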
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015. Proceedings of the 32nd International Conference on Machine Learning (ICML), Vol. 37. DOI: 10.48550/arXiv.1502.03044. This influential paper demonstrated visual attention for image captioning, showing how a model can attend to specific image regions while generating the corresponding words.
Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems (NeurIPS 2017), Vol. 30. DOI: 10.48550/arXiv.1706.03762. This landmark paper introduced the Transformer architecture, built entirely on attention mechanisms (specifically self-attention), which has become foundational for state-of-the-art models across many domains, including multimodal AI.
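The core operation of the Transformer is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal single-head NumPy illustration with random, untrained projection matrices; multi-head attention, masking, and the rest of the architecture are deliberately omitted.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    X: token representations, shape (T, d_model).
    W_q, W_k, W_v: projection matrices (random here for illustration).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (T, T) pairwise relevance scores
    # Row-wise softmax: each token's weights over all tokens sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output: weighted mix of all tokens

# Toy usage: 6 tokens, d_model = 8, d_k = d_v = 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (6, 4)
```

Because every token attends to every other token in one step, self-attention captures long-range dependencies without the sequential bottleneck of recurrent models.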