Auto-Encoding Variational Bayes, Diederik P. Kingma, Max Welling, 2013. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1312.6114 - Foundational paper on Variational Autoencoders (VAEs) and the reparameterization trick.
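For readers who want the core mechanism in code: below is a minimal NumPy sketch of the reparameterization trick (z = mu + sigma * eps), not the authors' implementation; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(sigma^2)) via z = mu + sigma * eps, eps ~ N(0, I).

    Writing the sample as a deterministic function of (mu, log_var) plus
    external noise eps is what lets gradients flow through mu and log_var
    during training. (Illustrative sketch, not the paper's code.)
    """
    sigma = np.exp(0.5 * log_var)           # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)     # noise independent of the parameters
    return mu + sigma * eps

# Toy usage: a 4-dimensional latent sample.
rng = np.random.default_rng(0)
mu = np.zeros(4)
log_var = np.log(np.full(4, 0.25))          # sigma = 0.5
z = reparameterize(mu, log_var, rng)
```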
Effective Approaches to Attention-based Neural Machine Translation, Thang Luong, Hieu Pham, Christopher D. Manning, 2015. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics). DOI: 10.18653/v1/D15-1166 - Compares attention mechanisms for neural machine translation, including multiplicative (Luong-style) attention.
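A rough sketch of the multiplicative ("general") scoring variant from this paper, score(h_t, h_s) = h_t^T W_a h_s, in NumPy; the interface and variable names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def luong_general_attention(h_t, h_s, W_a):
    """Multiplicative (Luong "general") attention over source states.

    h_t : (d,)   current decoder hidden state
    h_s : (n, d) encoder hidden states
    W_a : (d, d) learned bilinear weight matrix
    """
    scores = h_s @ (W_a @ h_t)               # score(h_t, h_s) = h_t^T W_a h_s, shape (n,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over source positions
    context = weights @ h_s                   # attention-weighted sum, shape (d,)
    return context, weights

# Toy usage: 5 source positions, hidden size 8.
rng = np.random.default_rng(0)
h_t = rng.standard_normal(8)
h_s = rng.standard_normal((5, 8))
W_a = rng.standard_normal((8, 8))
context, weights = luong_general_attention(h_t, h_s, W_a)
```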
Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NeurIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, which relies entirely on (multi-head) self-attention for sequence modeling, dispensing with recurrence and convolution.
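A compact NumPy sketch of the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, and the multi-head split-and-recombine pattern; the weight shapes and helper names are illustrative, and real implementations add masking, dropout, and batching.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # softmax over key positions
    return w @ V

def multi_head_attention(X, n_heads, Wq, Wk, Wv, Wo):
    """Project to Q/K/V, split d_model into n_heads heads, attend, recombine."""
    n, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Reshape (n, d_model) -> (n_heads, n, d_head) so each head attends in parallel.
    split = lambda M: M.reshape(n, n_heads, d_head).transpose(1, 0, 2)
    heads = scaled_dot_product_attention(split(Q), split(K), split(V))
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ Wo

# Toy usage: 6 tokens, d_model = 16, 4 heads (self-attention, so Q=K=V source is X).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 16))
Wq, Wk, Wv, Wo = (rng.standard_normal((16, 16)) * 0.1 for _ in range(4))
out = multi_head_attention(X, n_heads=4, Wq=Wq, Wk=Wk, Wv=Wv, Wo=Wo)
```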