Auto-Encoding Variational Bayes. Diederik P. Kingma, Max Welling, 2013. International Conference on Learning Representations (ICLR 2014). DOI: 10.48550/arXiv.1312.6114 - Introduces the Variational Autoencoder (VAE) framework, the reparameterization trick, and the ELBO objective, forming the foundation for VAE applications including NLP.
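
A minimal sketch (not the authors' reference code) of the two ideas this entry names: the reparameterization trick and a single-sample ELBO estimate with a Gaussian posterior, standard-normal prior, and Bernoulli likelihood. The toy decoder, dimensions, and variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as a differentiable transform of noise:
    z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def elbo(x, mu, log_var, decode):
    """Single-sample estimate of ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z))."""
    z = reparameterize(mu, log_var)
    x_hat = decode(z)  # Bernoulli parameters for each feature of x
    # Bernoulli reconstruction log-likelihood log p(x|z)
    recon = np.sum(x * np.log(x_hat + 1e-9) + (1 - x) * np.log(1 - x_hat + 1e-9))
    # Closed-form KL between N(mu, sigma^2) and the standard-normal prior
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return recon - kl

# Toy usage: a linear "decoder" with a sigmoid and random encoder outputs (assumed).
W = rng.standard_normal((2, 5))                     # latent dim 2 -> data dim 5
decode = lambda z: 1.0 / (1.0 + np.exp(-(z @ W)))
x = rng.integers(0, 2, size=5).astype(float)        # one binary observation
mu, log_var = rng.standard_normal(2), rng.standard_normal(2)
print("single-sample ELBO estimate:", elbo(x, mu, log_var, decode))
```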
Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems 30 (NIPS 2017). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, which is built entirely on self-attention mechanisms and has become a standard choice for both the encoder and decoder components of modern NLP VAEs.
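
A minimal sketch of the scaled dot-product self-attention the Transformer is built on, softmax(Q K^T / sqrt(d_k)) V, with a single head, no masking, and random projection matrices. All sizes and names here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise token affinities
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted mix of value vectors

# Toy usage: 4 tokens, model width 8, head width 4 (assumed sizes).
seq_len, d_model, d_k = 4, 8, 4
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # -> (4, 4)
```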