Auto-Encoding Variational Bayes, Diederik P. Kingma, Max Welling, 2014 International Conference on Learning Representations (ICLR) DOI: 10.48550/arXiv.1312.6114 - Original paper introducing the Variational Autoencoder, detailing the ELBO objective and the role of KL divergence.
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016 (MIT Press) - Authoritative textbook whose chapter on deep generative models includes a section on Variational Autoencoders, covering the ELBO and KL regularization.
Elements of Information Theory, Thomas M. Cover, Joy A. Thomas, 2006 (Wiley-Interscience) DOI: 10.1002/0471742716 - Standard textbook providing a comprehensive foundation in information theory, including the definition and properties of Kullback-Leibler divergence.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner, 2017 International Conference on Learning Representations (ICLR) DOI: 10.48550/arXiv.1804.03599 - Research paper introducing a VAE variant that weights the KL divergence term by a coefficient β, giving explicit control over the balance between reconstruction and regularization.