Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 (Springer) - Widely recognized machine learning textbook with strong sections on Bayesian methods, information theory, and variational inference.
Variational Inference: A Review for Statisticians, David M. Blei, Alp Kucukelbir, Jon D. McAuliffe, 2017, Journal of the American Statistical Association, Vol. 112 (Taylor & Francis), DOI: 10.1080/01621459.2017.1285773 - Review article offering an overview of variational inference, an important application of KL divergence in Bayesian machine learning.