Calculating the posterior distribution p(z∣x) is often computationally challenging because the evidence p(x) = ∫ p(x, z) dz involves an intractable integral. While Markov Chain Monte Carlo methods, discussed previously, provide a way to sample from the posterior, they can be computationally intensive, especially for large datasets or complex models.
This chapter introduces Variational Inference (VI), an alternative family of techniques for approximating posterior distributions. Instead of sampling, VI reframes Bayesian inference as an optimization problem. The goal is to find a distribution q(z) from a tractable family that is closest to the true posterior p(z∣x), typically by maximizing a lower bound on the model evidence, known as the Evidence Lower Bound (ELBO):

$$\mathcal{L}(q) = \mathbb{E}_{q(z)}[\log p(x, z)] - \mathbb{E}_{q(z)}[\log q(z)]$$
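To make the ELBO concrete, the sketch below estimates it by Monte Carlo for a toy conjugate model (the model, data, and variational family are illustrative choices, not from this chapter): a Gaussian prior z ~ N(0, 1), Gaussian likelihood x_i ~ N(z, 1), and a Gaussian q(z) = N(m, s²). Because the model is conjugate, the exact posterior is known, and setting q equal to it should yield a higher ELBO than an arbitrary q.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy conjugate model (illustrative): prior z ~ N(0, 1), likelihood x_i ~ N(z, 1)
x = rng.normal(loc=2.0, scale=1.0, size=50)

def elbo_estimate(m, s, n_samples=10_000):
    """Monte Carlo estimate of L(q) = E_q[log p(x, z)] - E_q[log q(z)]."""
    z = rng.normal(m, s, size=n_samples)                    # z ~ q(z) = N(m, s^2)
    log_prior = norm.logpdf(z, loc=0.0, scale=1.0)          # log p(z)
    log_lik = norm.logpdf(x[:, None], loc=z, scale=1.0).sum(axis=0)  # log p(x | z)
    log_q = norm.logpdf(z, loc=m, scale=s)                  # log q(z)
    return np.mean(log_prior + log_lik - log_q)

# Conjugacy gives the exact posterior N(m_post, s2_post) in closed form
s2_post = 1.0 / (1.0 + len(x))
m_post = s2_post * x.sum()

# The ELBO is maximized (and equals log p(x)) when q matches the true posterior
elbo_opt = elbo_estimate(m_post, np.sqrt(s2_post))
elbo_off = elbo_estimate(0.0, 1.0)  # a poorly chosen q (equal to the prior)
print(elbo_opt > elbo_off)
```

Maximizing this quantity over (m, s) is exactly the optimization problem VI solves; the chapter's algorithms (CAVI, SVI, BBVI) differ only in how they perform that optimization.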
We will start by formulating inference as optimization and deriving the ELBO. You will learn about the common mean-field approximation and the Coordinate Ascent Variational Inference (CAVI) algorithm. We will then cover techniques for scaling inference to large datasets using Stochastic Variational Inference (SVI) and for handling models whose ELBO gradients cannot be derived analytically using Black Box Variational Inference (BBVI). We will also look at more expressive variational families and analyze the trade-offs between VI and MCMC approaches regarding speed, accuracy, and scalability. Practical implementation using common libraries will solidify these concepts.
3.1 Optimization as Inference: The VI Perspective
3.2 Deriving the Evidence Lower Bound (ELBO)
3.3 Mean-Field Approximation Details
3.4 Coordinate Ascent Variational Inference (CAVI)
3.5 Stochastic Variational Inference (SVI) for Large Data
3.6 Black Box Variational Inference (BBVI)
3.7 Advanced Variational Families
3.8 Comparing MCMC and VI Strengths
3.9 Hands-on Practical: Scalable Variational Inference
© 2025 ApX Machine Learning