The mean-field approximation, where $q_\phi(z|x) = \prod_{j=1}^{d} q_\phi(z_j|x)$, simplifies VAE training considerably by assuming independence among the latent variables $z_j$ given the input $x$. However, this assumption is often too restrictive. The true posterior $p_\theta(z|x)$ might exhibit complex dependencies between latent dimensions, and forcing $q_\phi(z|x)$ to be factorized can prevent it from accurately modeling these correlations. This discrepancy, often part of the "amortization gap," can limit the VAE's ability to learn rich representations and generate high-fidelity data. Structured variational inference offers a way to address this limitation by explicitly modeling dependencies within the approximate posterior.
Structured variational inference aims to enrich the family of distributions $q_\phi(z|x)$ by allowing for correlations among the latent variables $z_1, \dots, z_d$. Instead of a fully factorized form, $q_\phi(z|x)$ is designed to capture some statistical structure. This allows $q_\phi(z|x)$ to be a more accurate approximation of the true posterior $p_\theta(z|x)$, potentially leading to a tighter Evidence Lower Bound (ELBO) and improved model performance.
The core idea is to define $q_\phi(z|x)$ using a model that can represent dependencies. Common approaches include autoregressive models and normalizing flows, both of which allow for flexible and expressive posterior distributions.
Figure: Comparison of mean-field and structured (autoregressive) approximate posteriors. In the mean-field case, the latent variables $z_j$ are conditionally independent given $x$. In the structured autoregressive case, each $z_j$ depends on the preceding $z_{<j}$ (for $j > 1$) and on $x$.
One powerful way to introduce structure is to model $q_\phi(z|x)$ autoregressively. This means the distribution over the latent vector $z = (z_1, \dots, z_d)$ is factorized as a product of conditional distributions:

$$q_\phi(z|x) = \prod_{j=1}^{d} q_\phi(z_j \mid z_{<j}, x)$$
Here, $z_{<j}$ denotes $(z_1, \dots, z_{j-1})$. Each conditional distribution $q_\phi(z_j \mid z_{<j}, x)$ can be parameterized by a neural network that takes $x$ and the previously sampled latent variables $z_{<j}$ as input. For instance, if each $q_\phi(z_j \mid z_{<j}, x)$ is a Gaussian, its mean and standard deviation would be functions of $x$ and $z_{<j}$:

$$q_\phi(z_j \mid z_{<j}, x) = \mathcal{N}\left(z_j;\ \mu_j(z_{<j}, x),\ \sigma_j^2(z_{<j}, x)\right)$$
This structure allows $q_\phi(z|x)$ to capture arbitrary dependencies among the latent variables, provided the conditioning neural networks are sufficiently expressive. Sampling from such a model is sequential: first sample $z_1$, then $z_2$ given $z_1$, and so on. While calculating the density is straightforward (a product of $d$ terms), the sequential sampling can be slow if the latent dimensionality $d$ is large.
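The PyTorch sketch below illustrates one way such an autoregressive Gaussian posterior could be implemented. The class name, the per-dimension MLPs, and all layer sizes are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class AutoregressivePosterior(nn.Module):
    """Sketch of q_phi(z|x) = prod_j N(z_j; mu_j(z_<j, x), sigma_j^2(z_<j, x)).

    Hypothetical minimal implementation: one small MLP per latent dimension,
    each conditioning on x and the previously sampled z_<j.
    """

    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        # conditional_nets[j] maps [x, z_<j] -> (mu_j, log sigma_j)
        self.conditional_nets = nn.ModuleList([
            nn.Sequential(
                nn.Linear(x_dim + j, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 2),
            )
            for j in range(z_dim)
        ])

    def sample_with_log_prob(self, x):
        """Sequentially sample z ~ q_phi(z|x) and return log q_phi(z|x)."""
        zs = []
        z_prev = x.new_zeros(x.size(0), 0)   # empty placeholder for z_<1
        log_q = x.new_zeros(x.size(0))
        for j in range(self.z_dim):
            params = self.conditional_nets[j](torch.cat([x, z_prev], dim=-1))
            mu, log_sigma = params.chunk(2, dim=-1)
            dist = Normal(mu, log_sigma.exp())
            z_j = dist.rsample()                       # reparameterized sample
            log_q = log_q + dist.log_prob(z_j).squeeze(-1)
            zs.append(z_j)
            z_prev = torch.cat(zs, dim=-1)
        return z_prev, log_q
```

Each step feeds the already-sampled $z_{<j}$ back into the next conditional network, which is exactly why sampling cannot be parallelized across dimensions.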
Techniques like Inverse Autoregressive Flows (IAFs), which you may recall from Chapter 3, provide a way to implement such expressive autoregressive models where sampling can be parallelized, significantly speeding up the process. In IAFs, $z$ is obtained by transforming a noise vector $\epsilon$ (where the $\epsilon_j$ are independent) using an autoregressive transformation: $z_j = \mu_j(\epsilon_{<j}, x) + \sigma_j(\epsilon_{<j}, x)\,\epsilon_j$.
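To make the parallel sampling concrete, here is a minimal sketch of a single IAF step. It uses a single strictly masked linear layer in place of a full MADE-style network, which is a simplification; `MaskedLinear`, `IAFStep`, and the additive conditioning on `x` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """Linear map with a strictly lower-triangular weight mask, so output j
    depends only on inputs epsilon_1 ... epsilon_{j-1}."""
    def __init__(self, dim):
        super().__init__(dim, dim)
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim), diagonal=-1))

    def forward(self, eps):
        return F.linear(eps, self.weight * self.mask, self.bias)

class IAFStep(nn.Module):
    """One IAF step: z_j = mu_j(eps_<j, x) + sigma_j(eps_<j, x) * eps_j.

    Because mu and log sigma are produced by masked (autoregressive) layers
    acting on the noise, all dimensions are transformed in one parallel pass.
    """
    def __init__(self, z_dim, x_dim):
        super().__init__()
        self.mu_net = MaskedLinear(z_dim)
        self.log_sigma_net = MaskedLinear(z_dim)
        self.context = nn.Linear(x_dim, z_dim)  # simple additive conditioning on x

    def forward(self, eps, x):
        mu = self.mu_net(eps) + self.context(x)
        log_sigma = self.log_sigma_net(eps)
        z = mu + log_sigma.exp() * eps
        # The Jacobian dz/deps is lower-triangular with diagonal sigma_j,
        # so log|det| = sum_j log sigma_j.
        log_det = log_sigma.sum(-1)
        return z, log_det
```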
Normalizing Flows, also discussed in Chapter 3 (Section 3.5 "Normalizing Flows for Flexible Priors and Posteriors"), offer another general and powerful framework for constructing complex posterior distributions. A normalizing flow transforms a simple base distribution $q_0(z_0)$ (e.g., a standard multivariate Gaussian) through a sequence of invertible transformations $f_1, \dots, f_K$:

$$z_K = f_K \circ f_{K-1} \circ \cdots \circ f_1(z_0), \quad z_0 \sim q_0(z_0)$$
The density of $z_K$ can be computed using the change of variables formula:

$$\log q_K(z_K) = \log q_0(z_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right|$$
The parameters of these transformations (and potentially the base distribution $q_0$) are learned as part of $\phi$ and are typically conditioned on $x$. This allows $q_\phi(z|x)$ to represent highly flexible distributions. The important point is that the transformations $f_k$ are designed so that their Jacobian determinants are computationally tractable. Examples include planar flows, radial flows, and more sophisticated flow architectures such as RealNVP, MAF, and IAF.
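As a concrete example of a flow with a tractable Jacobian, the sketch below implements a planar flow layer, $f(z) = z + u\,\tanh(w^\top z + b)$, whose log-determinant is $\log|1 + u^\top \psi(z)|$ with $\psi(z) = (1 - \tanh^2(w^\top z + b))\,w$, together with a helper that accumulates the change-of-variables terms. The module names and initialization scale are illustrative, and a practical implementation would also constrain `u` to keep the transformation invertible.

```python
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    """One planar flow layer f(z) = z + u * tanh(w^T z + b).

    The log-Jacobian determinant log|1 + u^T psi(z)| costs O(d) to evaluate.
    """
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.01)
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # z: (batch, dim)
        lin = z @ self.w + self.b                              # (batch,)
        f_z = z + self.u * torch.tanh(lin).unsqueeze(-1)
        psi = (1 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)
        return f_z, log_det

def flow_log_prob(z0, log_q0, flows):
    """Apply a sequence of flows to z0 and track log q_K(z_K)
    via the change-of-variables formula."""
    z, log_q = z0, log_q0
    for flow in flows:
        z, log_det = flow(z)
        log_q = log_q - log_det
    return z, log_q
```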
Using normalizing flows for $q_\phi(z|x)$ can significantly increase the expressiveness of the inference network, allowing it to better match the true posterior and thereby tighten the ELBO.
Adopting structured variational inference has several important consequences:
Improved ELBO and Model Quality: A more flexible $q_\phi(z|x)$ can provide a tighter lower bound on the true log-likelihood $\log p_\theta(x)$. This often translates to better generative performance, such as sharper generated samples and higher likelihood scores on test data. The representations learned in the latent space may also become more meaningful as the inference network better captures the underlying data manifold.
Increased Computational Complexity: The primary trade-off is computational cost. Autoregressive posteriors require sequential sampling over the latent dimensions, and each flow layer adds extra network evaluations and Jacobian-determinant computations to every training step.
Model Design Choices: You now have more choices to make regarding the architecture of $q_\phi(z|x)$. For autoregressive models, this includes the ordering of latent variables and the architecture of the conditional networks. For normalizing flows, it involves selecting the type and number of flow layers. These choices can impact performance and computational load.
KL Divergence Term: The KL divergence term $D_{\mathrm{KL}}(q_\phi(z|x) \,\|\, p(z))$ in the ELBO might become more challenging to compute. If $p(z)$ is a standard Gaussian and $q_\phi(z|x)$ is a complex distribution (e.g., one produced by a normalizing flow), the KL divergence may no longer have an analytical solution. In such cases, it often needs to be estimated by Monte Carlo: sample $z \sim q_\phi(z|x)$ and compute $\log q_\phi(z|x) - \log p(z)$, as sketched after this list.
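Here is a minimal sketch of such a Monte Carlo estimate, assuming the posterior already returns samples together with their log-density (as the flow sketches above do); the function name and argument layout are hypothetical.

```python
import torch
from torch.distributions import Normal

def monte_carlo_kl(log_q, prior, z):
    """One-sample Monte Carlo estimate of KL(q_phi(z|x) || p(z)).

    log_q : log q_phi(z|x) evaluated at the samples z, shape (batch,)
            (e.g. the log-density returned by a flow-based posterior)
    prior : a torch.distributions object for p(z), e.g. a standard Normal
    z     : samples z ~ q_phi(z|x), shape (batch, z_dim)
    """
    log_p = prior.log_prob(z).sum(-1)        # log p(z), summed over latent dims
    return (log_q - log_p).mean()            # estimate of E_q[log q - log p]

# Illustrative usage with a standard Gaussian prior:
# prior = Normal(torch.zeros(z_dim), torch.ones(z_dim))
# kl_estimate = monte_carlo_kl(log_q, prior, z)
```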
Structured variational inference is particularly beneficial when the true posterior $p_\theta(z|x)$ exhibits strong dependencies among the latent dimensions that a factorized $q_\phi(z|x)$ cannot capture.
While introducing structure adds complexity, the potential gains in model expressiveness and performance often justify the additional overhead, especially for challenging datasets or when aiming for state-of-the-art results. The techniques discussed here, such as autoregressive models and normalizing flows for $q_\phi(z|x)$, are foundational for building more sophisticated and powerful VAEs. As we proceed, you'll see how these improved inference mechanisms can be combined with other advanced VAE components.