In the previous chapter, we constructed basic normalizing flows by stacking simple mathematical transformations. Although those planar and radial flows work in principle, they often struggle to scale to high-dimensional distributions. To address this limitation, we turn to autoregressive models.
Autoregressive models factorize a joint probability distribution into a product of conditional probabilities. Mathematically, the joint probability of a $D$-dimensional variable $\mathbf{x} = (x_1, \ldots, x_D)$ is written as:

$$
p(\mathbf{x}) = \prod_{i=1}^{D} p(x_i \mid x_1, \ldots, x_{i-1})
$$
By enforcing this structure, we ensure a valid probability distribution and naturally obtain a lower-triangular Jacobian matrix: each output dimension depends only on the input dimensions that precede it. Because the determinant of a triangular matrix is simply the product of its diagonal entries, the determinant calculation becomes simple and highly efficient, reducing the time complexity to $O(D)$ instead of the $O(D^3)$ required for a general matrix.
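Concretely, writing $f$ for the flow transformation and $J = \partial f / \partial \mathbf{x}$ for its Jacobian (symbols used here only for illustration), the triangular structure gives

$$
\det(J) = \prod_{i=1}^{D} \frac{\partial f_i(\mathbf{x})}{\partial x_i},
\qquad
\log\bigl|\det(J)\bigr| = \sum_{i=1}^{D} \log\left|\frac{\partial f_i(\mathbf{x})}{\partial x_i}\right|,
$$

so evaluating the change-of-variables formula requires only the $D$ diagonal terms rather than a full determinant.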
This chapter focuses on the design and implementation of autoregressive flows. You will examine the Masked Autoencoder for Distribution Estimation (MADE) architecture, which uses binary masks to enforce the autoregressive property without relying on slow sequential loops. From there, you will compare two standard architectures: Masked Autoregressive Flow (MAF) and Inverse Autoregressive Flow (IAF). MAF optimizes for fast density evaluation, making it highly effective for exact maximum likelihood estimation. IAF optimizes for fast sampling, making it better suited for generating synthetic data.
Finally, you will apply these methods by writing Python and PyTorch code to implement a masked linear layer, forming the basis of your own autoregressive flow model.
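As a preview of that exercise, below is a minimal sketch of how such a masked linear layer might look in PyTorch. A fixed binary mask, stored as a buffer, is multiplied elementwise into the weight matrix on every forward pass so that connections violating the autoregressive ordering are zeroed out. The class name MaskedLinear and the set_mask helper are illustrative choices for this sketch, not a prescribed API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """Linear layer with a fixed binary mask applied to its weights.

    A minimal sketch: the mask-construction rules that make a stack of these
    layers autoregressive (the MADE connectivity constraints) are covered
    later in the chapter.
    """

    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias)
        # Register the mask as a buffer: it is saved with the model's state
        # but is never updated by the optimizer.
        self.register_buffer("mask", torch.ones(out_features, in_features))

    def set_mask(self, mask):
        # mask: binary tensor/array of shape (out_features, in_features)
        self.mask.copy_(torch.as_tensor(mask, dtype=self.mask.dtype))

    def forward(self, x):
        # Zero out the masked connections before the usual affine transform.
        return F.linear(x, self.mask * self.weight, self.bias)
```

Keeping the mask as a buffer rather than a trainable parameter means gradients never flow into it, while it still travels with the model when you save it or move it between devices.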
3.1 Autoregressive Generative Models
3.2 Masked Autoencoders for Distributions
3.3 Masked Autoregressive Flow
3.4 Inverse Autoregressive Flow
3.5 Building Autoregressive Models in Practice