Previous chapters focused on constructing the architectures of normalizing flows. From autoregressive models to affine coupling layers, you now have the structural components necessary to map simple probability distributions to highly complex ones. The next step is optimizing these neural networks to fit actual data.
This chapter details the mechanics of training flow models. Unlike many other generative methods that rely on approximations, normalizing flows allow for exact density evaluation, which means we can train them directly by maximum likelihood estimation. We will formulate the objective function by minimizing the negative log-likelihood loss. Mathematically, for a data point $x$, an invertible transformation $f$, and a base distribution $p_Z$, the exact log-likelihood is calculated as:

$$\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|$$
You will learn how to implement custom loss functions in PyTorch that evaluate this equation by combining the log-density of the base distribution with the log-determinant of the Jacobian.
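As a preview, here is a minimal sketch of such a loss under a standard-normal base distribution. It assumes the flow's forward pass has already produced the transformed points $z = f(x)$ together with the per-sample log-determinant of the Jacobian; the names `z` and `log_det_jacobian` are illustrative.

```python
import torch

def nll_loss(z, log_det_jacobian):
    """Negative log-likelihood loss for a normalizing flow.

    z:                transformed data points f(x), shape (batch, dim)
    log_det_jacobian: log|det df/dx| for each sample, shape (batch,)
    """
    base = torch.distributions.Normal(0.0, 1.0)
    # log p_Z(f(x)): sum the per-dimension log-densities of the base distribution
    log_pz = base.log_prob(z).sum(dim=1)
    # change of variables: log p_X(x) = log p_Z(f(x)) + log|det df/dx|
    log_px = log_pz + log_det_jacobian
    # minimizing the negative mean log-likelihood maximizes the likelihood
    return -log_px.mean()
```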
Because normalizing flows operate strictly on continuous probability distributions, applying them to discrete data formats like digital images requires specific preprocessing steps. We will implement data dequantization methods that add continuous noise to discrete values, preventing the model from assigning infinite density to individual points. You will program both uniform and variational dequantization techniques to properly convert discrete input into continuous signals for the network.
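To make the idea concrete, the following is a minimal sketch of uniform dequantization, assuming 8-bit integer inputs with 256 levels; the function name and signature are illustrative.

```python
import torch

def uniform_dequantize(x, num_levels=256):
    """Uniform dequantization for integer-valued data (e.g. 8-bit pixels).

    Adding noise u ~ U[0, 1) to each integer value spreads the mass of each
    discrete level over a unit-width bin, so the model cannot place infinite
    density on isolated points. Dividing by num_levels rescales to [0, 1).
    """
    x = x.float()
    return (x + torch.rand_like(x)) / num_levels
```

Variational dequantization follows the same template, but replaces the uniform noise with noise drawn from a learned conditional distribution, which we develop later in the chapter.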
After covering the optimization process, we will shift focus to data generation. You will write the necessary sampling procedures, executing the inverse pass through the flow to generate new synthetic samples from the learned distribution. Finally, you will tie these components together by training a complete normalizing flow model on a standard dataset and evaluating the generated outputs.
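As a preview of the sampling logic, the sketch below draws base-distribution samples and maps them back to data space. It assumes a trained flow object exposing an `inverse` method that computes $x = f^{-1}(z)$; this interface is hypothetical and depends on how the flow class is implemented.

```python
import torch

@torch.no_grad()
def sample(flow, num_samples, dim):
    """Generate synthetic samples by running the flow in reverse."""
    # draw points from the standard-normal base distribution
    z = torch.randn(num_samples, dim)
    # map them through the inverse transformation back to data space
    return flow.inverse(z)
```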
5.1 Maximum Likelihood Estimation
5.2 Loss Functions for Density Estimation
5.3 Data Dequantization Methods
5.4 Sampling Procedures
5.5 Training a Normalizing Flow in Practice