Normalizing flows possess a highly useful dual nature. Because the mathematical transformations defining the flow are strictly invertible, we can push data through the model in two different directions. Each direction serves a distinct and important purpose in machine learning. We define these two directions as the forward pass and the inverse pass.
The forward pass maps complex data points to a simple latent space. We use this direction primarily to evaluate the exact probability density of our data. When we train a normalizing flow model, we want to maximize the likelihood of the observed training data. To do this, we take a data point $x$ and pass it through our transformation function $f$ to obtain a latent variable $z = f(x)$.
Once we have $z$, we calculate its probability under our simple base distribution, which is typically a standard Gaussian. We then apply the change of variables theorem to account for the stretching and squishing of space caused by $f$. The exact log-probability of the data point is computed using the log-determinant of the Jacobian matrix of $f$.
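Concretely, if $z = f(x)$ and we write $p_Z$ for the base density and $p_X$ for the data density (notation introduced here for clarity), the change of variables theorem gives the exact log-density as

$$\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|.$$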
By executing this forward pass, we obtain the exact density value required to compute our loss function and update the model weights. The ability to calculate this exact density, rather than an approximation, is what separates normalizing flows from other generative models like Generative Adversarial Networks or Variational Autoencoders.
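As a minimal sketch, the training objective can be written directly from this formula. Here `flow` is assumed to be a module whose forward pass returns the latent variable and the log-determinant term; the names and shapes are illustrative, not a fixed API.

```python
import math
import torch

def negative_log_likelihood(flow, x):
    """Exact NLL via the change of variables formula.
    Assumes flow(x) returns (z, log_det) with log_det = log|det df/dx|,
    and that x has shape (batch, dim)."""
    z, log_det = flow(x)                                           # forward pass: data -> latent
    log_pz = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum(dim=1)    # standard Gaussian base density
    log_px = log_pz + log_det                                      # exact log-density of x
    return -log_px.mean()                                          # minimizing NLL maximizes likelihood
```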
Figure: The bidirectional mapping between the data space and the latent space in a normalizing flow model.
The inverse pass operates in the exact opposite direction. It maps values from the simple latent space back into the complex data space. This direction is used for generative modeling, allowing us to sample completely new, synthetic data points from the learned distribution.
To generate a new data point, we first sample a random value $z$ from our base distribution. We then apply the inverse function $f^{-1}$ to this latent variable to obtain $x = f^{-1}(z)$.
Because the model was trained to map real data into the base distribution, passing a random sample from the base distribution through the inverse function yields a value that closely resembles the original training data.
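In code, sampling is equally direct. The sketch below assumes the same illustrative interface as above, with an `inverse` method that maps latents back to data space.

```python
import torch

def sample(flow, num_samples, dim):
    z = torch.randn(num_samples, dim)   # draw latents from the standard Gaussian base
    x, _ = flow.inverse(z)              # inverse pass: latent -> data (log-det not needed for sampling)
    return x
```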
The performance characteristics of a normalizing flow depend heavily on how these two passes are implemented. In some architectures, the mathematical operations required for the forward pass are incredibly fast, making density estimation and training highly efficient. However, the inverse pass for those same models might require sequential, iterative calculations, resulting in slow sampling times. Other architectures reverse this trade-off, prioritizing fast inverse passes for rapid data generation at the cost of slower training.
When writing PyTorch code for a flow layer, you will typically define a single module that contains both a forward method and an inverse method. Both methods must track and return the log-determinant of the Jacobian for their respective transformations. The forward method calculates the log-determinant of the Jacobian of $f$, while the inverse method calculates that of $f^{-1}$. By keeping both directions carefully aligned, you ensure the mathematical guarantees of the change of variables theorem remain intact throughout the entire model hierarchy.
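The sketch below illustrates this pattern with a deliberately simple elementwise affine transformation; real flow layers such as coupling layers are more expressive, but the two-method interface and the log-determinant bookkeeping are the same.

```python
import torch
import torch.nn as nn

class AffineFlowLayer(nn.Module):
    """A minimal sketch of a flow layer: an elementwise affine transformation.
    Forward and inverse each return their output together with the
    log-determinant of the corresponding Jacobian."""

    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # f: data -> latent, z = x * exp(log_scale) + shift
        z = x * torch.exp(self.log_scale) + self.shift
        log_det = self.log_scale.sum().expand(x.shape[0])      # log|det df/dx| per sample
        return z, log_det

    def inverse(self, z):
        # f^-1: latent -> data, x = (z - shift) * exp(-log_scale)
        x = (z - self.shift) * torch.exp(-self.log_scale)
        log_det = (-self.log_scale).sum().expand(z.shape[0])   # log|det df^-1/dz| per sample
        return x, log_det
```

A full model is typically a stack of such layers. Because the determinants of composed transformations multiply, the per-layer log-determinants simply add up along the stack, and the same two-method pattern is repeated at every level of the hierarchy.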