Once the parameters of a normalizing flow have been optimized with exact maximum likelihood estimation, the model can be used to generate synthetic data. Generation is the exact mathematical inverse of training: training passes data through the forward transformations to compute exact densities, while generation reverses this flow.
To generate a new data point x, we first draw a sample z from a known base distribution p(z). This is usually a standard normal distribution.
Next, we pass this sample through the inverse of the learned transformation, x = f^{-1}(z).
Because the functions in a normalizing flow are strictly invertible, every generated point x maps back to a unique latent representation z. The architecture guarantees that no information is lost during this mapping.
Sequence of operations reversing the forward transformation to generate data from a base distribution.
When stacking multiple flow layers, the inverse pass must strictly reverse the order of operations. If the forward pass applies transformations in the order f_1, f_2, ..., f_K, the inverse pass must apply them as f_K^{-1}, ..., f_2^{-1}, f_1^{-1}.
Implementing this in PyTorch requires iterating through your stored layers in reverse. Each layer must define its own mathematical inverse function.
import torch
import torch.nn as nn

class NormalizingFlow(nn.Module):
    def __init__(self, layers):
        super().__init__()
        # ModuleList holds the individual flow transformations
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        # Used for density estimation (training)
        log_det_jacobians = torch.zeros(x.shape[0], device=x.device)
        for layer in self.layers:
            x, ldj = layer.forward(x)
            log_det_jacobians += ldj
        return x, log_det_jacobians

    def inverse(self, z):
        # Used for sampling (generation)
        x = z
        log_det_jacobians = torch.zeros(z.shape[0], device=z.device)
        # Iterate backwards for the inverse pass
        for layer in reversed(self.layers):
            x, ldj = layer.inverse(x)
            log_det_jacobians += ldj
        return x, log_det_jacobians
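This container assumes that every layer returns both its transformed output and the log-determinant of its Jacobian from forward and inverse. As a purely illustrative sketch of that interface (not part of any standard library), a learnable element-wise affine layer could look like this:

class AffineLayer(nn.Module):
    # Hypothetical example layer, shown only to illustrate the interface the
    # NormalizingFlow container expects: both directions return the
    # transformed tensor and the log-determinant of the Jacobian.
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        y = x * torch.exp(self.log_scale) + self.shift
        # Log-determinant of an element-wise scaling is the sum of log scales
        ldj = self.log_scale.sum().expand(x.shape[0])
        return y, ldj

    def inverse(self, y):
        x = (y - self.shift) * torch.exp(-self.log_scale)
        ldj = -self.log_scale.sum().expand(y.shape[0])
        return x, ldj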
To execute the sampling procedure, you simply draw a tensor of random noise from the base distribution and pass it to the inverse method.
def generate_samples(flow_model, num_samples, latent_dim, device="cpu"):
    flow_model.eval()
    with torch.no_grad():
        # Draw from the base distribution
        z = torch.randn(num_samples, latent_dim, device=device)
        # Map to the data distribution
        generated_data, _ = flow_model.inverse(z)
    return generated_data
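As a concrete usage sketch, suppose the flow is built from the hypothetical AffineLayer above and trained on 2-dimensional data; the names and sizes here are illustrative only:

# Build a small flow from the example layers and sample from it.
flow = NormalizingFlow([AffineLayer(2) for _ in range(4)])
# ... train the flow with maximum likelihood ...
samples = generate_samples(flow, num_samples=1000, latent_dim=2)
print(samples.shape)  # torch.Size([1000, 2])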
Different flow architectures handle the inverse pass with varying levels of efficiency. When designing a system, the choice of architecture dictates whether your model will be suitable for real-time generation.
In autoregressive models like the Masked Autoregressive Flow (MAF), the forward pass is highly parallelized, which makes density estimation and training very fast. However, the inverse pass generates features sequentially. Generating high-dimensional data, such as images or audio, with MAF is therefore extremely slow, because each pixel or waveform sample can only be computed after all of the preceding ones.
Conversely, Inverse Autoregressive Flow (IAF) is designed specifically for fast generation. The inverse pass is parallelized, while the forward density estimation becomes sequential and slow.
Coupling architectures like RealNVP offer a balanced approach. Because affine coupling layers split the input dimensions and apply simple element-wise operations, both the forward and inverse passes operate in parallel. This property makes coupling models highly efficient for generating large volumes of data while remaining fast to train.
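The sketch below illustrates why both directions parallelize: an affine coupling layer computes scale and shift values for the second half of the dimensions from the first half, so every dimension can be transformed, or recovered, in a single pass. This is a simplified, illustrative version of a RealNVP-style layer, not a production implementation.

class AffineCoupling(nn.Module):
    # Simplified RealNVP-style affine coupling layer (illustrative only).
    # The first half of the input passes through unchanged and conditions
    # the element-wise scale and shift applied to the second half.
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, y2], dim=1), log_s.sum(dim=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        log_s, t = self.net(y1).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-log_s)
        return torch.cat([y1, x2], dim=1), -log_s.sum(dim=1)

In practice, coupling layers are stacked with alternating or permuted splits so that every dimension is eventually transformed, and the predicted log-scale is often bounded (for example with a tanh) for numerical stability.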
A common practical technique when generating samples from generative models is temperature scaling. Instead of sampling directly from a standard normal distribution, you multiply the sampled latent variables by a scalar temperature parameter T, where 0 < T < 1.
Reducing the temperature concentrates the initial samples closer to the mean of the base distribution, avoiding the low-probability tails. In practice, this usually results in generated data that looks more realistic and has fewer artifacts. The trade-off is reduced diversity in the generated outputs.
If your trained flow model generates noisy or out-of-distribution samples during inference, lowering the temperature to 0.8 or 0.7 is a standard troubleshooting step.
def generate_with_temperature(flow_model, num_samples, latent_dim, temperature=0.8, device="cpu"):
    flow_model.eval()
    with torch.no_grad():
        # Apply temperature scaling to the base distribution samples
        epsilon = torch.randn(num_samples, latent_dim, device=device)
        z = epsilon * temperature
        generated_data, _ = flow_model.inverse(z)
    return generated_data
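As a quick sanity check, you can compare samples drawn at full and reduced temperature; with a trained flow, the cooler samples typically cluster more tightly around the modes of the data distribution. The flow instance below is a placeholder for your own trained model:

# Compare sample spread at two temperatures (flow and sizes are illustrative).
full = generate_with_temperature(flow, num_samples=1000, latent_dim=2, temperature=1.0)
cool = generate_with_temperature(flow, num_samples=1000, latent_dim=2, temperature=0.7)
print(full.std(dim=0), cool.std(dim=0))  # cooler samples are usually less spread out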
By controlling the sampling temperature, you can balance the trade-off between the fidelity and the variety of your generated samples depending on your specific application requirements.