Autoregressive flows provide an effective way to model probability distributions, but they come with a structural limitation. Because they process data sequentially, they are typically fast in one direction and slow in the other. If a model generates data quickly, it will evaluate probability densities slowly.
To achieve fast performance during both the forward and inverse passes, we use coupling architectures. These designs partition the input data into two segments. The first segment passes through the layer unchanged. The model then uses this unmodified segment to parameterize the transformation of the second segment.
Let's take, for example, an input vector $x$ split into two halves, $x_1$ and $x_2$. An affine coupling layer produces the output $(y_1, y_2)$ using the following operations:

$$
y_1 = x_1, \qquad y_2 = x_2 \odot \exp\bigl(s(x_1)\bigr) + t(x_1),
$$

where $s$ and $t$ are neural networks that output a log-scale and a shift, and $\odot$ denotes element-wise multiplication.
Because the first half does not change, the Jacobian matrix of this transformation is triangular, and its determinant is simply the product of the diagonal scale terms $\exp\bigl(s(x_1)\bigr)$. Inversion is equally cheap: $x_2 = \bigl(y_2 - t(y_1)\bigr) \odot \exp\bigl(-s(y_1)\bigr)$, which reuses $s$ and $t$ without ever inverting them. You can therefore compute both the determinant and the inverse almost instantly, regardless of how complex the neural networks $s$ and $t$ are.
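To make this concrete, here is a minimal PyTorch sketch of an affine coupling layer along the lines of the equations above. The class name `AffineCoupling`, the small two-layer MLP, and the `tanh`-bounded log-scale are illustrative choices, not a reference implementation:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Minimal affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1)."""

    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        # A small MLP producing both the log-scale s(x1) and the shift t(x1).
        self.net = nn.Sequential(
            nn.Linear(half, 64), nn.ReLU(), nn.Linear(64, 2 * half)
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)  # bound the log-scale for numerical stability
        y2 = x2 * torch.exp(s) + t
        # Triangular Jacobian: log|det J| is just the sum of the log-scales.
        log_det = s.sum(dim=-1)
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        s, t = self.net(y1).chunk(2, dim=-1)
        s = torch.tanh(s)
        # The inverse reuses s and t directly; the networks themselves
        # are never inverted.
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=-1)

if __name__ == "__main__":
    layer = AffineCoupling(dim=4)
    x = torch.randn(8, 4)
    y, log_det = layer(x)
    x_rec = layer.inverse(y)
    print(torch.allclose(x, x_rec, atol=1e-6))  # True: exactly invertible
```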
In this chapter, you will learn how to build these computationally efficient models. You will examine the RealNVP architecture, focusing on how it uses checkerboard and channel masking to process different parts of an image. You will also look at multi-scale architectures that factor out variables at intermediate stages to reduce computational cost. Following that, you will study the Glow model to see how activation normalization and invertible 1×1 convolutions improve training stability and performance.
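As a small preview of the masking idea, the sketch below builds the checkerboard pattern that RealNVP-style layers use to decide which pixels pass through a coupling layer unchanged. The helper name `checkerboard_mask` and the exact convention (ones where row plus column is even) are illustrative assumptions:

```python
import torch

def checkerboard_mask(height, width):
    # 1 where (row + column) is even, 0 elsewhere; the ones mark the pixels
    # that pass through unchanged (the complementary mask is used in
    # alternating layers so every pixel is eventually transformed).
    rows = torch.arange(height).unsqueeze(1)  # shape (H, 1)
    cols = torch.arange(width).unsqueeze(0)   # shape (1, W)
    return ((rows + cols) % 2 == 0).float()   # shape (H, W)

print(checkerboard_mask(4, 4))
```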
By the end of this chapter, you will apply these mathematical principles in Python by building an affine coupling layer from scratch using PyTorch.
4.1 Affine Coupling Layers
4.2 The RealNVP Architecture
4.3 Multi-Scale Architectures
4.4 ActNorm and Invertible Convolutions
4.5 Implementing a Coupling Layer in Practice