Stacking multiple invertible functions creates highly expressive probability distributions. To make this work in practice, specific mathematical functions are required that are both invertible and have easily computable Jacobian determinants. Two foundational architectures that meet these strict requirements are planar flows and radial flows. These models provide an excellent starting point for understanding how to warp simple probability densities into complex geometric shapes.
A planar flow modifies the input space by applying a transformation along a specific hyperplane. You can think of it as taking a flat sheet of rubber and stretching or compressing it along a single straight line. Everything perpendicular to that line remains relatively unchanged, while the space along the line expands or contracts.
Let $\mathbf{z} \in \mathbb{R}^D$ be our continuous input vector. The planar flow transformation is defined mathematically as:

$$f(\mathbf{z}) = \mathbf{z} + \mathbf{u} \, h(\mathbf{w}^\top \mathbf{z} + b)$$
Here, $\mathbf{u} \in \mathbb{R}^D$ and $\mathbf{w} \in \mathbb{R}^D$ are learnable parameter vectors that define the orientation and scale of the transformation. The term $b \in \mathbb{R}$ is a scalar bias that shifts the hyperplane. The function $h(\cdot)$ is a smooth, differentiable non-linear activation function. The hyperbolic tangent function, denoted as $\tanh$, is most commonly used for $h$ because its derivative is well-behaved and bounded.
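To make this concrete, here is a minimal sketch of the forward pass, assuming PyTorch; the function name `planar_forward` and its shape conventions are illustrative rather than taken from any particular library.

```python
import torch

def planar_forward(z, u, w, b):
    """Planar flow f(z) = z + u * tanh(w^T z + b).

    z: (batch, D) inputs; u, w: (D,) parameter vectors; b: scalar bias.
    """
    linear = z @ w + b  # (batch,) values of w^T z + b
    return z + torch.tanh(linear).unsqueeze(-1) * u  # broadcast u over the batch
```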
To use this transformation in a normalizing flow, we must compute the determinant of its Jacobian matrix. Taking the derivative of $f(\mathbf{z})$ with respect to the input vector $\mathbf{z}$ yields:

$$\frac{\partial f}{\partial \mathbf{z}} = \mathbf{I} + \mathbf{u} \, h'(\mathbf{w}^\top \mathbf{z} + b) \, \mathbf{w}^\top$$
This result is an identity matrix plus the outer product of two vectors, scaled by the derivative of the activation function. We can compute the determinant of this specific matrix structure highly efficiently using the matrix determinant lemma. The lemma states that $\det(\mathbf{I} + \mathbf{u}\mathbf{v}^\top) = 1 + \mathbf{v}^\top \mathbf{u}$. Applying this linear algebra identity to our Jacobian gives a much simpler scalar equation:

$$\det\left(\frac{\partial f}{\partial \mathbf{z}}\right) = 1 + h'(\mathbf{w}^\top \mathbf{z} + b) \, \mathbf{u}^\top \mathbf{w}$$
This scalar result is extremely cheap to compute. It evaluates in $O(D)$ time instead of the $O(D^3)$ time required for a general $D \times D$ matrix determinant. This efficiency allows us to stack hundreds of planar flow layers without prohibitive computational costs during training.
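Continuing the sketch above, the log-determinant needs only dot products, so the cost per sample stays linear in $D$; the small epsilon guarding the logarithm is an implementation detail, not part of the math.

```python
def planar_log_det(z, u, w, b):
    """log|det J| for the planar flow, computed in O(D) per sample."""
    linear = z @ w + b  # (batch,)
    h_prime = 1.0 - torch.tanh(linear) ** 2  # tanh'(x) = 1 - tanh(x)^2
    det = 1.0 + h_prime * (u @ w)  # matrix determinant lemma
    return torch.log(det.abs() + 1e-8)  # epsilon avoids log(0)
```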
Figure: computational graph of a single planar flow transformation, showing the skip connection adding the original input to the scaled activation.
For the flow to be valid, the transformation must be strictly invertible. This requires that the Jacobian determinant never equals zero and maintains a consistent sign. In practice, we constrain the parameters to ensure that $\mathbf{w}^\top \mathbf{u} \geq -1$. When using the $\tanh$ activation function, its derivative $h'$ is strictly bounded between $0$ and $1$, so this condition keeps the determinant $1 + h'(\mathbf{w}^\top \mathbf{z} + b)\,\mathbf{u}^\top \mathbf{w}$ from crossing zero. We enforce invertibility by modifying the vector $\mathbf{u}$ slightly during the forward pass, replacing it with a vector $\hat{\mathbf{u}}$ that satisfies the geometric condition $\mathbf{w}^\top \hat{\mathbf{u}} \geq -1$.
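One workable construction for $\hat{\mathbf{u}}$, following the reparameterization suggested in Rezende and Mohamed's original planar flow paper, shifts $\mathbf{u}$ along the direction of $\mathbf{w}$; the sketch below continues the PyTorch example above.

```python
import torch.nn.functional as F

def constrain_u(u, w):
    """Return u_hat with w^T u_hat >= -1, keeping the tanh planar flow invertible.

    Follows the reparameterization from Rezende and Mohamed (2015).
    """
    wu = u @ w  # scalar w^T u
    m_wu = -1.0 + F.softplus(wu)  # maps any real number into (-1, inf)
    return u + (m_wu - wu) * w / (w @ w)  # shift u along the direction of w
```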
While planar flows apply transformations along a straight hyperplane, radial flows introduce distortions radiating outward from a specific center point. You can imagine a radial flow as placing a magnifying glass over a specific coordinate in the space. The space is either stretched outward from that point or compressed inward toward it.
For an input vector $\mathbf{z} \in \mathbb{R}^D$, the radial flow transformation is defined as:

$$f(\mathbf{z}) = \mathbf{z} + \beta \, h(\alpha, r)\,(\mathbf{z} - \mathbf{z}_0)$$
Here, $\mathbf{z}_0 \in \mathbb{R}^D$ is the learnable reference point or center of the flow. The scalar $r = \|\mathbf{z} - \mathbf{z}_0\|_2$ represents the Euclidean distance between the input and the reference point. The parameters $\alpha > 0$ and $\beta \in \mathbb{R}$ dictate the spread and magnitude of the distortion. The function $h(\alpha, r)$ is designed to decay as the distance increases, typically defined as $h(\alpha, r) = \frac{1}{\alpha + r}$.
Putting these components together gives the explicit formulation:

$$f(\mathbf{z}) = \mathbf{z} + \frac{\beta}{\alpha + \|\mathbf{z} - \mathbf{z}_0\|_2}\,(\mathbf{z} - \mathbf{z}_0)$$
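As with the planar case, the forward pass is only a few lines. This sketch continues the same PyTorch conventions, with `z0` a `(D,)` tensor and `alpha`, `beta` scalars; the names are illustrative.

```python
def radial_forward(z, z0, alpha, beta):
    """Radial flow f(z) = z + beta * h(alpha, r) * (z - z0), h = 1/(alpha + r)."""
    diff = z - z0  # (batch, D)
    r = diff.norm(dim=-1, keepdim=True)  # (batch, 1) distances to the center
    h = 1.0 / (alpha + r)
    return z + beta * h * diff
```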
The Jacobian determinant for a radial flow also benefits from a specialized computation that avoids building the full matrix. Using similar matrix identities, the determinant is evaluated as:

$$\det\left(\frac{\partial f}{\partial \mathbf{z}}\right) = \left[1 + \beta h(\alpha, r)\right]^{D-1} \left[1 + \beta h(\alpha, r) + \beta\, h'(\alpha, r)\, r\right]$$
Just like planar flows, this determinant calculation operates in linear time $O(D)$. To guarantee invertibility for a radial flow, we mathematically restrict the parameter $\beta$ such that $\beta \geq -\alpha$.
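A sketch of the corresponding log-determinant, using $h'(\alpha, r) = -1/(\alpha + r)^2$, continues the example above; the `constrain_beta` helper shows one soft way to enforce $\beta \geq -\alpha$, and both names are illustrative.

```python
def radial_log_det(z, z0, alpha, beta):
    """log|det J| for the radial flow, computed in O(D) per sample.

    Both log arguments stay positive when beta >= -alpha.
    """
    r = (z - z0).norm(dim=-1)  # (batch,)
    h = 1.0 / (alpha + r)
    h_prime = -h ** 2  # derivative of h with respect to r
    d = z.shape[-1]
    term = 1.0 + beta * h  # this factor appears D-1 times in the determinant
    return (d - 1) * torch.log(term) + torch.log(term + beta * h_prime * r)

def constrain_beta(beta_free, alpha):
    """Map an unconstrained parameter to beta >= -alpha via softplus."""
    return -alpha + F.softplus(beta_free)
```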
Figure: probability density before and after applying an invertible transformation layer.
A single planar or radial layer is quite limited in what it can model: a planar layer only compresses or expands along one direction, and a radial layer only warps around a single point. To approximate highly irregular probability densities, we apply these transformations sequentially. By chaining layers together, the overall transformation becomes highly non-linear and capable of mapping a simple isotropic base distribution into complex target shapes.
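Chaining layers is then a matter of composing the transformations while accumulating the per-layer log-determinants required by the change-of-variables formula. The sketch below assumes each layer object exposes `forward` and `log_det` methods matching the helpers above; this interface is illustrative, not a fixed API.

```python
def apply_flow_stack(z, layers):
    """Push samples through a stack of flow layers, summing log|det| terms."""
    log_det_total = torch.zeros(z.shape[0])
    for layer in layers:
        log_det_total = log_det_total + layer.log_det(z)  # evaluated at this layer's input
        z = layer.forward(z)  # then transform for the next layer
    return z, log_det_total
```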