Calculating the determinant of a Jacobian matrix is a required operation when training normalizing flows, as it tracks how probability volume expands or contracts during transformations. This process involves implementing the mathematical foundations of the change of variables theorem using PyTorch.
In practice, calculating analytical derivatives for complex neural network layers by hand is error-prone. PyTorch provides automatic differentiation capabilities that allow us to compute exact Jacobians programmatically. We will use the torch.autograd.functional.jacobian function to evaluate the partial derivatives of our transformations.
Let us start with a straightforward affine transformation, a scalar function defined as:

$$f(x) = 3x + 2$$

The derivative of this function with respect to $x$ is 3. Since this is a 1D transformation, the Jacobian is a $1 \times 1$ matrix, and its determinant is simply the value 3. We can verify this using PyTorch.
import torch
from torch.autograd.functional import jacobian
def affine_transform(x):
    return 3 * x + 2
# Define a scalar input tensor and require gradients
x = torch.tensor([1.0], requires_grad=True)
# Compute the Jacobian matrix
J = jacobian(affine_transform, x)
print("Jacobian matrix:\n", J)
# Compute the determinant
det_J = torch.det(J)
print("Determinant:", det_J.item())
Running this code yields a $1 \times 1$ matrix containing the value 3.0, and the determinant calculation matches the analytical result exactly.
Normalizing flows operate on multivariate data, such as images or continuous time-series vectors. The Jacobian becomes a square matrix where each element represents the partial derivative of an output dimension with respect to an input dimension.
Consider, for example, a 2D nonlinear function mapping an input vector $\mathbf{x} = (x_1, x_2)$ to an output vector $\mathbf{y} = (y_1, y_2)$:

$$y_1 = 2x_1, \qquad y_2 = x_1^2 + 3x_2$$

The Jacobian matrix of this transformation is:

$$J = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\[4pt] \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 2x_1 & 3 \end{pmatrix}$$

Because this is a lower triangular matrix, the determinant is the product of its diagonal elements, which is $2 \times 3 = 6$. The value of $x_1$, which appears only in the off-diagonal entry $2x_1$, does not affect the determinant in this specific architecture. Let us compute this in PyTorch.
def nonlinear_transform(x):
    y1 = 2 * x[0]
    y2 = x[0]**2 + 3 * x[1]
    return torch.stack([y1, y2])
# Define a 2D input vector
x_2d = torch.tensor([4.0, 1.0], requires_grad=True)
# Compute the Jacobian matrix
J_2d = jacobian(nonlinear_transform, x_2d)
print("2D Jacobian matrix:\n", J_2d)
# Compute the determinant
det_J_2d = torch.det(J_2d)
print("Determinant:", det_J_2d.item())
The output matrix will be a tensor equivalent to [[2.0, 0.0], [8.0, 3.0]], and the determinant evaluates to 6.0.
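Because the matrix is lower triangular, the product of its diagonal entries should reproduce the same value. Here is a quick check reusing the J_2d tensor computed above:

# Verify the triangular shortcut: the product of the diagonal equals the determinant
diag_product = torch.prod(torch.diagonal(J_2d))
print("Product of diagonal:", diag_product.item())  # 6.0, same as torch.det(J_2d)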
When chaining multiple transformations to build deep normalizing flows, we must multiply the determinants of their respective Jacobians. Multiplying many small or large numbers together often leads to numerical underflow or overflow in floating-point representations.
To resolve this, we compute the natural logarithm of the absolute value of the determinant. By operating in log-space, multiplications become additions, which are significantly more stable numerically. The change of variables formula for the log-density updates accordingly:

$$\log p_X(\mathbf{x}) = \log p_Z(f(\mathbf{x})) + \log \left| \det J_f(\mathbf{x}) \right|$$
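As a brief illustration of this formula, here is a minimal sketch that assumes a standard normal base distribution $p_Z$ and treats nonlinear_transform as the map $f$ from data to the latent space (both assumptions are made purely for illustration):

# Sketch only: assumes a standard normal base distribution p_Z and treats
# nonlinear_transform as the data-to-latent map f.
base = torch.distributions.Normal(0.0, 1.0)

z = nonlinear_transform(x_2d)                 # f(x)
log_p_z = base.log_prob(z).sum()              # log p_Z(f(x)), dimensions treated as independent
log_abs_det = torch.log(torch.abs(det_J_2d))  # log |det J_f(x)| from the previous example

log_p_x = log_p_z + log_abs_det
print("Log-density of x:", log_p_x.item())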
PyTorch provides torch.linalg.slogdet, which computes the sign and the natural logarithm of the absolute value of the determinant of a square matrix. This avoids computing the raw determinant entirely, preventing overflow before it occurs.
def compute_log_det_jacobian(func, x):
    # Calculate the full Jacobian of func at x
    J = jacobian(func, x)
    # slogdet returns the sign and the log of the absolute determinant
    sign, log_abs_det = torch.linalg.slogdet(J)
    return log_abs_det
log_det = compute_log_det_jacobian(nonlinear_transform, x_2d)
print("Log-Determinant:", log_det.item())
For our previous example where the determinant was 6.0, the log-determinant evaluates to approximately 1.7917.
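To see how this plays out when stacking layers, the following sketch composes the two transformations defined earlier into a hypothetical two-layer flow (constructed here only for illustration) and confirms that the log-determinant of the composition equals the sum of the individual log-determinants:

# Hypothetical two-layer flow for illustration: the affine map followed by the 2D nonlinear map
def composed_transform(x):
    return nonlinear_transform(affine_transform(x))

# Log-determinant of the composed map, computed directly from its full Jacobian
log_det_composed = compute_log_det_jacobian(composed_transform, x_2d)

# Chain rule: Jacobians multiply, so log-determinants add
log_det_affine = compute_log_det_jacobian(affine_transform, x_2d)
log_det_nonlinear = compute_log_det_jacobian(nonlinear_transform, affine_transform(x_2d))

print("Composed:", log_det_composed.item())
print("Sum of parts:", (log_det_affine + log_det_nonlinear).item())  # the two values should match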
Process flow for computing the log-determinant of a Jacobian matrix in PyTorch using automatic differentiation and slogdet for numerical stability.
While torch.autograd.functional.jacobian is helpful for understanding the mechanics and verifying implementations, it is generally too slow for training large machine learning models. The function builds the matrix one row at a time, performing one backward pass per output dimension, so its cost scales linearly with the output dimensionality. For a 1000-dimensional input (and output), evaluating the full dense Jacobian requires 1000 backward passes.
To make normalizing flows computationally feasible, we design specific neural network architectures where the Jacobian matrix is intentionally structured to be either diagonal or triangular. As demonstrated in our 2D example, the determinant of a triangular matrix is just the product of its diagonal elements. The log-determinant is therefore the sum of the logs of the absolute diagonal elements.
By enforcing this structure, we bypass the need to compute the full dense Jacobian matrix entirely. The determinant can be evaluated in a single forward or backward pass, reducing the cost of the determinant computation from $O(D^3)$ to $O(D)$, where $D$ is the dimensionality of the data. You will implement these highly efficient structures, such as autoregressive networks and coupling layers, in the upcoming chapters.
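As a preview of that idea, here is a minimal sketch using a hypothetical elementwise affine layer (the names and values are chosen only for illustration) whose Jacobian is diagonal, so the log-determinant reduces to a sum over dimensions and never requires building the full $D \times D$ matrix:

# Hypothetical diagonal-Jacobian layer: each output depends only on its own input dimension
scale = torch.tensor([0.5, 2.0, 4.0])
shift = torch.tensor([1.0, -1.0, 0.0])

def diagonal_affine(x):
    return scale * x + shift

x_3d = torch.tensor([3.0, 0.5, -2.0])

# O(D) log-determinant: a sum of log|diagonal entries|, no full Jacobian needed
log_det_fast = torch.log(torch.abs(scale)).sum()

# Check against the dense Jacobian (feasible only because D is tiny here)
_, log_det_dense = torch.linalg.slogdet(jacobian(diagonal_affine, x_3d))
print(log_det_fast.item(), log_det_dense.item())  # both approximately 1.3863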