Having established the theoretical underpinnings and interpretation of the Fréchet Inception Distance (FID) in the previous sections, we now turn to the practical aspects of calculating this important metric. While the concept involves comparing distributions in a feature space, the actual computation requires specific steps involving a pre-trained model and statistical calculations. This section provides a hands-on guide to computing the FID score.
The FID Calculation Workflow
Calculating the FID score involves comparing the statistics of activations produced by the Inception V3 model for a set of real images versus a set of generated images. The core steps are:
- Prepare Datasets: Gather a representative set of real images from your target domain and generate a set of fake images using your trained generator. A common practice is to use at least 10,000 images for each set, although 50,000 is often preferred for more stable scores. Ensure both sets undergo identical preprocessing.
- Load Pre-trained Inception V3: Utilize an Inception V3 model pre-trained on the ImageNet dataset. Specifically, you need the model up to a certain activation layer, typically the final average pooling layer (outputting 2048 features).
- Extract Activations: Feed both the real and generated images through the (truncated) Inception V3 model. Collect the 2048-dimensional activation vectors produced by the chosen layer for every image in both sets.
- Calculate Statistics: Compute the mean vector ($\mu$) and the covariance matrix ($\Sigma$) of the activations for the real images ($\mu_r$, $\Sigma_r$) and for the generated images ($\mu_g$, $\Sigma_g$).
- Compute FID: Apply the Fréchet distance formula (given below) to the calculated statistics. A high-level sketch of the whole pipeline follows this list.
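Putting these steps together, the overall pipeline can be sketched as follows. This is only an outline: `get_activations` and `frechet_distance` are hypothetical placeholders for the Inception feature extraction and the distance formula, both of which are made concrete later in this section.

```python
import numpy as np

def compute_fid(real_images, generated_images, get_activations, frechet_distance):
    """Outline of the FID pipeline; the two callables are placeholders."""
    # Step 3: run both image sets through the truncated Inception V3
    act_r = get_activations(real_images)       # shape (N, 2048)
    act_g = get_activations(generated_images)  # shape (N, 2048)

    # Step 4: per-set mean vector and covariance matrix
    mu_r, sigma_r = np.mean(act_r, axis=0), np.cov(act_r, rowvar=False)
    mu_g, sigma_g = np.mean(act_g, axis=0), np.cov(act_g, rowvar=False)

    # Step 5: Frechet distance between the two fitted Gaussians
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```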
The FID score is calculated using the following formula:
$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$
where:
- $\mu_r$ and $\mu_g$ are the mean vectors of the activations for real and generated images, respectively.
- $\Sigma_r$ and $\Sigma_g$ are the covariance matrices of the activations for real and generated images.
- $\lVert \cdot \rVert^2$ denotes the squared Euclidean norm (sum of squared differences).
- $\mathrm{Tr}(\cdot)$ is the trace of a matrix (the sum of its diagonal elements).
- $(\Sigma_r \Sigma_g)^{1/2}$ is the matrix square root of the product of the covariance matrices.
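As a quick sanity check, consider a one-dimensional toy case in which all the statistics are scalars, say $\mu_r = 0$, $\Sigma_r = 1$ for the real data and $\mu_g = 1$, $\Sigma_g = 2$ for the generated data (numbers chosen purely for illustration):

$$
\mathrm{FID} = (0 - 1)^2 + \left(1 + 2 - 2\sqrt{1 \cdot 2}\right) \approx 1 + 0.17 \approx 1.17
$$

Identical statistics ($\mu_r = \mu_g$, $\Sigma_r = \Sigma_g$) give a score of exactly zero, the theoretical optimum; the score grows as either the means or the covariances drift apart.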
Figure: Workflow for calculating the Fréchet Inception Distance (FID).
Implementation Details
While you could implement this process from scratch, using a deep learning framework (such as PyTorch or TensorFlow) to load Inception V3 and NumPy/SciPy for the statistical calculations, several well-maintained libraries simplify it considerably. Libraries such as `pytorch-fid` or `tensorflow-gan` (TF-GAN) provide convenient command-line tools and functions.
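For example, `pytorch-fid` is typically run as a module, along the lines of `python -m pytorch_fid path/to/real_images path/to/generated_images`, where the two arguments are directories of image files (the paths here are placeholders). The exact options vary between versions, so consult the project's documentation for your installed release.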
Let's outline the core computational steps using NumPy and SciPy, assuming you have already obtained the activation arrays `act_r` (for real images) and `act_g` (for generated images), each with shape `(N, 2048)`, where `N` is the number of images.
```python
import numpy as np
from scipy.linalg import sqrtm

# Assume act_r and act_g are NumPy arrays of shape (N, 2048)
# containing Inception activations for real and generated images.

# 1. Calculate mean and covariance for each set of activations
mu_r = np.mean(act_r, axis=0)
mu_g = np.mean(act_g, axis=0)
sigma_r = np.cov(act_r, rowvar=False)
sigma_g = np.cov(act_g, rowvar=False)

# 2. Calculate the squared Euclidean distance between the means
diff_mean_sq = np.sum((mu_r - mu_g) ** 2)

# 3. Calculate the matrix square root of the product of covariances.
#    A small multiple of the identity is added for numerical stability.
epsilon = 1e-6
covmean_sqrt, _ = sqrtm(sigma_r.dot(sigma_g) + epsilon * np.eye(sigma_r.shape[0]), disp=False)

# Check for complex numbers (indicates numerical instability)
if np.iscomplexobj(covmean_sqrt):
    print("Warning: Complex numbers generated in matrix square root. Using real part.")
    covmean_sqrt = covmean_sqrt.real

# 4. Calculate the trace term
trace_term = np.trace(sigma_r + sigma_g - 2.0 * covmean_sqrt)

# 5. Compute FID
fid_score = diff_mean_sq + trace_term
print(f"Calculated FID Score: {fid_score}")
```
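For completeness, here is one way the activation arrays might be obtained, i.e. a concrete stand-in for the `get_activations` helper sketched earlier. This is a minimal sketch, assuming a recent torchvision (the weights enum requires torchvision ≥ 0.13) and a `DataLoader` that yields already-preprocessed image tensors; it is not the pipeline used by dedicated FID libraries. In particular, `pytorch-fid` ships a port of the original TensorFlow Inception checkpoint, so scores computed with torchvision's ImageNet weights are fine for tracking your own experiments but are not directly comparable to FID values reported in papers.

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models import inception_v3, Inception_V3_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Truncate Inception V3 at the final pooling layer: replacing the
# classifier head with an identity makes the forward pass return the
# 2048-dimensional pooled features instead of class logits.
model = inception_v3(weights=Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Identity()
model.eval().to(device)

# Identical preprocessing for real and generated images: resize to
# 299x299 and normalize with ImageNet statistics (attach this as the
# `transform` of the underlying Dataset).
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def get_activations(dataloader):
    """Collect an (N, 2048) activation array for every image in the loader."""
    features = []
    for batch in dataloader:
        features.append(model(batch.to(device)).cpu().numpy())
    return np.concatenate(features, axis=0)
```

Calling `get_activations` once on a loader of real images and once on a loader of generated images produces the `act_r` and `act_g` arrays used in the snippet above.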
Practical Considerations
- Number of Samples: Using too few samples (e.g., fewer than 10,000 per set) leads to noisy and unreliable FID scores. Estimating the 2048x2048 covariance matrices, in particular, requires a large number of samples.
- Preprocessing: It is absolutely essential that both real and generated images are preprocessed identically before being fed into Inception V3. This typically involves resizing images to 299x299 pixels and normalizing pixel values according to the requirements of the specific pre-trained Inception checkpoint (often scaling to `[-1, 1]` or `[0, 1]`, possibly with ImageNet mean/standard-deviation normalization, as in the extraction sketch above). Any discrepancy here will invalidate the comparison.
- Pre-computed Statistics: For standard datasets like CIFAR-10, CelebA, or LSUN, the mean and covariance statistics ($\mu_r$, $\Sigma_r$) of the real images are often pre-calculated and made available. Using these standard statistics ensures comparability across different research papers, provided you use the same Inception V3 checkpoint and preprocessing. The same idea applies to your own data: the real-image statistics only need to be computed once and can then be reused (see the sketch after this list).
- Numerical Stability: Calculating the matrix square root $(\Sigma_r \Sigma_g)^{1/2}$ can sometimes be numerically unstable, especially if the covariance matrices are ill-conditioned (e.g., close to singular, which can happen with insufficient samples or low feature variance). Adding a small multiple of the identity matrix ($\epsilon I$) before the square root calculation, as shown in the example code, can help mitigate this. Always check whether the result contains complex numbers, which signals a potential issue.
- Library Usage: Using established libraries is generally recommended. They handle details like correctly loading the specific Inception V3 weights, performing the appropriate truncation, handling batch processing for memory efficiency, and incorporating numerical stability fixes.
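Caching the real-image statistics is straightforward with NumPy. The sketch below saves the mean and covariance once and reloads them for later evaluations; the filename is just an example.

```python
import numpy as np

# Compute the real-image statistics once (act_r as in the earlier snippet)
mu_r = np.mean(act_r, axis=0)
sigma_r = np.cov(act_r, rowvar=False)
np.savez("real_stats.npz", mu=mu_r, sigma=sigma_r)

# ... later, when evaluating a new set of generated images:
stats = np.load("real_stats.npz")
mu_r, sigma_r = stats["mu"], stats["sigma"]
# mu_g and sigma_g are computed from the new generated activations and
# plugged into the FID formula exactly as before.
```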
By following these steps and considerations, you can reliably calculate the FID score, providing a valuable quantitative measure for comparing your GAN's performance against baselines or tracking improvements during training. Remember that FID primarily measures the similarity of the generated distribution to the real distribution in the Inception feature space, capturing aspects of both sample fidelity and diversity.