Now that we've discussed the Fréchet Inception Distance (FID) as a valuable metric for assessing the quality and diversity of synthetic images, let's put theory into practice. This hands-on section will guide you through calculating the FID score between a set of real images and a set of synthetically generated images using Python.
FID measures the similarity between two image distributions by comparing the statistics of features extracted by a pre-trained neural network, typically the Inception v3 model. A lower FID score indicates that the distribution of synthetic images is closer to the distribution of real images, suggesting better quality and diversity.
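For reference, the quantity computed later in this section can be written as follows. The symbols are introduced here for illustration: \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the mean and covariance of the Inception features of the real and generated images (mu1, sigma1 and mu2, sigma2 in the code).

```latex
\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
\]
```

The first term compares the feature means; the trace term compares the covariances, and it is where the matrix square root enters.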
Before we begin, ensure you have the necessary libraries installed. We'll use TensorFlow to load the Inception v3 model and compute activations, NumPy for numerical operations, SciPy for the matrix square root in the FID formula, and Pillow for reading image files.
pip install tensorflow numpy scipy Pillow
You will also need two sets of images: a set of real images and a set of synthetic images produced by your generative model. For this example, let's assume these are stored in path/to/real/images and path/to/synthetic/images, respectively.
The Inception v3 model expects input images of a specific size (typically 299x299 pixels) and preprocessed in a particular way (pixel values scaled to the range [-1, 1]). We need to ensure both our real and synthetic images conform to these requirements.
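As a quick illustration of the [-1, 1] scaling, the sketch below applies the mapping x / 127.5 - 1 to a few pixel values with NumPy. It assumes this is the mapping performed by the Keras Inception v3 preprocess_input function; scale_to_inception_range is a hypothetical helper name used only here.

```python
import numpy as np

def scale_to_inception_range(pixels):
    """Map pixel values from [0, 255] to [-1, 1] (assumed Inception v3 scaling)."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return pixels / 127.5 - 1.0

# Darkest, mid-grey and brightest pixel values
example_pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_to_inception_range(example_pixels)
print(scaled)  # the three values map to -1, 0 and 1
```

In the pipeline itself, the library function is used rather than a hand-written mapping, so the scaling stays consistent with the pre-trained weights.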
import tensorflow as tf
import numpy as np
import os
from PIL import Image
import warnings
# Suppress specific TensorFlow warnings for cleaner output
warnings.filterwarnings("ignore", category=FutureWarning)
tf.get_logger().setLevel('ERROR')
# Define image size expected by Inception v3
IMAGE_SIZE = (299, 299)
def preprocess_image(image_path):
    """Loads and preprocesses an image for Inception v3."""
    try:
        img = tf.keras.preprocessing.image.load_img(
            image_path, target_size=IMAGE_SIZE
        )
        img_array = tf.keras.preprocessing.image.img_to_array(img)
        # Scale pixel values to [-1, 1] as expected by InceptionV3
        img_array = tf.keras.applications.inception_v3.preprocess_input(img_array)
        return img_array
    except Exception as e:
        print(f"Warning: Skipping file {image_path} due to error: {e}")
        return None
def load_and_preprocess_images(dir_path, max_images=None):
    """Loads and preprocesses all images from a directory."""
    # Sort the file names: os.listdir returns them in arbitrary order,
    # so sorting keeps the selected subset reproducible between runs.
    image_paths = [os.path.join(dir_path, fname) for fname in sorted(os.listdir(dir_path))
                   if fname.lower().endswith(('.png', '.jpg', '.jpeg'))]
    if max_images is not None:
        image_paths = image_paths[:max_images]
        print(f"Processing a maximum of {max_images} images from {dir_path}")
    processed_images = []
    for path in image_paths:
        processed = preprocess_image(path)
        if processed is not None:
            processed_images.append(processed)
    if not processed_images:
        raise ValueError(f"No valid images found or processed in directory: {dir_path}")
    return np.array(processed_images)
# --- Placeholder paths: Replace with your actual directories ---
# It's recommended to use at least a few thousand images for stable FID.
# For demonstration, we might use fewer, but be aware results will vary.
PATH_REAL_IMAGES = 'path/to/real/images'
PATH_SYNTHETIC_IMAGES = 'path/to/synthetic/images'
MAX_IMAGES_PER_SET = 100 # Use a small number for quick demo; increase for real evaluation
print("Loading and preprocessing real images...")
# Add error handling for directory existence
if not os.path.isdir(PATH_REAL_IMAGES):
    print(f"Error: Real image directory not found: {PATH_REAL_IMAGES}")
    print("Please replace 'path/to/real/images' with the correct path.")
    # Set real_images to None or handle appropriately
    real_images = None
else:
    real_images = load_and_preprocess_images(PATH_REAL_IMAGES, MAX_IMAGES_PER_SET)
print("Loading and preprocessing synthetic images...")
if not os.path.isdir(PATH_SYNTHETIC_IMAGES):
    print(f"Error: Synthetic image directory not found: {PATH_SYNTHETIC_IMAGES}")
    print("Please replace 'path/to/synthetic/images' with the correct path.")
    # Set synthetic_images to None or handle appropriately
    synthetic_images = None
else:
    synthetic_images = load_and_preprocess_images(PATH_SYNTHETIC_IMAGES, MAX_IMAGES_PER_SET)
# Check if images were loaded successfully before proceeding
if real_images is None or synthetic_images is None:
    print("\nHalting execution due to missing image data. Please check paths and image files.")
    # Exit or skip FID calculation if data is missing
    # For a script, you might use: sys.exit(1) after importing sys
else:
    print(f"Loaded {len(real_images)} real images and {len(synthetic_images)} synthetic images.")
    print("Preprocessing complete.")
Make sure to replace path/to/real/images and path/to/synthetic/images with the actual paths to your image datasets. We also added a MAX_IMAGES_PER_SET variable for demonstration purposes; for reliable FID scores, you should use a larger number of images (often thousands).
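One reason a small sample is problematic: a covariance matrix estimated from n samples has rank at most n - 1. The pooled Inception v3 features used later in this section are 2048-dimensional, so 100 images cannot yield a full-rank 2048 x 2048 covariance. The sketch below shows the same effect at toy scale with random data; the dimensions and variable names are illustrative assumptions, not part of the pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5, 8  # fewer samples than feature dimensions

# Random stand-in for activations: one row per image, one column per feature
toy_activations = rng.standard_normal((n_samples, n_features))

# Covariance with columns treated as variables, as in the FID computation
toy_cov = np.cov(toy_activations, rowvar=False)
toy_rank = np.linalg.matrix_rank(toy_cov)

print(f"covariance shape: {toy_cov.shape}, rank: {toy_rank}")
# The rank cannot exceed n_samples - 1, so this 8x8 matrix is singular.
```

A singular covariance does not stop the computation, but it makes the matrix square root less well conditioned and the resulting score less trustworthy.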
The next step is to feed these preprocessed images into the Inception v3 model (pre-trained on ImageNet) and extract activations from one of the deeper layers. The layer typically used is the final pooling layer before the classification output, as its features capture high-level image characteristics.
from scipy.linalg import sqrtm # For matrix square root
def calculate_activations(images, model):
    """Calculates activations for a batch of images using the model."""
    if images is None or len(images) == 0:
        return np.array([])
    activations = model.predict(images)
    return activations
def calculate_fid(act1, act2):
    """Calculates the FID score between two sets of activations."""
    if act1.size == 0 or act2.size == 0:
        print("Warning: One or both activation sets are empty. Cannot calculate FID.")
        return float('inf')  # Or handle as an error
    # Calculate mean and covariance statistics
    mu1, sigma1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), np.cov(act2, rowvar=False)
    # Numerical stability check for covariance matrices, done before they are used
    if not np.isfinite(sigma1).all() or not np.isfinite(sigma2).all():
        print("Warning: Non-finite values found in covariance matrices. FID might be unstable.")
        return float('inf')  # Indicate instability
    # Calculate sum squared difference between means
    ssdiff = np.sum((mu1 - mu2)**2.0)
    # Calculate sqrt of product of cov matrices
    covmean, _ = sqrtm(sigma1.dot(sigma2), disp=False)
    # If the result is not finite, retry with a small epsilon added to the diagonals
    if not np.isfinite(covmean).all():
        eps = 1e-6
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = sqrtm((sigma1 + offset).dot(sigma2 + offset), disp=False)
    # Check and correct imaginary numbers from sqrtm
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    # Calculate score
    fid = ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
    # Clip small negative values, which can appear due to numerical error
    if fid < 0:
        fid = 0.0
    return fid
# Proceed only if images were loaded successfully
if real_images is not None and synthetic_images is not None:
    # Load InceptionV3 model pre-trained on ImageNet, excluding the top classification layer
    # The output will be the features from the global average pooling layer
    print("Loading InceptionV3 model...")
    inception_model = tf.keras.applications.InceptionV3(
        include_top=False,
        pooling='avg',  # Global Average Pooling layer output
        input_shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3),
        weights='imagenet'
    )
    print("Model loaded.")
    print("Calculating activations for real images...")
    activations_real = calculate_activations(real_images, inception_model)
    print("Calculating activations for synthetic images...")
    activations_synthetic = calculate_activations(synthetic_images, inception_model)
    if activations_real.size > 0 and activations_synthetic.size > 0:
        print("Calculating FID score...")
        # Ensure activations are 2D arrays for covariance calculation
        if activations_real.ndim == 1:
            activations_real = activations_real.reshape(-1, 1)
        if activations_synthetic.ndim == 1:
            activations_synthetic = activations_synthetic.reshape(-1, 1)
        fid_score = calculate_fid(activations_real, activations_synthetic)
        print(f"\nCalculated FID Score: {fid_score:.4f}")
    else:
        print("\nFID calculation skipped due to empty activation sets.")
This code loads the Inception v3 model, calculates the activations for both image sets using the predict method, and then applies the FID formula using our calculate_fid function. We use scipy.linalg.sqrtm for the matrix square root computation, which is often the most numerically challenging part. We've included basic checks for empty activations and potential numerical issues like complex numbers or non-finite values.
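One way to cross-check the sqrtm step is to note that only the trace of the matrix square root enters the formula. For symmetric positive semi-definite covariances, that trace should equal the sum of the square roots of the eigenvalues of the product, which avoids forming the square root matrix itself. The following is a minimal sketch under that assumption, using only NumPy; fid_from_stats and the toy statistics are hypothetical names and values chosen for illustration.

```python
import numpy as np

def fid_from_stats(mu1, sigma1, mu2, sigma2):
    """FID from precomputed means and covariances via eigenvalues (sketch)."""
    diff = mu1 - mu2
    # Eigenvalues of sigma1 @ sigma2 are real and non-negative in exact
    # arithmetic; drop imaginary round-off and clip tiny negatives to zero.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace)

# Toy statistics with diagonal covariances, easy to verify by hand:
# mean term = 1 + 4 = 5, trace term = (1 + 4) + (9 + 16) - 2 * (3 + 8) = 8
mu_a, sigma_a = np.array([0.0, 0.0]), np.diag([1.0, 4.0])
mu_b, sigma_b = np.array([1.0, 2.0]), np.diag([9.0, 16.0])

toy_fid = fid_from_stats(mu_a, sigma_a, mu_b, sigma_b)
print(f"toy FID: {toy_fid:.4f}")  # approximately 13
```

If this value and the sqrtm-based result disagree noticeably on your own statistics, that is a hint that the covariance estimates are ill-conditioned, often because too few images were used.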
The output fid_score represents the Fréchet Inception Distance. Remember that a lower score means the synthetic image distribution is closer to the real one, and that sample size matters: using too few images (like the MAX_IMAGES_PER_SET=100 example) will lead to unreliable scores.
This practical exercise demonstrates the core steps involved in calculating FID. By applying this metric, you gain a quantitative measure of how well your generative model captures the visual characteristics of the real data distribution, which is a significant aspect of evaluating synthetic image quality. Remember to adapt the paths and consider the practical limitations when applying this to your own projects.
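The sample-size caveat can be checked on synthetic data. In the sketch below, both sets are drawn from the same Gaussian, so the true distance is zero, yet the estimate from 50 samples should come out noticeably larger than the one from 5,000. The 16-dimensional random features stand in for Inception activations, and gaussian_fid is a hypothetical NumPy-only helper (it uses eigenvalues rather than scipy.linalg.sqrtm), so treat this as an illustration rather than the evaluation pipeline.

```python
import numpy as np

def gaussian_fid(act1, act2):
    """FID between two activation arrays (rows = samples), NumPy only."""
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    sigma1 = np.cov(act1, rowvar=False)
    sigma2 = np.cov(act2, rowvar=False)
    # Trace of the matrix square root via eigenvalues of the product
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace)

rng = np.random.default_rng(42)
n_features = 16  # toy feature dimension, far smaller than real Inception features

def same_distribution_fid(n_samples):
    """Draw two independent sets from the same Gaussian and compare them."""
    set_a = rng.standard_normal((n_samples, n_features))
    set_b = rng.standard_normal((n_samples, n_features))
    return gaussian_fid(set_a, set_b)

fid_small = same_distribution_fid(50)
fid_large = same_distribution_fid(5000)
print(f"n=50:   {fid_small:.4f}")
print(f"n=5000: {fid_large:.4f}")
```

Because the estimate is biased upward for small samples, FID scores are best compared only when they were computed with the same number of images.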
© 2025 ApX Machine Learning