Generating synthetic data is only half the battle. Once you have trained a GAN or a diffusion model, how do you determine if the generated samples are actually good? Simply looking at a few examples can be misleading. This chapter focuses on the methods needed to rigorously assess the quality, diversity, and fidelity of synthetic data produced by generative models.
You will learn about the inherent difficulties of evaluating generative models, a setting where simple metrics like accuracy don't apply. We will cover established quantitative metrics such as the Inception Score (IS) and Fréchet Inception Distance (FID), including how they are calculated and interpreted, as well as precision and recall, which separate sample fidelity from coverage of the real data distribution. You will also examine distributional metrics like the Kernel Inception Distance (KID) and GAN-specific metrics such as Perceptual Path Length (PPL), which measures the smoothness of a model's latent space.
Beyond automated metrics, we will discuss qualitative assessment techniques and specific considerations for evaluating models that generate data conditioned on additional information (e.g., class labels). Finally, you'll get hands-on experience implementing code to calculate FID, one of the most widely used metrics in the field. By the end of this chapter, you will have a toolkit for evaluating the outputs of your generative models effectively.
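As a preview of that hands-on work, the sketch below shows the core of the FID computation once feature statistics are available. It assumes the means and covariances of Inception-v3 activations for the real and generated samples have already been estimated; the function name frechet_distance is illustrative, not taken from any particular library.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_real, sigma_real, mu_fake, sigma_fake):
    """FID between two Gaussians fitted to Inception feature activations:
    ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 * (S_r S_f)^(1/2))."""
    diff = mu_real - mu_fake
    # Matrix square root of the product of the two covariance matrices
    covmean, _ = linalg.sqrtm(sigma_real @ sigma_fake, disp=False)
    # Numerical error can introduce a small imaginary component; discard it
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_real + sigma_fake - 2.0 * covmean)
```

In practice, you would extract activations from a pre-trained Inception-v3 network for a few thousand real and generated images, compute their means and covariances, and pass those statistics to a function like this; Section 5.7 walks through the full pipeline.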
5.1 Challenges in Generative Model Evaluation
5.2 Quantitative Metrics: IS, FID, Precision, Recall
5.3 Distributional Metrics: Kernel Inception Distance (KID)
5.4 Perceptual Path Length (PPL) for GANs
5.5 Qualitative Evaluation Methods
5.6 Evaluating Conditional Generation Models
5.7 Hands-on Practical: Calculating FID Scores