Assessing the performance of Generative Adversarial Networks presents difficulties not found in typical supervised learning tasks. Because GANs learn to approximate complex data distributions, the training losses reflect the shifting balance between the generator and discriminator rather than the distance between the generated and real distributions, so watching them during training says little about the quality or diversity of the samples. Determining whether a GAN produces realistic outputs and covers the variety present in the real data therefore requires specialized evaluation techniques.
This chapter introduces methods for evaluating GANs, covering both qualitative and quantitative approaches. You will learn about the inherent challenges of this evaluation problem and examine qualitative methods such as visual inspection. The main focus, however, is on quantitative metrics designed to capture different aspects of generation performance. You will study the formulation, interpretation, and limitations of widely used metrics such as the Inception Score (IS) and the Fréchet Inception Distance (FID). We will also cover Precision and Recall adapted for comparing distributions, and the Perceptual Path Length (PPL) for assessing latent space properties. The chapter closes with practical guidance on calculating and interpreting these scores to compare models and track training progress.
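As a preview of what the FID sections build toward, the sketch below computes the Fréchet distance between two Gaussians fitted to feature activations. The function name `frechet_distance` and the random toy features are illustrative placeholders only; the practical workflow covered later in the chapter uses activations extracted from a pretrained Inception network rather than synthetic arrays.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_real, sigma_real, mu_gen, sigma_gen):
    """Fréchet distance between two Gaussians fitted to feature activations.

    mu_*: mean vectors of the features; sigma_*: covariance matrices.
    """
    diff = mu_real - mu_gen
    # Matrix square root of the product of covariances; numerical error can
    # introduce a tiny imaginary component, so keep only the real part.
    covmean, _ = linalg.sqrtm(sigma_real @ sigma_gen, disp=False)
    covmean = covmean.real
    return diff @ diff + np.trace(sigma_real + sigma_gen - 2.0 * covmean)

# Toy usage: random features standing in for Inception activations.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 64))
gen_feats = rng.normal(loc=0.1, size=(500, 64))
fid = frechet_distance(real_feats.mean(axis=0), np.cov(real_feats, rowvar=False),
                       gen_feats.mean(axis=0), np.cov(gen_feats, rowvar=False))
print(f"FID on toy features: {fid:.3f}")
```

The terms in this computation, why Inception features are used, and how to interpret the resulting score are developed in sections 5.4, 5.5, and 5.8.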
5.1 Challenges in Evaluating Generative Models
5.2 Qualitative Assessment: Visual Turing Tests
5.3 Inception Score (IS): Formulation and Limitations
5.4 Fréchet Inception Distance (FID): Formulation
5.5 Interpreting FID Scores
5.6 Precision and Recall for Distributions
5.7 Perceptual Path Length (PPL)
5.8 Calculating FID Score: Practice