While metrics like Inception Score (IS) and Fréchet Inception Distance (FID) provide valuable information by comparing features extracted from pre-trained networks, they have limitations. FID, for instance, models the extracted features (typically from an Inception network) as multivariate Gaussian distributions and compares their means and covariances. This assumption might not always hold, and FID estimates can be biased, especially when calculated on small sets of samples.
To address these issues, the Kernel Inception Distance (KID) offers an alternative approach for comparing the distributions of real and generated data features. Instead of assuming Gaussianity, KID uses the Maximum Mean Discrepancy (MMD), a kernel-based distance between distributions that underpins non-parametric two-sample tests of whether two sets of samples originate from the same distribution.
Maximum Mean Discrepancy (MMD)
At its core, MMD measures the distance between the mean embeddings of two distributions in a high-dimensional feature space known as a Reproducing Kernel Hilbert Space (RKHS). The intuition is that if two distributions are identical, their mean representations in this space will also be identical. The further apart the mean embeddings, the larger the MMD, indicating greater dissimilarity between the distributions.
The squared MMD between two distributions, $P_r$ (real) and $P_g$ (generated), using a kernel function $k$, is defined as:

$$\mathrm{MMD}^2(P_r, P_g) = \mathbb{E}_{x,\,x' \sim P_r}\big[k(x, x')\big] - 2\,\mathbb{E}_{x \sim P_r,\; y \sim P_g}\big[k(x, y)\big] + \mathbb{E}_{y,\,y' \sim P_g}\big[k(y, y')\big]$$

Here, $x$ and $x'$ are independent samples from the real distribution, $y$ and $y'$ are independent samples from the generated distribution, and $k(a, b)$ is the kernel function evaluating the similarity between samples $a$ and $b$.
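To make this concrete, here is a minimal NumPy sketch of the unbiased empirical estimate of the squared MMD. The function names (`poly_kernel`, `mmd2_unbiased`) are illustrative, not drawn from any particular library:

```python
import numpy as np

def poly_kernel(a, b):
    """Degree-3 polynomial kernel: k(a, b) = (a.b / d + 1)^3."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def mmd2_unbiased(X, Y, kernel=poly_kernel):
    """Unbiased estimate of squared MMD between samples X (n, d) and Y (m, d)."""
    n, m = len(X), len(Y)
    k_xx = kernel(X, X)
    k_yy = kernel(Y, Y)
    k_xy = kernel(X, Y)
    # Exclude diagonal terms so E[k(x, x')] is estimated over distinct pairs,
    # which is what makes the estimator unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    term_xy = k_xy.mean()
    return term_xx - 2 * term_xy + term_yy
```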
Calculating KID
KID applies the MMD calculation specifically to the features extracted from an Inception network, similar to how FID operates. The typical steps are:
- Feature Extraction: Process batches of real images (X) and generated images (Y) through a pre-trained Inception network (usually up to a specific layer like the final average pooling layer) to obtain feature vectors for each image.
- MMD Estimation: Compute an empirical estimate of the squared MMD from the extracted feature vectors. For batches $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_m\}$, the unbiased empirical estimator (sketched above) is typically used.
- Kernel Choice: KID commonly employs a polynomial kernel such as $k(a, b) = \left(\frac{1}{d}\, a^\top b + 1\right)^3$, where $d$ is the dimensionality of the feature vectors. Because this kernel is cubic, KID compares feature statistics up to the third moment, going beyond the means and covariances matched by FID.
- Averaging: The computation is often repeated over multiple random subsets (splits) of the data, and the results are averaged to obtain a more stable KID estimate.
The final KID value is typically reported as the squared MMD estimate, sometimes multiplied by a scaling factor (e.g., 100). As with FID, lower KID values indicate that the distribution of generated image features is closer to that of real image features, suggesting higher quality and diversity.
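Putting these steps together, the following sketch (reusing `mmd2_unbiased` from above) computes a KID estimate from pre-extracted Inception features. The subset size and count mirror commonly used defaults but are assumptions, not a fixed standard:

```python
import numpy as np

def compute_kid(real_feats, gen_feats, n_subsets=100, subset_size=1000, seed=0):
    """KID estimate: mean unbiased MMD^2 over random feature subsets.

    real_feats, gen_feats: Inception feature arrays of shape (N, d).
    Reuses mmd2_unbiased / poly_kernel from the earlier sketch.
    """
    rng = np.random.default_rng(seed)
    m = min(subset_size, len(real_feats), len(gen_feats))
    estimates = []
    for _ in range(n_subsets):
        x = real_feats[rng.choice(len(real_feats), m, replace=False)]
        y = gen_feats[rng.choice(len(gen_feats), m, replace=False)]
        estimates.append(mmd2_unbiased(x, y))
    estimates = np.asarray(estimates)
    # Mean across subsets is the KID estimate; the std indicates its stability.
    return estimates.mean(), estimates.std()
```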
Advantages of KID
- Unbiased Estimator: The squared MMD admits an unbiased empirical estimator, which is particularly advantageous when working with smaller sample sizes, where FID estimates can be noisy and biased.
- No Gaussian Assumption: KID does not assume that the Inception features follow a Gaussian distribution, making it potentially more reliable when this assumption is violated.
- Sensitivity: Through kernels like the cubic polynomial kernel, KID can capture differences in higher-order statistics that a comparison of means and covariances alone would miss.
Practical Notes
- Computational Cost: Calculating KID can be more computationally intensive than FID, especially the MMD part, which involves pairwise kernel evaluations (naively $O(n^2)$ complexity for $n$ samples, although linear-time approximations exist).
- Implementation: Requires careful implementation of the MMD estimator and access to Inception features. Standard implementations are available in libraries such as torch_fidelity, or can be built with general scientific computing libraries (see the usage example after this list).
- Interpretation: Like FID, KID is a relative metric. Its absolute value is less informative than its comparison across different models or training checkpoints. Comparing KID values obtained using different feature extractors or kernels is generally not meaningful.
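As a usage example, and assuming the torch-fidelity package's `calculate_metrics` interface (the folder paths below are placeholders, and argument names should be verified against the installed version's documentation):

```python
import torch_fidelity

# Compare a folder of generated images against a folder of real images.
metrics = torch_fidelity.calculate_metrics(
    input1='path/to/generated_images',
    input2='path/to/real_images',
    kid=True,              # enable Kernel Inception Distance
    kid_subset_size=1000,  # samples per random subset
    kid_subsets=100,       # number of subsets to average over
)
print(metrics['kernel_inception_distance_mean'],
      metrics['kernel_inception_distance_std'])
```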
In summary, KID serves as a powerful distributional metric for evaluating generative models, offering robustness, particularly with limited data, and avoiding the Gaussian assumptions inherent in FID. It provides a complementary perspective on the similarity between the distributions of real and generated data features.