Training Generative Adversarial Networks is notoriously challenging. Unlike standard supervised learning where you minimize a single loss function, GAN training involves a dynamic min-max game between two networks. This delicate balance can easily break down, leading to training instability. Recognizing the signs of trouble early is essential for applying the corrective measures discussed later in this chapter.
The most common symptoms of unstable GAN training are oscillations and outright divergence. Let's examine how to spot these issues.
The primary tool for diagnosing GAN training health is observing the loss curves for the generator (LG) and the discriminator (LD) over time (training iterations or epochs).
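As a starting point, the sketch below shows one simple way to record both losses at every iteration and plot them afterwards. It assumes a PyTorch-style training loop that produces scalar loss tensors named loss_d and loss_g; those names and the plotting details are illustrative, not prescribed.

```python
# Minimal sketch: record per-iteration losses so the curves can be inspected later.
# Assumes a PyTorch-style loop producing scalar tensors loss_d and loss_g.
import matplotlib.pyplot as plt

history = {"loss_d": [], "loss_g": []}

def log_losses(loss_d, loss_g):
    # .item() extracts a plain Python float, detached from the autograd graph
    history["loss_d"].append(loss_d.item())
    history["loss_g"].append(loss_g.item())

def plot_losses():
    plt.plot(history["loss_d"], label="L_D")
    plt.plot(history["loss_g"], label="L_G")
    plt.xlabel("iteration")
    plt.ylabel("loss")
    plt.legend()
    plt.show()
```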
In an ideal scenario, both losses would settle into a rough equilibrium and stabilize, indicating that the generator is steadily improving while the discriminator stays just strong enough to keep providing a useful training signal. However, you might observe oscillatory behavior:
Losses Seesawing: The generator loss might decrease while the discriminator loss increases, only for the trend to reverse shortly after, repeating this cycle. This often suggests that the generator and discriminator are overpowering each other in turns, rather than converging smoothly. One network learns too quickly, changes the data distribution significantly, and the other network struggles to adapt, leading to large gradient updates that push the system back in the other direction.
High-Frequency Noise: While some noise in loss curves is normal due to minibatch stochasticity, extremely jagged or high-amplitude fluctuations can signal instability.
Example of oscillating loss curves, where LD and LG move in opposite directions cyclically.
While some oscillation might be acceptable if sample quality is improving, persistent and large oscillations often precede mode collapse or divergence.
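If you want a rough quantitative check for this seesaw pattern instead of eyeballing the curves, one option is to look at the correlation between recent changes in LG and LD: a strongly negative value means the two losses are moving in opposite directions. The sketch below is a heuristic, not a standard diagnostic; the window size and warning threshold are assumptions you would tune.

```python
# Heuristic sketch: detect seesawing losses via the correlation of recent changes.
import numpy as np

def oscillation_score(loss_g_hist, loss_d_hist, window=200):
    """Correlation between recent step-to-step changes in L_G and L_D.
    A value near -1 means the losses move in opposite directions."""
    if len(loss_g_hist) < window + 1:
        return 0.0
    dg = np.diff(loss_g_hist[-window:])
    dd = np.diff(loss_d_hist[-window:])
    return float(np.corrcoef(dg, dd)[0, 1])

# Example usage (threshold is an illustrative assumption):
# if oscillation_score(history["loss_g"], history["loss_d"]) < -0.8:
#     print("Warning: losses are seesawing; consider lowering the learning rates.")
```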
Divergence is a more severe failure mode where the training process breaks down completely. Signs include:
Exploding Losses: One or both loss values rapidly increase towards infinity (or result in NaN values). This typically happens when gradients become excessively large, causing parameter updates to overshoot wildly.
Vanishing Generator Loss: The generator loss LG drops close to zero and stays there. This might seem good, but it often means the generator has found a way to fool the current discriminator very easily (perhaps via mode collapse) but isn't actually learning the true data distribution. The discriminator might be stuck, unable to provide useful gradients.
Discriminator Loss Stuck at Zero: If LD goes to zero, it means the discriminator can perfectly distinguish real from fake samples. While this indicates a strong discriminator, it also means the generator receives no informative gradient signal (the gradient "vanishes"), halting its learning process.
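These symptoms are cheap to test for programmatically, so it is worth adding a small check to the training loop and stopping or alerting when it fires. The sketch below assumes the scalar loss values have already been extracted with .item(); the thresholds are illustrative assumptions and depend on the loss formulation you use.

```python
# Minimal sketch: sanity checks for the divergence symptoms listed above.
import math

def check_divergence(loss_d_val, loss_g_val, explode_threshold=1e3, vanish_eps=1e-4):
    problems = []
    for name, val in (("L_D", loss_d_val), ("L_G", loss_g_val)):
        if math.isnan(val) or math.isinf(val):
            problems.append(f"{name} is NaN/inf")
        elif val > explode_threshold:
            problems.append(f"{name} is exploding ({val:.1f})")
    if loss_g_val < vanish_eps:
        problems.append("L_G is near zero (possible mode collapse)")
    if loss_d_val < vanish_eps:
        problems.append("L_D is near zero (discriminator may be too strong)")
    return problems  # empty list means no obvious divergence symptoms
```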
Example where LD explodes while LG vanishes, indicating training divergence. Note the logarithmic y-axis.
Beyond the loss values, examine the discriminator's raw output (logits or probabilities) for real and fake samples. Let D(x) be the discriminator's output for a real sample x and D(G(z)) for a fake sample G(z).
Saturated Outputs: If D(x) consistently stays near 1 and D(G(z)) consistently stays near 0 (or vice versa, depending on the label convention), the discriminator is very confident. While this might happen early in training, if it persists, it often leads to vanishing gradients for the generator. The ideal scenario is for the discriminator to be uncertain, with outputs closer to 0.5, providing meaningful gradients to guide the generator.
Discriminator Accuracy: Monitoring the discriminator's accuracy on real vs. fake samples can also be informative. If accuracy rockets to 100% and stays there, the generator is likely not learning. Conversely, if accuracy is stuck at 50% (random guessing), the discriminator might be too weak or the generator might already be producing samples indistinguishable from real ones (less common early on).
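A small helper that summarizes the discriminator's confidence and accuracy on each batch makes these checks easy to log alongside the losses. The sketch below assumes the discriminator outputs probabilities (after a sigmoid) with real samples labeled 1; adapt the threshold logic if you work with raw logits or a different label convention.

```python
# Minimal sketch: per-batch discriminator statistics.
# d_real and d_fake are assumed to be sigmoid outputs in [0, 1].
import torch

@torch.no_grad()
def discriminator_stats(d_real, d_fake, threshold=0.5):
    acc_real = (d_real > threshold).float().mean()   # fraction of reals called real
    acc_fake = (d_fake <= threshold).float().mean()  # fraction of fakes called fake
    return {
        "D(x) mean": d_real.mean().item(),
        "D(G(z)) mean": d_fake.mean().item(),
        "accuracy": 0.5 * (acc_real + acc_fake).item(),
    }
```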
Instability is often linked to problematic gradients flowing back through the networks.
Tools within deep learning frameworks allow you to monitor the norm (magnitude) of gradients during training. A sudden spike in gradient norms often precedes divergence, while consistently small norms might indicate a vanishing gradient problem. Techniques like gradient clipping (limiting the maximum gradient norm) or using alternative loss functions (like WGAN-GP, discussed later) are designed to mitigate these issues.
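As an example, the sketch below computes the global L2 norm of a model's gradients after backward() and shows where clipping would be applied using PyTorch's clip_grad_norm_ utility. The max_norm value shown is an illustrative assumption, not a recommended setting.

```python
# Minimal sketch: monitor gradient norms and optionally clip them.
import torch

def grad_norm(model):
    # Total L2 norm over all parameter gradients (call after loss.backward()).
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Optional clipping before the optimizer step (max_norm is illustrative):
# torch.nn.utils.clip_grad_norm_(discriminator.parameters(), max_norm=1.0)
```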
Quantitative metrics alone do not tell the whole story. Regularly inspecting the samples produced by the generator throughout training is non-negotiable.
Visual inspection provides invaluable qualitative feedback that complements the quantitative metrics derived from loss curves and discriminator outputs. If the losses look stable but the images are degrading or collapsing, it still signals a problem requiring intervention.
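A common way to make this inspection systematic is to generate images from the same fixed batch of noise at regular intervals and save them to disk, so samples from different stages of training are directly comparable. The sketch below assumes a PyTorch generator mapping latent vectors of size latent_dim to images; the names and directory layout are illustrative.

```python
# Minimal sketch: periodically save a grid of samples from a fixed noise batch.
import os
import torch
from torchvision.utils import save_image

latent_dim = 100
fixed_noise = torch.randn(64, latent_dim)  # reuse the same noise every time

@torch.no_grad()
def save_samples(generator, step, out_dir="samples"):
    os.makedirs(out_dir, exist_ok=True)
    generator.eval()
    fake = generator(fixed_noise.to(next(generator.parameters()).device))
    save_image(fake, f"{out_dir}/step_{step:06d}.png", nrow=8, normalize=True)
    generator.train()
```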
Recognizing these signs of instability is the first step. The following sections will detail specific techniques, including alternative loss functions and regularization methods, designed to prevent or fix these common GAN training pathologies.