Training Generative Adversarial Networks directly on high-resolution images presents significant challenges. Large networks are difficult to optimize, gradients can vanish or explode, and the generator and discriminator may struggle to coordinate their learning process effectively, especially in the early stages when the generated images bear little resemblance to the target distribution. Generating fine details while simultaneously learning the coarse structure of an image from scratch is demanding.
Progressive Growing of GANs (ProGAN), introduced by Karras et al. (NVIDIA) in 2017, offers an elegant solution to this problem. Instead of training a single large network for the target high resolution from the beginning, ProGAN starts with very low-resolution images (e.g., 4x4 pixels) and incrementally adds layers to both the generator (G) and discriminator (D) to handle progressively higher resolutions (8x8, 16x16, ..., up to 1024x1024 or higher).
The fundamental principle is to first train the networks to understand the coarse structure of the image distribution at a low resolution. Once this initial stage converges reasonably well, new layers are added to both G and D to double the spatial resolution. The previously trained layers provide a stable foundation, and the new layers focus on learning the finer details specific to the increased resolution. This process repeats until the desired output resolution is reached.
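At a high level, the procedure can be viewed as a schedule over resolutions that doubles until the target is reached. The sketch below is only an outline in Python; `train_at_resolution` and `grow_networks` are hypothetical placeholders for the stage-training and layer-adding steps, not functions from any particular library.

```python
def progressive_schedule(start_res=4, target_res=1024):
    """Yield the sequence of training resolutions, doubling each stage."""
    res = start_res
    while res <= target_res:
        yield res
        res *= 2

# Hypothetical outer loop (placeholder names, not a real API):
# for res in progressive_schedule(4, 1024):
#     train_at_resolution(G, D, res)   # train until this stage stabilizes
#     if res < 1024:
#         grow_networks(G, D)          # add layers for the doubled resolution
```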
A sudden introduction of new layers can shock the system and destabilize training. ProGAN addresses this by smoothly fading in the new layers. When transitioning from resolution R×R to 2R×2R, the new layers are added, but their influence is gradually increased using a parameter α that ramps up from 0 to 1 over many iterations.
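One common way to realize this ramp is to tie α to how long training has been in the current fade-in phase. A minimal sketch, assuming a simple linear schedule and a hand-picked fade length rather than a value prescribed by the paper:

```python
def fade_in_alpha(steps_in_phase: int, fade_steps: int = 10_000) -> float:
    """Linearly ramp alpha from 0 to 1 over `fade_steps` training iterations."""
    return min(1.0, steps_in_phase / fade_steps)
```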
Consider the generator. When the resolution doubles from R×R to 2R×2R, the output of the existing R×R stage is converted to an image by its toRGB layer and upsampled to 2R×2R. In parallel, the new block operates at 2R×2R and produces its own image through a new toRGB layer. The generator's output is the weighted combination (1 − α)·(upsampled old image) + α·(new image): at α = 0 the generator behaves exactly as it did before the new layers were added, and at α = 1 the new layers fully take over.
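In code, the blending step might look like the following PyTorch-style sketch; `old_to_rgb`, `new_block`, and `new_to_rgb` are illustrative module names, not the paper's exact layer names.

```python
import torch.nn.functional as F

def generator_fade_in(features, old_to_rgb, new_block, new_to_rgb, alpha):
    """Blend the old R x R output with the new 2R x 2R output during fade-in.

    `features` are the feature maps produced by the last stable (R x R) stage.
    """
    # Path 1: the existing stage's image, simply upsampled to 2R x 2R.
    old_img = F.interpolate(old_to_rgb(features), scale_factor=2, mode="nearest")
    # Path 2: the new convolutional block operating at 2R x 2R.
    up_features = F.interpolate(features, scale_factor=2, mode="nearest")
    new_img = new_to_rgb(new_block(up_features))
    # Convex combination controlled by alpha (0 = old path only, 1 = new path only).
    return (1.0 - alpha) * old_img + alpha * new_img
```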
A similar fading process occurs in the discriminator, but in reverse: the input image at 2R×2R is processed by the new layers, while a downsampled copy (R×R) bypasses them and enters the older part of the network directly. The features from the two paths are blended with the same convex combination (controlled by α) before flowing through the rest of the discriminator.
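The corresponding discriminator-side sketch, again with placeholder module names, blends the two feature paths with the same α:

```python
import torch.nn.functional as F

def discriminator_fade_in(image_2r, new_from_rgb, new_block, old_from_rgb, alpha):
    """Blend the new high-resolution path with the old downsampled path."""
    # New path: the full 2R x 2R image goes through the freshly added block,
    # which ends by reducing its features back to R x R.
    new_feats = new_block(new_from_rgb(image_2r))
    # Old path: downsample the image to R x R and feed the existing fromRGB layer,
    # bypassing the new block entirely.
    old_feats = old_from_rgb(F.avg_pool2d(image_2r, kernel_size=2))
    # Convex combination; the result continues through the older layers of D.
    return (1.0 - alpha) * old_feats + alpha * new_feats
```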
Figure: Progressive growing phase transition. New layers (blue in G, red in D) are added to handle resolution 2R×2R. Their output is combined with the output from the previous R×R stage (upsampled in G, downsampled in D) using a parameter α that increases from 0 to 1, ensuring a smooth transition.
This gradual adaptation allows the network to incorporate the new capacity for detail without disrupting the already learned stable features from lower resolutions.
While progressive growing is the central idea, the success of ProGAN also relies on several other architectural choices and training techniques applied at every stage, notably minibatch standard deviation in the discriminator (to encourage sample variety), an equalized learning rate (rescaling weights at runtime so all layers learn at a similar speed), and pixelwise feature vector normalization in the generator (to keep activation magnitudes under control).
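As one concrete example of these techniques, pixelwise feature vector normalization rescales the channel vector at every spatial position so that its mean squared value is one. A minimal PyTorch-style sketch of such a layer:

```python
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    """Pixelwise feature vector normalization, applied after generator convolutions."""

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        # Divide each spatial position's channel vector by its root-mean-square value.
        return x * torch.rsqrt(torch.mean(x ** 2, dim=1, keepdim=True) + self.eps)
```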
Progressive Growing demonstrated a powerful methodology for training GANs for high-resolution outputs. It highlighted the importance of curriculum learning principles (starting simple, gradually increasing complexity) in the context of generative models. While architectures like StyleGAN have built upon and refined these ideas, the core concept of progressive resolution increase introduced by ProGAN remains an important technique in the GAN practitioner's toolkit, showcasing how careful architectural design can overcome fundamental training hurdles.