Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and colleagues in 2014, represent a powerful framework for learning generative models. As outlined previously, GANs employ an adversarial process involving two neural networks: a Generator (G) and a Discriminator (D). Let's examine their roles and interactions more closely.
The Generator's task is to synthesize data that appears indistinguishable from real data. It takes a random noise vector $z$ as input, typically sampled from a simple prior distribution $p_z(z)$ such as a multivariate Gaussian or uniform distribution. This noise vector $z$ resides in a lower-dimensional latent space. The Generator acts as a mapping function, transforming this latent vector into a high-dimensional data sample $G(z)$ in the data space (e.g., an image).
$$G: \mathcal{Z} \rightarrow \mathcal{X}$$
Here, $\mathcal{Z}$ is the latent space and $\mathcal{X}$ is the data space. The goal for G is to learn a distribution $p_g$ over $\mathcal{X}$ that matches the true data distribution $p_{\text{data}}$. Architecturally, for tasks like image generation, the Generator often uses transposed convolutions (sometimes called deconvolutions) to upsample the low-dimensional input noise into a full-sized image. Its objective during training is purely to produce outputs $G(z)$ that the Discriminator classifies as real.
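A minimal sketch of such a generator, assuming PyTorch and an illustrative 64×64 single-channel output; the framework choice and all layer sizes are assumptions for illustration, not prescribed by the text:

```python
import torch.nn as nn

class Generator(nn.Module):
    """DCGAN-style generator sketch: latent vector -> 64x64 image."""

    def __init__(self, latent_dim=100, feature_maps=64, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            # Project z (latent_dim x 1 x 1) up to a 4x4 feature map.
            nn.ConvTranspose2d(latent_dim, feature_maps * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feature_maps * 8),
            nn.ReLU(inplace=True),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(feature_maps * 8, feature_maps * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 4),
            nn.ReLU(inplace=True),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(feature_maps * 4, feature_maps * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 2),
            nn.ReLU(inplace=True),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(feature_maps * 2, feature_maps, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps),
            nn.ReLU(inplace=True),
            # 32x32 -> 64x64; tanh keeps pixel values in [-1, 1].
            nn.ConvTranspose2d(feature_maps, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, latent_dim), reshaped to (batch, latent_dim, 1, 1).
        return self.net(z.view(z.size(0), z.size(1), 1, 1))
```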
The Discriminator acts as a binary classifier. Its input is a data sample $x$ (either a real sample from $p_{\text{data}}$ or a generated sample $G(z)$ drawn from $p_g$), and its output $D(x)$ is a single scalar representing the probability that $x$ came from the real data distribution $p_{\text{data}}$.
$$D: \mathcal{X} \rightarrow [0, 1]$$
Ideally, $D(x)$ should be close to 1 for real samples and close to 0 for generated samples. For image data, the Discriminator is commonly implemented as a standard Convolutional Neural Network (CNN) that outputs a probability. Its objective during training is to correctly distinguish between real and generated samples.
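A matching discriminator sketch under the same illustrative assumptions: a plain CNN that downsamples a 64×64 input to a single probability:

```python
class Discriminator(nn.Module):
    """DCGAN-style discriminator sketch: 64x64 image -> probability of 'real'."""

    def __init__(self, feature_maps=64, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            # 64x64 -> 32x32
            nn.Conv2d(channels, feature_maps, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # 32x32 -> 16x16
            nn.Conv2d(feature_maps, feature_maps * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 16x16 -> 8x8
            nn.Conv2d(feature_maps * 2, feature_maps * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # 8x8 -> 4x4
            nn.Conv2d(feature_maps * 4, feature_maps * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # 4x4 -> single logit, squashed to a probability D(x) in [0, 1].
            nn.Conv2d(feature_maps * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Returns a vector of shape (batch,), one probability per sample.
        return self.net(x).view(-1)
```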
The training process pits G and D against each other in a zero-sum game. The core of this interaction is captured by the value function $V(D, G)$ introduced earlier:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
Let's break this down:
Maximizing D: The Discriminator D wants to maximize $V(D, G)$. It does this by pushing $D(x)$ toward 1 for real samples (maximizing the first term, $\log D(x)$) and pushing $D(G(z))$ toward 0 for generated samples (maximizing the second term, $\log(1 - D(G(z)))$).
Minimizing G: The Generator G wants to minimize $V(D, G)$ by making D perform poorly on generated samples. Since G affects only the second term, it tries to make $D(G(z))$ as close to 1 as possible (fooling the discriminator). This minimizes $\log(1 - D(G(z)))$, which tends toward $-\infty$ as $D(G(z)) \rightarrow 1$.
This min-max formulation establishes an equilibrium point. Theoretically, if both G and D have sufficient capacity and the training process converges optimally, the generator's distribution $p_g$ will perfectly match the real data distribution $p_{\text{data}}$. At this point, the discriminator cannot distinguish real from generated samples better than chance, meaning $D(x) = 0.5$ for all $x$, and the value function $V(D, G)$ converges to $-\log 4$.
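The value $-\log 4$ follows from the optimal discriminator, a standard result from the original GAN paper: for a fixed G, maximizing $V$ pointwise over $D(x)$ gives

$$D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$

When $p_g = p_{\text{data}}$, this reduces to $D^*(x) = \tfrac{1}{2}$ everywhere, and substituting into the value function yields

$$V(D^*, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4$$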
Figure: Basic architecture of a Generative Adversarial Network showing the flow of random noise and real data through the Generator and Discriminator.
In practice, training G and D simultaneously using standard gradient descent is unstable. Instead, training alternates between updating D and G:
Update Discriminator: Sample a minibatch of noise vectors $\{z^{(1)}, \dots, z^{(m)}\}$ and a minibatch of real data examples $\{x^{(1)}, \dots, x^{(m)}\}$. Update the parameters $\theta_d$ of D by ascending the stochastic gradient of $V(D, G)$:
$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D(x^{(i)}) + \log\left(1 - D(G(z^{(i)}))\right)\right]$$
This step may be repeated for $k$ iterations to ensure D remains effective.
Update Generator: Sample a minibatch of noise vectors $\{z^{(1)}, \dots, z^{(m)}\}$. Update the parameters $\theta_g$ of G by descending the stochastic gradient of $V(D, G)$, which involves only the second term:
$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z^{(i)}))\right)$$
A minimal training-loop sketch implementing this alternation follows below.
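The sketch assumes the Generator and Discriminator classes defined earlier, Adam optimizers, and a hypothetical real_loader yielding batches of real images; these are illustrative choices, not requirements of the algorithm:

```python
import torch

latent_dim, k = 100, 1
G, D = Generator(latent_dim=latent_dim), Discriminator()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

for real in real_loader:  # hypothetical loader of (m, 1, 64, 64) real batches
    m = real.size(0)

    # Step 1: update D by ascending V (here, descending its negation).
    # For simplicity this sketch reuses the same real batch for all k steps.
    for _ in range(k):
        z = torch.randn(m, latent_dim)
        fake = G(z).detach()  # block gradients into G during the D update
        loss_d = -(torch.log(D(real) + 1e-8).mean()
                   + torch.log(1 - D(fake) + 1e-8).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # Step 2: update G by descending the second term of V.
    z = torch.randn(m, latent_dim)
    loss_g = torch.log(1 - D(G(z)) + 1e-8).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

The small constant 1e-8 guards against log(0); production implementations typically use a numerically stable binary cross-entropy loss rather than explicit logarithms.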
A significant practical issue arises with the generator's loss term $\log(1 - D(G(z)))$. When the discriminator becomes very effective early in training, it confidently assigns $D(G(z)) \approx 0$ to generated samples. In this region the term saturates, providing only a vanishingly weak learning signal to the generator.
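To see the saturation explicitly, suppose D ends in a sigmoid (the usual choice, and an assumption here), so that $D(G(z)) = \sigma(a)$ for some logit $a$. Then

$$\frac{\partial}{\partial a} \log\left(1 - \sigma(a)\right) = -\sigma(a)$$

which tends to 0 as $D(G(z)) \rightarrow 0$: the gradient vanishes exactly where the generator most needs feedback.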
To counteract this saturation, a common modification changes the generator's objective from minimizing $\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ to maximizing $\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$. This is often referred to as the "non-saturating" heuristic objective. While it no longer represents the original min-max game exactly, it pursues the same goal (making $D(G(z))$ close to 1) but provides much stronger gradients early in training when $D(G(z))$ is small: by the same sigmoid argument, $\frac{\partial}{\partial a} \log \sigma(a) = 1 - \sigma(a) \approx 1$ in that regime. In practice, this means updating G by ascending the gradient
$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log D(G(z^{(i)}))$$
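In code, this changes only the generator step; a sketch that replaces Step 2 of the loop above, under the same assumptions:

```python
# Step 2 (non-saturating variant): ascend log D(G(z)) by
# descending its negation.
z = torch.randn(m, latent_dim)
loss_g = -torch.log(D(G(z)) + 1e-8).mean()
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```

Equivalently, torch.nn.functional.binary_cross_entropy(D(G(z)), torch.ones(m)) computes the same negative log-likelihood, labeling the fake samples as real for the generator update.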
While the fundamental GAN framework is elegant, achieving stable training and high-quality results requires careful consideration of architecture choices, loss function variants, and optimization strategies. Difficulties like training instability (oscillations or divergence) and mode collapse (where the generator produces only a limited variety of samples) are common. These challenges motivate the advanced techniques and architectures explored in subsequent chapters.