Having explored the architecture and training dynamics of GANs, particularly the Deep Convolutional GAN (DCGAN), let's translate that theory into practice. This section guides you through implementing a DCGAN from scratch using a standard deep learning framework like TensorFlow or PyTorch. Building and training a GAN is an excellent way to solidify your understanding of the interplay between the generator and discriminator, and to gain firsthand experience with the nuances of generative model training.
We'll focus on the core components: defining the generator and discriminator networks according to DCGAN principles, setting up the adversarial loss functions, and constructing the training loop that orchestrates the competition between the two networks.
First, ensure you have the necessary libraries installed. You'll typically need TensorFlow (or PyTorch), NumPy, and Matplotlib for visualizing generated samples.
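For example, with pip (assuming the TensorFlow route used in this section; exact packages depend on your environment):

pip install tensorflow numpy matplotlib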
For this exercise, a common choice is a dataset like MNIST, Fashion-MNIST, or perhaps CelebA (potentially a cropped/resized subset for faster experimentation). These datasets provide readily available images suitable for learning the fundamentals of image generation.
Let's assume we are using TensorFlow with the Keras API and the Fashion-MNIST dataset. The initial step involves loading and preprocessing the data. Since the DCGAN generator typically uses a tanh activation in its final layer, producing outputs in the range [-1, 1], we need to normalize our real images accordingly.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import time
# Load the dataset (e.g., Fashion-MNIST)
(train_images, _), (_, _) = tf.keras.datasets.fashion_mnist.load_data()
# Preprocess the images
# Add channel dimension, resize if needed (DCGAN often works well with 64x64)
# Normalize images to [-1, 1]
BUFFER_SIZE = train_images.shape[0]
BATCH_SIZE = 256 # Adjust based on GPU memory
IMG_WIDTH = 28 # or 64 if resizing
IMG_HEIGHT = 28 # or 64 if resizing
CHANNELS = 1 # or 3 for color images
train_images = train_images.reshape(train_images.shape[0], IMG_WIDTH, IMG_HEIGHT, CHANNELS).astype('float32')
# Normalize the images to [-1, 1]
train_images = (train_images - 127.5) / 127.5
# Batch and shuffle the data
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
# Latent dimension for the noise vector
NOISE_DIM = 100
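If you prefer to train at the 64x64 resolution used in the original DCGAN paper, one option (a minimal sketch, not part of the main walkthrough) is to resize the images with tf.image.resize. Note that this line belongs before the tf.data.Dataset construction above, and the models below would need extra up/downsampling layers for the larger resolution:

# Optional: train at 64x64 instead of 28x28.
# Insert before building train_dataset, set IMG_HEIGHT = IMG_WIDTH = 64,
# and add extra upsampling/downsampling layers to the models below.
train_images = tf.image.resize(train_images, [64, 64])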
The generator's role is to transform a random noise vector (from the latent space, size NOISE_DIM) into an image that resembles the real data. Following DCGAN guidelines, we use Conv2DTranspose layers for upsampling, BatchNormalization to stabilize training, and ReLU (or LeakyReLU) activations. The final layer uses tanh.
A typical DCGAN generator starts by taking the noise vector and projecting it into a small spatial extent with many channels using a Dense layer, followed by reshaping. Then, a sequence of Conv2DTranspose layers progressively increases the spatial dimensions while decreasing the number of channels.
def make_generator_model(noise_dim, channels, target_height, target_width):
    model = tf.keras.Sequential(name="generator")

    # Start with a Dense layer projecting the noise into a suitable shape for Conv2DTranspose
    # Example for 28x28 output: start at 7x7 (two 2x upsampling steps)
    # A 64x64 output starting from 4x4 would instead need four 2x upsampling steps
    start_h, start_w = target_height // 4, target_width // 4  # Assuming two 2x upsampling steps
    model.add(tf.keras.layers.Dense(start_h * start_w * 256, use_bias=False, input_shape=(noise_dim,)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU(alpha=0.2))  # Using LeakyReLU

    model.add(tf.keras.layers.Reshape((start_h, start_w, 256)))
    assert model.output_shape == (None, start_h, start_w, 256)  # Note: None is the batch size

    # Upsample to target_height/2 x target_width/2
    model.add(tf.keras.layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, target_height // 2, target_width // 2, 128)
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU(alpha=0.2))

    # Upsample to target_height x target_width
    model.add(tf.keras.layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, target_height, target_width, 64)
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU(alpha=0.2))

    # Final layer producing an image with the desired channels (1 for MNIST/Fashion-MNIST, 3 for color)
    model.add(tf.keras.layers.Conv2DTranspose(channels, (5, 5), strides=(1, 1), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, target_height, target_width, channels)
    return model
generator = make_generator_model(NOISE_DIM, CHANNELS, IMG_HEIGHT, IMG_WIDTH)
generator.summary() # Print model summary
Structure of the DCGAN Generator for a 28x28 output, transforming a 100-dimensional noise vector into an image.
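Before wiring up training, a quick optional sanity check is to pass a single noise vector through the untrained generator and confirm the output shape matches the target image dimensions (the variable names here are purely illustrative):

# Sanity check: the untrained generator should emit a (1, 28, 28, 1) tensor
sample_noise = tf.random.normal([1, NOISE_DIM])
sample_image = generator(sample_noise, training=False)
print(sample_image.shape)  # Expected: (1, 28, 28, 1)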
The discriminator is a standard CNN binary classifier. It takes an image (either real or generated) as input and outputs a single probability indicating whether the input image is real (closer to 1) or fake (closer to 0). DCGAN suggests using Conv2D layers with strided convolutions for downsampling (instead of pooling), LeakyReLU activations, and potentially BatchNormalization (though this is sometimes omitted, especially in the first layer, or when using other regularization such as a gradient penalty, which is beyond basic DCGAN). The final layer is a Dense layer with one output unit and a sigmoid activation.
def make_discriminator_model(img_height, img_width, channels):
    model = tf.keras.Sequential(name="discriminator")
    input_shape = (img_height, img_width, channels)

    # Downsample to 14x14 (for a 28x28 input)
    model.add(tf.keras.layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=input_shape))
    model.add(tf.keras.layers.LeakyReLU(alpha=0.2))
    model.add(tf.keras.layers.Dropout(0.3))  # Dropout for regularization

    # Downsample to 7x7
    model.add(tf.keras.layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(tf.keras.layers.LeakyReLU(alpha=0.2))
    model.add(tf.keras.layers.Dropout(0.3))

    # Flatten and classify
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))  # Sigmoid for probability output
    return model
discriminator = make_discriminator_model(IMG_HEIGHT, IMG_WIDTH, CHANNELS)
discriminator.summary() # Print model summary
Structure of the DCGAN Discriminator for a 28x28 input, classifying the image as real or fake.
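As with the generator, you can optionally verify the discriminator end to end by scoring the sample image produced above; with random initial weights the output typically lands near 0.5:

# Sanity check: score the untrained generator's output
decision = discriminator(sample_image, training=False)
print(decision.numpy())  # A single probability; typically near 0.5 before training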
The core of GAN training lies in the adversarial loss. We use BinaryCrossentropy for both networks.
We typically use separate Adam optimizers for the generator and discriminator. The DCGAN paper recommended a learning rate of 0.0002 and a beta_1 of 0.5.
# Use binary cross-entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False)  # Use from_logits=True if the discriminator's last layer omits the sigmoid

# Discriminator loss: real images should be classified as 1, fakes as 0
def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

# Generator loss
def generator_loss(fake_output):
    # The generator wants the discriminator to label fake images as real (1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)

# Optimizers (Adam is common for GANs; the DCGAN paper suggests lr=2e-4, beta_1=0.5)
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5)  # Example values
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5)  # Example values
The training loop requires careful orchestration. In each step, we train the discriminator and the generator separately.
This process is typically wrapped in a tf.function for performance optimization in TensorFlow.
# We will reuse this seed over time (so it's easier
# to visualize progress in an animated GIF)
seed = tf.random.normal([16, NOISE_DIM])  # Fixed noise for consistent visualization

# Notice the use of `tf.function`:
# this annotation causes the function to be "compiled" into a graph.
@tf.function
def train_step(images, generator, discriminator, gen_optimizer, disc_optimizer, noise_dim):
    # Match the noise batch to the image batch (the last batch may be smaller than BATCH_SIZE)
    batch_size = tf.shape(images)[0]
    noise = tf.random.normal([batch_size, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # Calculate gradients
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    # Apply gradients
    gen_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    disc_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    return gen_loss, disc_loss
# Training function
def train(dataset, epochs, generator, discriminator, gen_optimizer, disc_optimizer, noise_dim, seed):
    history = {'gen_loss': [], 'disc_loss': []}
    for epoch in range(epochs):
        start = time.time()
        epoch_gen_loss = []
        epoch_disc_loss = []

        for image_batch in dataset:
            g_loss, d_loss = train_step(image_batch, generator, discriminator, gen_optimizer, disc_optimizer, noise_dim)
            epoch_gen_loss.append(g_loss.numpy())
            epoch_disc_loss.append(d_loss.numpy())

        # Produce images for the GIF as we go
        generate_and_save_images(generator, epoch + 1, seed)  # Function defined below

        avg_gen_loss = np.mean(epoch_gen_loss)
        avg_disc_loss = np.mean(epoch_disc_loss)
        history['gen_loss'].append(avg_gen_loss)
        history['disc_loss'].append(avg_disc_loss)

        print(f'Time for epoch {epoch + 1} is {time.time() - start:.2f} sec')
        print(f'Generator Loss: {avg_gen_loss:.4f}, Discriminator Loss: {avg_disc_loss:.4f}')

    # Generate after the final epoch
    generate_and_save_images(generator, epochs, seed)
    return history
# Helper function to generate and save images
def generate_and_save_images(model, epoch, test_input):
    # Notice `training` is set to False:
    # all layers run in inference mode (important for batch norm).
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(4, 4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        # Display grayscale or color based on channel count
        if predictions.shape[-1] == 1:
            plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        else:
            plt.imshow(predictions[i, :, :, :] * 0.5 + 0.5)  # Denormalize from [-1, 1] to [0, 1]
        plt.axis('off')

    # Save the figure or display it
    # plt.savefig(f'image_at_epoch_{epoch:04d}.png')
    plt.show()
# Start training
EPOCHS = 50 # Adjust as needed
history = train(train_dataset, EPOCHS, generator, discriminator, generator_optimizer, discriminator_optimizer, NOISE_DIM, seed)
Training GANs can be unstable, so monitoring the generator and discriminator losses is important. Ideally the losses settle into a rough equilibrium, although they often fluctuate significantly. Either loss converging to zero is usually a bad sign: if the discriminator loss drops to zero, the discriminator has overpowered the generator, which then stops receiving useful gradients and cannot learn effectively. Conversely, if the generator loss drops too low too quickly, the discriminator is probably too weak, which can lead to mode collapse.
Visual inspection of the generated samples over epochs is arguably the most practical way to assess progress. Are the images becoming more realistic and diverse?
Example plot showing the fluctuation of Generator and Discriminator losses during training. An equilibrium or controlled oscillation is often sought, rather than convergence to zero.
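A similar plot can be produced from the history dictionary returned by train; a minimal Matplotlib sketch:

# Plot the per-epoch average losses recorded during training
plt.figure(figsize=(8, 4))
plt.plot(history['gen_loss'], label='Generator loss')
plt.plot(history['disc_loss'], label='Discriminator loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()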
This implementation provides a foundation for a DCGAN. Training GANs successfully often requires experimentation, for instance with the learning rate, the network architectures, and the choice of activation function (ReLU vs. LeakyReLU).
Building and training this DCGAN provides invaluable hands-on experience with generative modeling, preparing you for exploring more advanced GAN variants like Conditional GANs or StyleGAN, as discussed earlier in the chapter. Remember that GAN training can be sensitive, so persistence and careful monitoring are necessary.