Building upon our understanding of Denoising Autoencoders (DAEs), let's translate the theory into practice. Recall that DAEs are trained not just to reconstruct their input, but to reconstruct a clean version of a corrupted input. This process forces the encoder to capture more robust features, learning the underlying data manifold rather than simply memorizing the training examples or becoming sensitive to minor input variations.
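Formally, where a plain autoencoder minimizes a reconstruction loss L(x, g(f(x))), a DAE minimizes L(x, g(f(x̃))), where x̃ is a corrupted copy of x (for example, with added Gaussian noise) and the target is always the clean input x.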
In this hands-on section, we'll implement a DAE using TensorFlow and Keras to denoise images from the popular MNIST dataset. We assume you have a working Python environment with TensorFlow installed.
First, we import the necessary libraries and load the MNIST dataset. We'll normalize the pixel values to be between 0 and 1. For this example, we'll use a simple feedforward neural network, so we'll also flatten the images.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Load the MNIST dataset
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
# Normalize pixel values to [0, 1] and flatten images
image_size = 28 * 28
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), image_size))
x_test = x_test.reshape((len(x_test), image_size))
print(f"x_train shape: {x_train.shape}")
print(f"x_test shape: {x_test.shape}")
The core idea of a DAE is training on noisy data. Let's create noisy versions of our MNIST images by adding Gaussian noise. The amount of noise is a hyperparameter you can tune; here, we'll use a moderate level.
# Define noise factor
noise_factor = 0.4 # Adjust this value to control noise level
# Add Gaussian noise to training and test data
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
# Clip values to be between 0 and 1
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
# Visualize some noisy images
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.title("Original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display noisy version
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.title("Noisy")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
You should see the original digits on the top row and their corresponding noisy versions below. Our DAE will learn to map the bottom row images back to the top row images.
We'll construct a simple DAE with fully connected (Dense) layers. The encoder will compress the 784-pixel input into a smaller latent dimension (e.g., 64), and the decoder will attempt to reconstruct the original 784 pixels from this latent representation.
# Latent dimension
latent_dim = 64
# Input layer
input_img = keras.Input(shape=(image_size,))
# Encoder
encoded = layers.Dense(256, activation='relu')(input_img)
encoded = layers.Dense(128, activation='relu')(encoded)
encoded = layers.Dense(latent_dim, activation='relu', name='encoder_output')(encoded) # Bottleneck
# Decoder
decoded = layers.Dense(128, activation='relu')(encoded)
decoded = layers.Dense(256, activation='relu')(decoded)
decoded = layers.Dense(image_size, activation='sigmoid')(decoded) # Output layer matches input shape
# Define the autoencoder model
autoencoder = keras.Model(input_img, decoded, name='denoising_autoencoder')
# Optionally, define the encoder model separately if needed
encoder = keras.Model(input_img, encoded, name='encoder')
# Display model summary
autoencoder.summary()
Here's a simplified view of the architecture: the encoder maps the input down to a lower-dimensional latent space, and the decoder reconstructs the input from that latent representation.
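Before compiling, it can be helpful to sanity-check the shapes flowing through both models. Here is a minimal sketch using the variables defined above; the model is still untrained at this point, so the outputs are meaningless and only the shapes matter:
# Pass a small batch through the untrained models to verify shapes
sample = x_train_noisy[:5]
print(encoder.predict(sample).shape)      # Expected: (5, 64)
print(autoencoder.predict(sample).shape)  # Expected: (5, 784)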
We compile the model using the Adam optimizer and binary cross-entropy loss. Binary cross-entropy is suitable here because the pixel values are normalized between 0 and 1 and can be treated as per-pixel probabilities. The crucial step for a DAE is specifying the x (input) and y (target) for training:
- Input (x): x_train_noisy (the corrupted images)
- Target (y): x_train (the original, clean images)
# Compile the model
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# Train the model
epochs = 50
batch_size = 128
history = autoencoder.fit(x_train_noisy, x_train,  # Noisy input, clean target
                          epochs=epochs,
                          batch_size=batch_size,
                          shuffle=True,
                          validation_data=(x_test_noisy, x_test))  # Validate on noisy test -> clean test
After training, we can use the trained autoencoder to predict (reconstruct) clean images from the noisy test set. Let's first plot the training history and then visualize the results.
# Plot training & validation loss values
plt.figure(figsize=(10, 5))
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model loss')
plt.ylabel('Loss (Binary Crossentropy)')
plt.xlabel('Epoch')
plt.legend(loc='upper right')
plt.grid(True)
plt.show()
# Use the trained autoencoder to denoise the test images
decoded_imgs = autoencoder.predict(x_test_noisy)
# Visualize original, noisy, and denoised images
n = 10
plt.figure(figsize=(20, 6))
for i in range(n):
    # Display original image
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.title("Original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display noisy image
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(x_test_noisy[i].reshape(28, 28))
    plt.title("Noisy")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display reconstructed (denoised) image
    ax = plt.subplot(3, n, i + 1 + 2 * n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.title("Denoised")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.tight_layout()
plt.show()
You should observe that the reconstructed images in the bottom row are significantly cleaner than the noisy inputs in the middle row, closely resembling the original images in the top row. This demonstrates the DAE's ability to learn robust representations that capture the essence of the digits while ignoring the noise.
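To back the visual impression with a number, one simple check (a minimal sketch reusing the arrays defined above) is to compare the per-pixel mean squared error of the noisy and denoised images against the clean originals; the denoised error should come out substantially lower:
# Quantify denoising quality with per-pixel MSE against the clean images
mse_noisy = np.mean((x_test_noisy - x_test) ** 2)
mse_denoised = np.mean((decoded_imgs - x_test) ** 2)
print(f"MSE noisy vs. clean:    {mse_noisy:.4f}")
print(f"MSE denoised vs. clean: {mse_denoised:.4f}")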
The training curves tell a similar story: both training and validation loss decrease and converge over the epochs, indicating successful training.
This implementation provides a basic framework for a Denoising Autoencoder. Consider these points for further exploration:
- Convolutional layers: For image data, convolutional autoencoders (using Conv2D and Conv2DTranspose layers) often perform significantly better as they respect the spatial structure of images. Try adapting this code to use convolutional layers; see the sketch after this list.
- Latent dimension: Experiment with the size of latent_dim. A smaller dimension forces more compression but might lose information, while a larger dimension might be less effective at regularization.
- Loss function: For data that isn't normalized to the [0, 1] range, mean squared error (mse) might be a more appropriate loss function than binary_crossentropy.
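As a starting point for the convolutional variant mentioned above, here is a minimal, untuned sketch; the layer widths and kernel sizes are illustrative assumptions, and the data must be reshaped to 28x28x1 images instead of flattened vectors:
# Minimal convolutional DAE sketch; expects inputs of shape (28, 28, 1),
# e.g. x_train_noisy.reshape(-1, 28, 28, 1) instead of flattened vectors
conv_input = keras.Input(shape=(28, 28, 1))
# Encoder: two strided convolutions downsample 28x28 -> 14x14 -> 7x7
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(conv_input)
x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
# Decoder: transpose convolutions upsample 7x7 -> 14x14 -> 28x28
x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
conv_output = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)
conv_autoencoder = keras.Model(conv_input, conv_output, name='conv_dae')
conv_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Training then proceeds exactly as before, with noisy images as input and the clean images as the target.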
By working through this example, you've gained practical experience in implementing a Denoising Autoencoder, a valuable technique for learning features that are robust to noise and variations in the input data. This robustness is often a desired property for downstream tasks like classification or clustering performed on the learned representations.