Now that you're familiar with the principles behind Denoising Autoencoders (DAEs), let's get practical. In this section, we'll build and train a Denoising Autoencoder using PyTorch. Our goal is to take noisy images and teach the autoencoder to reconstruct the original, clean versions. This exercise will not only demonstrate the denoising capabilities of DAEs but also reinforce how they learn significant, resilient features by being forced to distinguish signal from noise. We'll use the popular MNIST dataset of handwritten digits for this task.
First, let's import the necessary libraries. We'll need PyTorch for building the neural network, NumPy for numerical operations (especially for adding noise), and Matplotlib for visualizing our results.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
Make sure you have these libraries installed in your environment. If you followed the environment setup in Chapter 1, you should be ready to go.
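If you want to quickly confirm your setup, an optional check like the one below prints the installed PyTorch version and whether a GPU is visible; the exact version string will of course differ on your machine.
# Optional: quick environment check
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")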
The MNIST dataset contains 60,000 training images and 10,000 testing images of handwritten digits, each of size 28x28 pixels.
# Define a transform to normalize the data and flatten images
# Normalizing to [0, 1] for Sigmoid output
transform = transforms.Compose([
    transforms.ToTensor(),                   # Converts to [0, 1] range automatically
    transforms.Lambda(lambda x: x.view(-1))  # Flatten the 28x28 image to 784
])
# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
# Create data loaders with a batch size of 256
# We'll handle noise generation manually per batch in the training loop
# No need to use TensorDataset here, as torchvision datasets provide data and labels
train_loader_clean = DataLoader(train_dataset, batch_size=256, shuffle=True)
test_loader_clean = DataLoader(test_dataset, batch_size=256, shuffle=False)
# Get a sample to check shape
sample_data, _ = next(iter(train_loader_clean))
print(f"x_train_flat batch shape: {sample_data.shape}")
print(f"Flattened image size: {sample_data.shape[1]}")
We don't need the labels (the _ in the loop) for training an autoencoder, so we ignore them. The ToTensor transform scales pixel values to the [0, 1] range, which is common practice for neural networks and matches the Sigmoid output of our decoder, and the Lambda transform flattens each 28x28 image into a 784-dimensional vector.
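As a quick optional sanity check, you can also confirm that the batch values really lie in the [0, 1] range expected by the Sigmoid output, reusing the sample_data batch from above:
# Optional: verify the pixel value range of a batch
print(f"Pixel range: [{sample_data.min().item():.2f}, {sample_data.max().item():.2f}]")  # roughly [0.00, 1.00]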
The core idea of a Denoising Autoencoder is to learn to reconstruct clean data from a corrupted version. Let's add some noise to our MNIST images dynamically during training. Gaussian noise is a common choice.
# Function to add Gaussian noise
def add_gaussian_noise(images_tensor, noise_factor=0.4):
    noisy_images = images_tensor + noise_factor * torch.randn_like(images_tensor)
    return torch.clamp(noisy_images, 0., 1.)  # Clip to valid range [0, 1]
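# As an aside, the original DAE literature also uses masking (dropout-style) corruption,
# which zeroes a random fraction of pixels instead of adding Gaussian noise. This sketch
# is optional and not used in the rest of this section; drop_prob is illustrative.
def add_masking_noise(images_tensor, drop_prob=0.3):
    mask = (torch.rand_like(images_tensor) > drop_prob).float()
    return images_tensor * mask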
# Let's visualize a few original and noisy digits from a sample batch
# Get one batch for demonstration
sample_batch_clean, _ = next(iter(train_loader_clean))
sample_batch_noisy = add_gaussian_noise(sample_batch_clean, noise_factor=0.4)
# Reshape flat images back to 28x28 for display
def display_images(original_flat, noisy_flat, n=10):
    plt.figure(figsize=(20, 4))
    for i in range(n):
        # Display original
        ax = plt.subplot(2, n, i + 1)
        plt.imshow(original_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Original Images", loc='left')
        # Display noisy
        ax = plt.subplot(2, n, i + 1 + n)
        plt.imshow(noisy_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Noisy Images", loc='left')
    plt.show()
# Display some sample images
display_images(sample_batch_clean, sample_batch_noisy)
Executing the display_images function shows a row of original digits and a corresponding row of their noisy versions. This helps illustrate the challenge we're setting for our autoencoder.
Our Denoising Autoencoder will be a fully-connected neural network. It consists of an encoder that maps the input to a lower-dimensional latent representation, and a decoder that reconstructs the input from this latent representation.
Let's define the architecture as a standard PyTorch nn.Module:
input_dim = 784 # 28 x 28 pixels, flattened
encoding_dim = 64 # Size of the latent representation
class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim, encoding_dim):
        super(DenoisingAutoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, encoding_dim),
            nn.ReLU()  # Bottleneck layer
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()  # Output activation for [0, 1] range
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
autoencoder = DenoisingAutoencoder(input_dim, encoding_dim)
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
autoencoder.to(device)
print(autoencoder)
The printout of autoencoder shows the structure of its layers, which is useful for verifying the architecture.
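If you also want a quick sense of the model's size, you can count its trainable parameters; this is a small optional helper and not part of the training code:
# Optional: count trainable parameters
num_params = sum(p.numel() for p in autoencoder.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params}")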
Before training, we need to define the optimizer and the loss function.
torch.optim.Adam is a good general-purpose optimizer, and Mean Squared Error (nn.MSELoss()) is a suitable loss function: it measures the average squared difference between the reconstructed pixels and the original clean pixels.
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=0.001)
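Because the decoder ends in a Sigmoid and the targets lie in [0, 1], binary cross-entropy is a common alternative reconstruction loss; swapping it in would be a one-line change, with the rest of the training loop unchanged. We stick with MSE here.
# Optional alternative reconstruction loss (not used in the loop below):
# criterion = nn.BCELoss()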
Now, we train the autoencoder. The key difference from a standard autoencoder training is that the input will be the noisy images, but the target (what we want the autoencoder to reconstruct) will be the original, clean images. We generate the noise dynamically for each batch.
epochs = 20
noise_factor = 0.4 # Consistent noise factor for training
train_losses = []
val_losses = []
for epoch in range(epochs):
    # Training
    autoencoder.train()
    running_train_loss = 0.0
    for clean_images, _ in train_loader_clean:
        clean_images = clean_images.to(device)
        noisy_images = add_gaussian_noise(clean_images, noise_factor)  # Add noise to current batch

        optimizer.zero_grad()
        outputs = autoencoder(noisy_images)      # Input: noisy data
        loss = criterion(outputs, clean_images)  # Target: clean data
        loss.backward()
        optimizer.step()

        running_train_loss += loss.item() * clean_images.size(0)

    epoch_train_loss = running_train_loss / len(train_loader_clean.dataset)
    train_losses.append(epoch_train_loss)

    # Validation
    autoencoder.eval()
    running_val_loss = 0.0
    with torch.no_grad():
        for clean_images_val, _ in test_loader_clean:
            clean_images_val = clean_images_val.to(device)
            noisy_images_val = add_gaussian_noise(clean_images_val, noise_factor)  # Add noise to validation batch

            val_outputs = autoencoder(noisy_images_val)
            val_loss = criterion(val_outputs, clean_images_val)
            running_val_loss += val_loss.item() * clean_images_val.size(0)

    epoch_val_loss = running_val_loss / len(test_loader_clean.dataset)
    val_losses.append(epoch_val_loss)

    print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {epoch_train_loss:.6f}, Val Loss: {epoch_val_loss:.6f}')
We can visualize the training and validation loss to check for overfitting or to see how well the model is learning.
plt.figure(figsize=(10, 5))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Model Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error Loss')
plt.legend()
plt.grid(True)
plt.show()
The plot above, generated from your train_losses and val_losses lists, shows the training and validation loss curves for the Denoising Autoencoder. Ideally, both losses should decrease steadily and converge; a widening gap between them would suggest overfitting.
The true test of our Denoising Autoencoder is its ability to reconstruct clean images from noisy inputs it hasn't seen during training (the test set). Let's use our trained autoencoder to denoise a batch of noisy test images.
autoencoder.eval()  # Set model to evaluation mode
with torch.no_grad():
    # Get a batch of clean test images
    test_batch_clean, _ = next(iter(test_loader_clean))
    test_batch_clean = test_batch_clean.to(device)

    # Create noisy versions of these images
    test_batch_noisy = add_gaussian_noise(test_batch_clean, noise_factor)

    # Get denoised outputs
    denoised_images = autoencoder(test_batch_noisy).cpu().numpy()  # Move to CPU and convert to NumPy
# Visualize original, noisy, and denoised images
def display_reconstructions(original_flat, noisy_flat, reconstructed_flat, n=10):
    plt.figure(figsize=(20, 6))
    for i in range(n):
        # Display original
        ax = plt.subplot(3, n, i + 1)
        plt.imshow(original_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0: ax.set_title("Original", loc='left')

        # Display noisy input
        ax = plt.subplot(3, n, i + 1 + n)
        plt.imshow(noisy_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0: ax.set_title("Noisy Input", loc='left')

        # Display reconstruction (already a NumPy array)
        ax = plt.subplot(3, n, i + 1 + n * 2)
        plt.imshow(reconstructed_flat[i].reshape(28, 28), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0: ax.set_title("Denoised Output", loc='left')
    plt.show()
display_reconstructions(test_batch_clean, test_batch_noisy, denoised_images)
When you run display_reconstructions, you should see three rows of images: the original clean digits, the noisy inputs fed to the model, and the denoised outputs. You should observe that the DAE has removed a significant amount of noise, producing reconstructions that are much closer to the original images than the noisy inputs were. This demonstrates that the autoencoder has learned to capture the underlying structure of the digits.
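To put a rough number on the improvement, you can compare the per-pixel MSE of the noisy inputs and of the denoised outputs against the clean images, reusing the test batch from above. This is a small optional check; the exact values depend on your training run.
import torch.nn.functional as F

with torch.no_grad():
    # Bring the denoised NumPy outputs back to a tensor on the same device as the clean batch
    denoised_tensor = torch.from_numpy(denoised_images).to(device)
    mse_noisy = F.mse_loss(test_batch_noisy, test_batch_clean).item()
    mse_denoised = F.mse_loss(denoised_tensor, test_batch_clean).item()
print(f"MSE of noisy inputs vs clean:     {mse_noisy:.4f}")
print(f"MSE of denoised outputs vs clean: {mse_denoised:.4f}")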
While denoising is a useful application in itself, our primary interest in this course is feature extraction. The encoder part of our DAE has learned to transform the input images (784 dimensions) into a compressed representation (64 dimensions in our example). These latent representations can serve as features for downstream tasks.
To extract features, we simply use the encoder attribute of our trained autoencoder model:
autoencoder.eval()  # Set model to evaluation mode
with torch.no_grad():  # Disable gradient calculations
    # It's often more useful to extract features from the clean data for downstream tasks.
    # torchvision's MNIST dataset yields (image, label) pairs, so we stack all transformed
    # test images into a single tensor and move it to the correct device.
    clean_test_images_tensor = torch.stack([img for img, _ in test_dataset]).to(device)
    encoded_features_clean = autoencoder.encoder(clean_test_images_tensor).cpu().numpy()

    # You could also extract features from noisy data if that's your use case:
    # noisy_test_images_tensor = add_gaussian_noise(clean_test_images_tensor, noise_factor)
    # encoded_features_noisy = autoencoder.encoder(noisy_test_images_tensor).cpu().numpy()

print(f"Shape of extracted features from clean data: {encoded_features_clean.shape}")
# Shape of extracted features from clean data: (10000, 64)
The encoded_features_clean array (or encoded_features_noisy, if that is what your use case calls for) now contains a 64-dimensional feature vector for each test image. These features were learned to be resilient to noise and to capture the information needed to reconstruct the digits, so they can serve as input to a classifier, for example, as sketched below. We will explore applications of extracted features more thoroughly in Chapter 7.
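As a brief illustration of that idea, here is a minimal sketch of training a classifier on the extracted features. It assumes scikit-learn is installed (it is not required elsewhere in this section) and extracts features for the training split the same way as above; treat the resulting accuracy as indicative only.
from sklearn.linear_model import LogisticRegression

# Extract 64-dimensional features for the training split (same recipe as for the test split)
autoencoder.eval()
with torch.no_grad():
    train_images = torch.stack([img for img, _ in train_dataset]).to(device)
    train_features = autoencoder.encoder(train_images).cpu().numpy()
train_labels = np.array([label for _, label in train_dataset])
test_labels = np.array([label for _, label in test_dataset])

# Fit a simple linear classifier on the DAE features
clf = LogisticRegression(max_iter=1000)
clf.fit(train_features, train_labels)
print(f"Test accuracy on 64-dim DAE features: {clf.score(encoded_features_clean, test_labels):.3f}")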
In this practice, you successfully implemented a Denoising Autoencoder using PyTorch. You saw how it can be trained to take noisy input and produce cleaner, reconstructed output. This process forces the model to learn more meaningful and resilient representations of the data. The encoder part of this DAE can then be used to extract these learned features, which are often more useful for other machine learning tasks than the raw input data, especially when dealing with noisy datasets.
You've now added another powerful tool to your autoencoder toolkit. Next, we will look into Convolutional Autoencoders, which are particularly well-suited for image data.