Now that you're familiar with the principles behind Denoising Autoencoders (DAEs), let's get practical. In this section, we'll build and train a Denoising Autoencoder using PyTorch. Our goal is to take noisy images and teach the autoencoder to reconstruct the original, clean versions. This exercise will not only demonstrate the denoising capabilities of DAEs but also reinforce how they learn significant, resilient features by being forced to distinguish signal from noise. We'll use the popular MNIST dataset of handwritten digits for this task.
First, let's import the necessary libraries. We'll need PyTorch for building the neural network, NumPy for numerical operations (especially for adding noise), and Matplotlib for visualizing our results.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
Make sure you have these libraries installed in your environment. If you followed the environment setup in Chapter 1, you should be ready to go.
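If you want to quickly confirm your setup, an optional check like the one below prints the installed PyTorch version and whether a GPU is visible; the exact version string will of course differ on your machine.
# Optional: quick environment check
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")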
The MNIST dataset contains 60,000 training images and 10,000 testing images of handwritten digits, each of size 28x28 pixels.
# Define a transform to normalize the data and flatten images
# Normalizing to [0, 1] for Sigmoid output
transform = transforms.Compose([
    transforms.ToTensor(),                   # Converts to [0, 1] range automatically
    transforms.Lambda(lambda x: x.view(-1))  # Flatten the 28x28 image to 784
])
# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
# Create data loaders with a batch size of 256
# We'll handle noise generation manually per batch in the training loop
# No need to use TensorDataset here, as torchvision datasets provide data and labels
train_loader_clean = DataLoader(train_dataset, batch_size=256, shuffle=True)
test_loader_clean = DataLoader(test_dataset, batch_size=256, shuffle=False)
# Get a sample to check shape
sample_data, _ = next(iter(train_loader_clean))
print(f"x_train_flat batch shape: {sample_data.shape}")
print(f"Flattened image size: {sample_data.shape[1]}")
We don't need the labels (the _ in the loop) for training an autoencoder, so we ignore them. The ToTensor transform scales pixel values to the [0, 1] range, which is common practice for neural networks and matches the Sigmoid output of our decoder, and the Lambda transform flattens each 28x28 image into a 784-dimensional vector.
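As a quick optional sanity check, you can also confirm that the batch values really lie in the [0, 1] range expected by the Sigmoid output, reusing the sample_data batch from above:
# Optional: verify the pixel value range of a batch
print(f"Pixel range: [{sample_data.min().item():.2f}, {sample_data.max().item():.2f}]")  # roughly [0.00, 1.00]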
The core idea of a Denoising Autoencoder is to learn to reconstruct clean data from a corrupted version. Let's add some noise to our MNIST images dynamically during training. Gaussian noise is a common choice.
# Function to add Gaussian noise
def add_gaussian_noise(images_tensor, noise_factor=0.4):
    noisy_images = images_tensor + noise_factor * torch.randn_like(images_tensor)
    return torch.clamp(noisy_images, 0., 1.)  # Clip to valid range [0, 1]
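# As an aside, the original DAE literature also uses masking (dropout-style) corruption,
# which zeroes a random fraction of pixels instead of adding Gaussian noise. This sketch
# is optional and not used in the rest of this section; drop_prob is illustrative.
def add_masking_noise(images_tensor, drop_prob=0.3):
    mask = (torch.rand_like(images_tensor) > drop_prob).float()
    return images_tensor * mask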
# Let's visualize a few original and noisy digits from a sample batch
# Get one batch for demonstration
sample_batch_clean, _ = next(iter(train_loader_clean))
sample_batch_noisy = add_gaussian_noise(sample_batch_clean, noise_factor=0.4)
# Reshape flat images back to 28x28 for display
def display_images(original_flat, noisy_flat, n=10):
    plt.figure(figsize=(20, 4))
    for i in range(n):
        # Display original
        ax = plt.subplot(2, n, i + 1)
        plt.imshow(original_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Original Images", loc='left')
        # Display noisy
        ax = plt.subplot(2, n, i + 1 + n)
        plt.imshow(noisy_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Noisy Images", loc='left')
    plt.show()
# Display some sample images
display_images(sample_batch_clean, sample_batch_noisy)
Executing the display_images function shows a row of original digits and a corresponding row of their noisy versions. This helps illustrate the challenge we're setting for our autoencoder.
Our Denoising Autoencoder will be a fully-connected neural network. It consists of an encoder that maps the input to a lower-dimensional latent representation, and a decoder that reconstructs the input from this latent representation.
Let's define the architecture as a standard PyTorch nn.Module:
input_dim = 784 # 28 x 28 pixels, flattened
encoding_dim = 64 # Size of the latent representation
class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim, encoding_dim):
        super(DenoisingAutoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, encoding_dim),
            nn.ReLU()  # Bottleneck layer
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()  # Output activation for [0, 1] range
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
autoencoder = DenoisingAutoencoder(input_dim, encoding_dim)
# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
autoencoder.to(device)
print(autoencoder)
The printout of autoencoder shows the structure of its layers, which is useful for verifying the architecture.
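If you also want a quick sense of the model's size, you can count its trainable parameters; this is a small optional helper and not part of the training code:
# Optional: count trainable parameters
num_params = sum(p.numel() for p in autoencoder.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params}")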
Before training, we need to define the optimizer and the loss function.
torch.optim.Adam is a good general-purpose optimizer, and Mean Squared Error (nn.MSELoss()) is a suitable loss function: it measures the average squared difference between the reconstructed pixels and the original clean pixels.
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=0.001)
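Because the decoder ends in a Sigmoid and the targets lie in [0, 1], binary cross-entropy is a common alternative reconstruction loss; swapping it in would be a one-line change, with the rest of the training loop unchanged. We stick with MSE here.
# Optional alternative reconstruction loss (not used in the loop below):
# criterion = nn.BCELoss()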
Now, we train the autoencoder. The key difference from a standard autoencoder training is that the input will be the noisy images, but the target (what we want the autoencoder to reconstruct) will be the original, clean images. We generate the noise dynamically for each batch.
epochs = 20
noise_factor = 0.4 # Consistent noise factor for training
train_losses = []
val_losses = []
for epoch in range(epochs):
    # Training
    autoencoder.train()
    running_train_loss = 0.0
    for clean_images, _ in train_loader_clean:
        clean_images = clean_images.to(device)
        noisy_images = add_gaussian_noise(clean_images, noise_factor)  # Add noise to current batch

        optimizer.zero_grad()
        outputs = autoencoder(noisy_images)      # Input: noisy data
        loss = criterion(outputs, clean_images)  # Target: clean data
        loss.backward()
        optimizer.step()

        running_train_loss += loss.item() * clean_images.size(0)

    epoch_train_loss = running_train_loss / len(train_loader_clean.dataset)
    train_losses.append(epoch_train_loss)

    # Validation
    autoencoder.eval()
    running_val_loss = 0.0
    with torch.no_grad():
        for clean_images_val, _ in test_loader_clean:
            clean_images_val = clean_images_val.to(device)
            noisy_images_val = add_gaussian_noise(clean_images_val, noise_factor)  # Add noise to validation batch

            val_outputs = autoencoder(noisy_images_val)
            val_loss = criterion(val_outputs, clean_images_val)
            running_val_loss += val_loss.item() * clean_images_val.size(0)

    epoch_val_loss = running_val_loss / len(test_loader_clean.dataset)
    val_losses.append(epoch_val_loss)

    print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {epoch_train_loss:.6f}, Val Loss: {epoch_val_loss:.6f}')
We can visualize the training and validation loss to check for overfitting or to see how well the model is learning.
plt.figure(figsize=(10, 5))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Model Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error Loss')
plt.legend()
plt.grid(True)
plt.show()
The plot above, generated from your train_losses and val_losses lists, shows the training and validation loss curves for the Denoising Autoencoder. Ideally, both losses should decrease steadily and converge; a widening gap between them would suggest overfitting.
The true test of our Denoising Autoencoder is its ability to reconstruct clean images from noisy inputs it hasn't seen during training (the test set). Let's use our trained autoencoder to denoise a batch of noisy test images.
autoencoder.eval()  # Set model to evaluation mode
with torch.no_grad():
    # Get a batch of clean test images
    test_batch_clean, _ = next(iter(test_loader_clean))
    test_batch_clean = test_batch_clean.to(device)

    # Create noisy versions of these images
    test_batch_noisy = add_gaussian_noise(test_batch_clean, noise_factor)

    # Get denoised outputs
    denoised_images = autoencoder(test_batch_noisy).cpu().numpy()  # Move to CPU and convert to NumPy
# Visualize original, noisy, and denoised images
def display_reconstructions(original_flat, noisy_flat, reconstructed_flat, n=10):
    plt.figure(figsize=(20, 6))
    for i in range(n):
        # Display original
        ax = plt.subplot(3, n, i + 1)
        plt.imshow(original_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0: ax.set_title("Original", loc='left')

        # Display noisy input
        ax = plt.subplot(3, n, i + 1 + n)
        plt.imshow(noisy_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0: ax.set_title("Noisy Input", loc='left')

        # Display reconstruction (already a NumPy array)
        ax = plt.subplot(3, n, i + 1 + n * 2)
        plt.imshow(reconstructed_flat[i].reshape(28, 28), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0: ax.set_title("Denoised Output", loc='left')
    plt.show()
display_reconstructions(test_batch_clean, test_batch_noisy, denoised_images)
When you run display_reconstructions, you should see three rows of images: the original clean digits, the noisy inputs fed to the model, and the denoised outputs. You should observe that the DAE has removed a significant amount of noise, producing reconstructions that are much closer to the original images than the noisy inputs were. This demonstrates that the autoencoder has learned to capture the underlying structure of the digits.
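To put a rough number on the improvement, you can compare the per-pixel MSE of the noisy inputs and of the denoised outputs against the clean images, reusing the test batch from above. This is a small optional check; the exact values depend on your training run.
import torch.nn.functional as F

with torch.no_grad():
    # Bring the denoised NumPy outputs back to a tensor on the same device as the clean batch
    denoised_tensor = torch.from_numpy(denoised_images).to(device)
    mse_noisy = F.mse_loss(test_batch_noisy, test_batch_clean).item()
    mse_denoised = F.mse_loss(denoised_tensor, test_batch_clean).item()
print(f"MSE of noisy inputs vs clean:     {mse_noisy:.4f}")
print(f"MSE of denoised outputs vs clean: {mse_denoised:.4f}")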
While denoising is a useful application in itself, our primary interest in this course is feature extraction. The encoder part of our DAE has learned to transform the input images (784 dimensions) into a compressed representation (64 dimensions in our example). These latent representations can serve as features for downstream tasks.
To extract features, we simply use the encoder attribute of our trained autoencoder model:
autoencoder.eval()  # Set model to evaluation mode
with torch.no_grad():  # Disable gradient calculations
    # It's often more useful to extract features from the clean data for downstream tasks.
    # torchvision's MNIST dataset yields (image, label) pairs, so we stack all transformed
    # test images into a single tensor and move it to the correct device.
    clean_test_images_tensor = torch.stack([img for img, _ in test_dataset]).to(device)
    encoded_features_clean = autoencoder.encoder(clean_test_images_tensor).cpu().numpy()

    # You could also extract features from noisy data if that's your use case:
    # noisy_test_images_tensor = add_gaussian_noise(clean_test_images_tensor, noise_factor)
    # encoded_features_noisy = autoencoder.encoder(noisy_test_images_tensor).cpu().numpy()

print(f"Shape of extracted features from clean data: {encoded_features_clean.shape}")
# Shape of extracted features from clean data: (10000, 64)
The encoded_features_clean array (or encoded_features_noisy, if that is what your use case calls for) now contains a 64-dimensional feature vector for each test image. These features were learned to be resilient to noise and to capture the information needed to reconstruct the digits, so they can serve as input to a classifier, for example, as sketched below. We will explore applications of extracted features more thoroughly in Chapter 7.
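As a brief illustration of that idea, here is a minimal sketch of training a classifier on the extracted features. It assumes scikit-learn is installed (it is not required elsewhere in this section) and extracts features for the training split the same way as above; treat the resulting accuracy as indicative only.
from sklearn.linear_model import LogisticRegression

# Extract 64-dimensional features for the training split (same recipe as for the test split)
autoencoder.eval()
with torch.no_grad():
    train_images = torch.stack([img for img, _ in train_dataset]).to(device)
    train_features = autoencoder.encoder(train_images).cpu().numpy()
train_labels = np.array([label for _, label in train_dataset])
test_labels = np.array([label for _, label in test_dataset])

# Fit a simple linear classifier on the DAE features
clf = LogisticRegression(max_iter=1000)
clf.fit(train_features, train_labels)
print(f"Test accuracy on 64-dim DAE features: {clf.score(encoded_features_clean, test_labels):.3f}")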
In this practice, you successfully implemented a Denoising Autoencoder using PyTorch. You saw how it can be trained to take noisy input and produce cleaner, reconstructed output. This process forces the model to learn more meaningful and resilient representations of the data. The encoder part of this DAE can then be used to extract these learned features, which are often more useful for other machine learning tasks than the raw input data, especially when dealing with noisy datasets.
You've now added another powerful tool to your autoencoder toolkit. Next, we will look into Convolutional Autoencoders, which are particularly well-suited for image data.