Now that you're familiar with the principles behind Denoising Autoencoders (DAEs), let's get practical. In this section, we'll build and train a Denoising Autoencoder using PyTorch. Our goal is to take noisy images and teach the autoencoder to reconstruct the original, clean versions. This exercise will not only demonstrate the denoising capabilities of DAEs but also reinforce how they learn meaningful, robust features by being forced to distinguish signal from noise. We'll use the popular MNIST dataset of handwritten digits for this task.

### 1. Setting Up and Importing Libraries

First, let's import the necessary libraries. We'll need PyTorch for building the neural network, NumPy for numerical operations, and Matplotlib for visualizing our results.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
```

Make sure you have these libraries installed in your environment. If you followed the environment setup in Chapter 1, you should be ready to go.

### 2. Loading and Preparing the MNIST Dataset

The MNIST dataset contains 60,000 training images and 10,000 test images of handwritten digits, each of size 28x28 pixels.

```python
# Define a transform that converts images to tensors and flattens them.
# ToTensor() scales pixel values to [0, 1], which matches our Sigmoid output.
transform = transforms.Compose([
    transforms.ToTensor(),                   # Converts to [0, 1] range automatically
    transforms.Lambda(lambda x: x.view(-1))  # Flatten each 28x28 image to 784
])

# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders; we'll generate noise manually per batch in the training loop
train_loader_clean = DataLoader(train_dataset, batch_size=256, shuffle=True)
test_loader_clean = DataLoader(test_dataset, batch_size=256, shuffle=False)

# Get a sample batch to check shapes
sample_data, _ = next(iter(train_loader_clean))
print(f"Batch shape: {sample_data.shape}")
print(f"Flattened image size: {sample_data.shape[1]}")
```

We don't need the labels (`_`) for training an autoencoder, so we ignore them. Normalizing pixel values to the [0, 1] range is common practice for neural networks and suits a Sigmoid output layer. Flattening each 28x28 image gives us a 784-dimensional vector.

### 3. Introducing Noise to the Data

The core idea of a Denoising Autoencoder is to learn to reconstruct clean data from a corrupted version. Let's add some noise to our MNIST images dynamically during training. Gaussian noise is a common choice.

```python
# Function to add Gaussian noise, clipping the result back to the valid [0, 1] range
def add_gaussian_noise(images_tensor, noise_factor=0.4):
    noisy_images = images_tensor + noise_factor * torch.randn_like(images_tensor)
    return torch.clamp(noisy_images, 0., 1.)
```
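Gaussian noise isn't the only corruption you can use. The original DAE literature also works with masking noise, where a random fraction of input pixels is simply zeroed out. Here is a minimal sketch of that alternative; the function name and the default `mask_prob` are our own choices, and the rest of this walkthrough sticks with Gaussian noise.

```python
# Alternative corruption: masking noise. A random subset of pixels is
# set to zero; mask_prob controls the expected fraction dropped.
def add_masking_noise(images_tensor, mask_prob=0.3):
    mask = (torch.rand_like(images_tensor) > mask_prob).float()  # 1 = keep, 0 = drop
    return images_tensor * mask
```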
Let's visualize a few original and noisy digits from a sample batch.

```python
# Get one batch for demonstration
sample_batch_clean, _ = next(iter(train_loader_clean))
sample_batch_noisy = add_gaussian_noise(sample_batch_clean, noise_factor=0.4)

# Reshape flat images back to 28x28 for display
def display_images(original_flat, noisy_flat, n=10):
    plt.figure(figsize=(20, 4))
    for i in range(n):
        # Display original
        ax = plt.subplot(2, n, i + 1)
        plt.imshow(original_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Original Images", loc='left')

        # Display noisy
        ax = plt.subplot(2, n, i + 1 + n)
        plt.imshow(noisy_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Noisy Images", loc='left')
    plt.show()

# Display some sample images
display_images(sample_batch_clean, sample_batch_noisy)
```

Executing `display_images` shows a row of original digits above a row of their noisy versions, which makes clear the challenge we're setting for our autoencoder.

### 4. Building the Denoising Autoencoder Model

Our Denoising Autoencoder will be a fully connected neural network. It consists of an encoder that maps the input to a lower-dimensional latent representation, and a decoder that reconstructs the input from this latent representation.

Let's define the architecture:

- Input layer: 784 units (for flattened 28x28 images).
- Encoder:
  - A linear layer with 128 units and ReLU activation.
  - The bottleneck layer: a linear layer with 64 units and ReLU activation. This is our latent representation.
- Decoder:
  - A linear layer with 128 units and ReLU activation.
  - Output layer: a linear layer with 784 units and Sigmoid activation (to output pixel values between 0 and 1).

We'll define this as a standard PyTorch `nn.Module`.

```python
input_dim = 784    # Flattened 28x28 images
encoding_dim = 64  # Size of the latent representation

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim, encoding_dim):
        super(DenoisingAutoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, encoding_dim),
            nn.ReLU()  # Bottleneck layer
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()  # Output activation for [0, 1] range
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = DenoisingAutoencoder(input_dim, encoding_dim)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
autoencoder.to(device)
print(autoencoder)
```

Printing `autoencoder` shows the structure of the layers, which is useful for verifying the architecture.

### 5. Defining the Loss and Optimizer

Before training, we need to define the optimizer and the loss function.

- Optimizer: `torch.optim.Adam` is a good general-purpose optimizer.
- Loss function: since we are comparing pixel values (which are continuous between 0 and 1), Mean Squared Error (`nn.MSELoss`) is a suitable loss function. It measures the average squared difference between the reconstructed pixels and the original clean pixels.

```python
criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=0.001)
```
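MSE is not the only reasonable choice here. Because the decoder ends in a Sigmoid and the targets lie in [0, 1], binary cross-entropy is another loss you will often see in MNIST autoencoder examples. A minimal alternative, shown only for comparison and not used in the rest of this walkthrough:

```python
# Alternative loss: binary cross-entropy treats each pixel as a Bernoulli
# probability. This is valid here only because the decoder's Sigmoid keeps
# outputs in (0, 1); we stick with MSE for the rest of this section.
criterion_bce = nn.BCELoss()
```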
### 6. Training the Denoising Autoencoder

Now we train the autoencoder. The key difference from standard autoencoder training is that the input is the noisy images, while the target (what we want the autoencoder to reconstruct) is the original, clean images. We generate fresh noise for each batch.

```python
epochs = 20
noise_factor = 0.4  # Consistent noise factor for training
train_losses = []
val_losses = []

for epoch in range(epochs):
    # Training
    autoencoder.train()
    running_train_loss = 0.0
    for clean_images, _ in train_loader_clean:
        clean_images = clean_images.to(device)
        noisy_images = add_gaussian_noise(clean_images, noise_factor)  # Corrupt the current batch

        optimizer.zero_grad()
        outputs = autoencoder(noisy_images)      # Input: noisy data
        loss = criterion(outputs, clean_images)  # Target: clean data
        loss.backward()
        optimizer.step()
        running_train_loss += loss.item() * clean_images.size(0)

    epoch_train_loss = running_train_loss / len(train_loader_clean.dataset)
    train_losses.append(epoch_train_loss)

    # Validation
    autoencoder.eval()
    running_val_loss = 0.0
    with torch.no_grad():
        for clean_images_val, _ in test_loader_clean:
            clean_images_val = clean_images_val.to(device)
            noisy_images_val = add_gaussian_noise(clean_images_val, noise_factor)  # Corrupt the validation batch
            val_outputs = autoencoder(noisy_images_val)
            val_loss = criterion(val_outputs, clean_images_val)
            running_val_loss += val_loss.item() * clean_images_val.size(0)

    epoch_val_loss = running_val_loss / len(test_loader_clean.dataset)
    val_losses.append(epoch_val_loss)

    print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {epoch_train_loss:.6f}, Val Loss: {epoch_val_loss:.6f}')
```

We can plot the training and validation loss, generated from the `train_losses` and `val_losses` lists, to check for overfitting and to see how well the model is learning.

```python
plt.figure(figsize=(10, 5))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.title('Model Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error Loss')
plt.legend()
plt.grid(True)
plt.show()
```

Ideally, both losses should decrease and converge.
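If the validation loss starts rising while the training loss keeps falling, the model is overfitting. One simple, optional remedy is to keep the weights from the best validation epoch. The sketch below is illustrative only: the helper name `save_if_best` and the file path `dae_best.pt` are our own choices, not part of the walkthrough above.

```python
# Minimal best-checkpoint sketch (illustrative names and path).
def save_if_best(model, val_loss, best_so_far, path='dae_best.pt'):
    """Save model weights when validation loss improves; return the new best loss."""
    if val_loss < best_so_far:
        torch.save(model.state_dict(), path)
        return val_loss
    return best_so_far

# Usage: initialize best_val = float('inf') before the epoch loop, then call
#     best_val = save_if_best(autoencoder, epoch_val_loss, best_val)
# after computing epoch_val_loss. After training, restore the best weights with
#     autoencoder.load_state_dict(torch.load('dae_best.pt'))
```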
### 7. Evaluating the Model: Visualizing Denoised Images

The true test of our Denoising Autoencoder is its ability to reconstruct clean images from noisy inputs it hasn't seen during training (the test set). Let's use our trained autoencoder to denoise a batch of noisy test images.

```python
autoencoder.eval()  # Set model to evaluation mode
with torch.no_grad():
    # Get a batch of clean test images
    test_batch_clean, _ = next(iter(test_loader_clean))
    test_batch_clean = test_batch_clean.to(device)

    # Create noisy versions of these images
    test_batch_noisy = add_gaussian_noise(test_batch_clean, noise_factor)

    # Get denoised outputs, moved to the CPU as NumPy arrays for plotting
    denoised_images = autoencoder(test_batch_noisy).cpu().numpy()

# Visualize original, noisy, and denoised images
def display_reconstructions(original_flat, noisy_flat, reconstructed_flat, n=10):
    plt.figure(figsize=(20, 6))
    for i in range(n):
        # Display original
        ax = plt.subplot(3, n, i + 1)
        plt.imshow(original_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Original", loc='left')

        # Display noisy input
        ax = plt.subplot(3, n, i + 1 + n)
        plt.imshow(noisy_flat[i].reshape(28, 28).cpu().numpy(), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Noisy Input", loc='left')

        # Display reconstruction (already a NumPy array)
        ax = plt.subplot(3, n, i + 1 + n * 2)
        plt.imshow(reconstructed_flat[i].reshape(28, 28), cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        if i == 0:
            ax.set_title("Denoised Output", loc='left')
    plt.show()

display_reconstructions(test_batch_clean, test_batch_noisy, denoised_images)
```

When you run `display_reconstructions`, you should see three rows of images:

1. The original, clean test images.
2. The noisy versions of these test images (which were fed into the DAE).
3. The output of the DAE (the reconstructed, hopefully cleaner images).

You should observe that the DAE has learned to remove a significant amount of noise, producing reconstructions much closer to the original images than the noisy inputs were. This demonstrates that the autoencoder has learned to capture the underlying structure of the digits.

### 8. Using the Encoder for Feature Extraction

While denoising is a useful application in itself, our primary interest in this course is feature extraction. The encoder part of our DAE has learned to transform the input images (784 dimensions) into a compressed representation (64 dimensions in our example). These latent representations can serve as features for downstream tasks.

To extract features, we simply use the `encoder` attribute of our trained model. Note that a torchvision dataset does not expose its images as a single tensor, so we encode the test set batch by batch through the data loader:

```python
autoencoder.eval()  # Set model to evaluation mode
with torch.no_grad():  # Disable gradient calculations
    # It's often most useful to extract features from the clean data for downstream tasks.
    # test_loader_clean is not shuffled, so the feature order matches test_dataset.
    encoded_batches = []
    for clean_images, _ in test_loader_clean:
        encoded = autoencoder.encoder(clean_images.to(device))
        encoded_batches.append(encoded.cpu())
    encoded_features_clean = torch.cat(encoded_batches).numpy()

    # You could also extract features from noisy data if that's your use case:
    # noisy_images = add_gaussian_noise(clean_images.to(device), noise_factor)
    # encoded_features_noisy = autoencoder.encoder(noisy_images).cpu().numpy()

print(f"Shape of extracted features from clean data: {encoded_features_clean.shape}")
# Shape of extracted features from clean data: (10000, 64)
```

`encoded_features_clean` (or `encoded_features_noisy`, if that's your need) now contains a 64-dimensional feature vector for each test image. These features were learned to be robust to noise and to capture the information essential for reconstructing the digits.
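You could use these features as input to a classifier, for example. As a hedged illustration only (scikit-learn is an assumed extra dependency, and the split and hyperparameters are arbitrary), here is a logistic regression fit on the 64-dimensional codes:

```python
# Illustrative sketch: fit a linear classifier on the DAE features.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

labels = test_dataset.targets.numpy()  # Digit labels, aligned with the unshuffled loader

X_train, X_val, y_train, y_val = train_test_split(
    encoded_features_clean, labels, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(f"Validation accuracy on 64-dim DAE features: {clf.score(X_val, y_val):.3f}")
```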
We will explore applications of extracted features more thoroughly in Chapter 7.

### Concluding Remarks

In this practice section, you implemented a Denoising Autoencoder using PyTorch. You saw how it can be trained to take noisy input and produce cleaner, reconstructed output. This process forces the model to learn more meaningful and robust representations of the data. The encoder of the trained DAE can then be used to extract these learned features, which are often more useful for other machine learning tasks than the raw input data, especially when dealing with noisy datasets.

You've now added another powerful tool to your autoencoder toolkit. Next, we will look at Convolutional Autoencoders, which are particularly well suited to image data.