Alright, we've discussed the idea behind Denoising Autoencoders (DAEs): they're trained to reconstruct an original, clean input even when they're fed a corrupted version of it. This simple yet effective modification compels the autoencoder to learn more useful and robust features, as it can't just learn a trivial identity mapping. Now, let's get into the practicalities of building one.
Implementing a Denoising Autoencoder involves a few distinct steps, primarily centered around how you prepare your data and what you ask the model to predict.
The defining characteristic of a Denoising Autoencoder is its training on noisy inputs. So, the first step is to artificially corrupt your clean dataset. Let the original clean input be x. We create a corrupted version, x~, by adding some form of noise or by masking some parts of x. The autoencoder will then learn to map x~ back to x.
There are several common ways to introduce this corruption:
Additive Gaussian Noise: This involves adding random numbers sampled from a Gaussian (normal) distribution to your input data.
If your data is normalized (e.g., pixel values between 0 and 1), you'd typically add noise with a mean of 0 and a chosen standard deviation (the noise factor). After adding noise, you may need to clip the values to ensure they remain within the valid range (e.g., [0, 1] for normalized images).
A typical operation in Python using PyTorch might look like this:
import torch
noise_factor = 0.5 # Adjust this hyperparameter
# Assuming x_train_clean is a PyTorch tensor
x_train_noisy = x_train_clean + noise_factor * torch.randn_like(x_train_clean)
x_train_noisy = torch.clamp(x_train_noisy, 0., 1.) # Ensure data stays in valid range
The noise_factor controls the amount of noise; a larger factor means more corruption.
Masking Noise: This involves randomly zeroing out a fraction of the input features (e.g., pixels in an image, values in a data vector). A closely related variant, salt-and-pepper noise, instead sets randomly chosen features to their minimum or maximum value. For instance, you could randomly set a certain percentage of pixels in an image to 0; a sketch of the salt-and-pepper variant follows the masking example below.
Using PyTorch, this could be implemented as:
# For setting random elements to 0
noise_factor = 0.5 # Fraction of elements to zero out (e.g., 0.5 means 50% are dropped)
# Create a binary mask: 1 with probability (1 - noise_factor), 0 otherwise
mask = (torch.rand_like(x_train_clean) > noise_factor).float()
x_train_noisy = x_train_clean * mask
Here, noise_factor represents the proportion of input features that are "dropped" or masked.
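The snippet above only zeroes out features. For the salt-and-pepper variant mentioned earlier, a minimal sketch (assuming x_train_clean is normalized to [0, 1], so the extreme values are simply 0 and 1) could look like this:
import torch
# Salt-and-pepper corruption: set a random fraction of elements to 0 or 1
# (illustrative sketch; assumes x_train_clean is normalized to [0, 1])
noise_factor = 0.1  # Fraction of elements to corrupt
corrupt_mask = torch.rand_like(x_train_clean) < noise_factor     # Which elements to corrupt
salt_or_pepper = (torch.rand_like(x_train_clean) < 0.5).float()  # 1 ("salt") or 0 ("pepper")
x_train_noisy = torch.where(corrupt_mask, salt_or_pepper, x_train_clean)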
The choice of noise type and its intensity (noise factor) are important hyperparameters. You should select a noise type that reflects the kind of perturbations you expect in real-world data or the invariances you want your model to learn. The amount of noise should be significant enough to prevent the autoencoder from learning a simple identity function, but not so overwhelming that it becomes impossible for the model to recover the original signal.
The architecture of a Denoising Autoencoder (the encoder and decoder networks) can be very similar to that of a standard autoencoder. You can use fully connected layers for tabular data, or convolutional and transposed convolutional layers for image data, just as you would for their non-denoising counterparts.
The critical difference isn't in the layers themselves but in what the network sees during training and what it's trying to produce.
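To make this concrete, here is a minimal sketch of a fully connected autoencoder you could use as a Denoising Autoencoder. The layer sizes, activations, and the Sigmoid output (which assumes inputs normalized to [0, 1]) are illustrative choices, not the only valid ones:
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        # Encoder: compress the (possibly noisy) input into the latent code z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the clean input from z
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # Assumes inputs are normalized to [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
For image data, you could swap the Linear layers for Conv2d and ConvTranspose2d blocks without changing anything else about the training procedure.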
This is where Denoising Autoencoders significantly diverge from standard autoencoders. During training:
Input: the corrupted version x~ is fed into the encoder, which compresses it into the latent representation z.
Target: the decoder maps z to a reconstruction x′, which is compared against the original clean data x, not the noisy input.
The loss function then measures the difference between the decoder's output x′ and the original clean data x. Common loss functions include:
Mean Squared Error (MSE), via nn.MSELoss():

$$L(x, x') = \frac{1}{N}\sum_{i=1}^{N}(x_i - x'_i)^2$$

Binary Cross-Entropy (BCE), via nn.BCELoss() (for raw probabilities) or nn.BCEWithLogitsLoss() (for logits):

$$L(x, x') = -\frac{1}{N}\sum_{i=1}^{N}\left[x_i \log(x'_i) + (1 - x_i)\log(1 - x'_i)\right]$$

Let's visualize the data flow and training objective:
This diagram illustrates the Denoising Autoencoder training pipeline. Clean data x is first corrupted to produce x~. The autoencoder takes x~ as input, and its decoder attempts to reconstruct the original clean data x. The loss function is computed by comparing this reconstruction x′ against the original x.
When using PyTorch, the training loop setup would look something like this:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
# Assume your Autoencoder class is defined as before (with encoder and decoder parts)
# Example:
# class Autoencoder(nn.Module):
#     def __init__(self, input_dim, latent_dim):
#         super().__init__()
#         self.encoder = nn.Sequential(...)
#         self.decoder = nn.Sequential(...)
#     def forward(self, x):
#         return self.decoder(self.encoder(x))
# autoencoder_model = Autoencoder(input_dim, latent_dim).to(device)
# criterion = nn.MSELoss()
# optimizer = optim.Adam(autoencoder_model.parameters(), lr=0.001)
# Assume x_train_clean_tensor and x_test_clean_tensor are your original clean data tensors
# Assume your data loaders train_loader and test_loader are created using TensorDataset
epochs = 50
noise_factor = 0.2

for epoch in range(epochs):
    autoencoder_model.train()
    for clean_data_batch, _ in train_loader:  # _ is a placeholder for labels; only the clean data is needed
        clean_data_batch = clean_data_batch.to(device)

        # Apply noise to create the corrupted input
        noisy_data_batch = clean_data_batch + noise_factor * torch.randn_like(clean_data_batch)
        noisy_data_batch = torch.clamp(noisy_data_batch, 0., 1.)  # Clip to valid range

        optimizer.zero_grad()
        reconstructed_output = autoencoder_model(noisy_data_batch)  # Input: noisy data
        loss = criterion(reconstructed_output, clean_data_batch)    # Target: clean data
        loss.backward()
        optimizer.step()

    # (Optional) Add a validation loop similar to previous examples
    # autoencoder_model.eval()
    # with torch.no_grad():
    #     for clean_data_batch_val, _ in test_loader:
    #         clean_data_batch_val = clean_data_batch_val.to(device)
    #         noisy_data_batch_val = clean_data_batch_val + noise_factor * torch.randn_like(clean_data_batch_val)
    #         noisy_data_batch_val = torch.clamp(noisy_data_batch_val, 0., 1.)
    #         val_reconstruction = autoencoder_model(noisy_data_batch_val)
    #         val_loss = criterion(val_reconstruction, clean_data_batch_val)
    #         ... print validation loss

    print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")
Notice carefully that noisy_data_batch is passed as the input, but clean_data_batch is passed as the target. This forces the network to learn to remove the noise.
By training the autoencoder to denoise, you're essentially forcing it to learn the underlying manifold or structure of your data. It cannot simply learn an identity function because the input and target are different. To successfully reconstruct the clean data from a noisy version, the encoder must capture the essential characteristics and discard the noise. The resulting latent representation z thus tends to be more robust and contains more salient information about the data.
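As a quick sanity check once training has finished, you can corrupt a test batch the same way and inspect how well the model removes the noise. This is just an illustrative sketch that reuses autoencoder_model, device, noise_factor, and test_loader from the code above:
autoencoder_model.eval()
with torch.no_grad():
    # Take one batch of clean test data and corrupt it as during training
    clean_batch, _ = next(iter(test_loader))
    clean_batch = clean_batch.to(device)
    noisy_batch = torch.clamp(clean_batch + noise_factor * torch.randn_like(clean_batch), 0., 1.)

    denoised_batch = autoencoder_model(noisy_batch)

    # The reconstruction error against the clean originals should be noticeably
    # lower than the error of the noisy input itself if denoising is working
    print(f"Noisy MSE:    {nn.functional.mse_loss(noisy_batch, clean_batch).item():.4f}")
    print(f"Denoised MSE: {nn.functional.mse_loss(denoised_batch, clean_batch).item():.4f}")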
Once the Denoising Autoencoder is trained, you typically use its encoder part. You would feed new, clean data through this trained encoder to obtain robust feature representations. These features can then be used for downstream tasks like classification or clustering.
For instance, to get an encoder from a trained PyTorch DAE:
# Assuming 'autoencoder_model' is your trained DAE instance
# and 'x_new_clean_data' is your new, clean data (as a PyTorch tensor)
autoencoder_model.eval() # Set model to evaluation mode
with torch.no_grad():  # Disable gradient calculations
    clean_features = autoencoder_model.encoder(x_new_clean_data.to(device))

# clean_features will be a PyTorch tensor on the device; convert to NumPy if needed:
# clean_features_np = clean_features.cpu().numpy()
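As one possible downstream use, the extracted features can feed a simple classifier. The sketch below is hypothetical: it assumes you also have clean train/test tensors x_train_clean and x_test_clean plus label tensors y_train and y_test, and it fits a scikit-learn LogisticRegression on the encoder outputs:
from sklearn.linear_model import LogisticRegression

autoencoder_model.eval()
with torch.no_grad():
    # Encode the (hypothetical) clean train/test splits with the frozen encoder
    train_features = autoencoder_model.encoder(x_train_clean.to(device)).cpu().numpy()
    test_features = autoencoder_model.encoder(x_test_clean.to(device)).cpu().numpy()

# Fit a simple classifier on the learned representations (y_train / y_test are hypothetical labels)
clf = LogisticRegression(max_iter=1000)
clf.fit(train_features, y_train.numpy())
print("Downstream test accuracy:", clf.score(test_features, y_test.numpy()))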
Implementing a Denoising Autoencoder is a straightforward extension of a standard autoencoder, but this change in the training objective (reconstructing clean data from a corrupted input) has a significant impact on the quality and robustness of the learned features. The hands-on exercise later in this chapter will guide you through building and training one from scratch.