Throughout this chapter, we've discussed the methods for selecting, tuning, and evaluating autoencoders for feature extraction. Now, let's bring these ideas together and apply them to a concrete classification problem. The objective here is not just to build an autoencoder, but to see how the features it learns can impact the performance of a downstream supervised learning model. We'll walk through the process of training a baseline classifier, then an autoencoder to extract features, and finally, a classifier using these new features, comparing the results along the way.
For this exercise, we'll use a common dataset that's suitable for classification and where feature extraction might offer some benefits. Let's consider the "Digits" dataset, available in scikit-learn, which consists of 8x8 pixel images of handwritten digits (0-9). Each image is represented as a 64-dimensional vector. Our goal is to classify these digits.
First, we need to establish a baseline. This involves training a standard classification model on the original, raw features of the dataset. This baseline will serve as a point of comparison to evaluate whether using autoencoder-extracted features provides any advantage.
Load and Prepare Data: We'll load the Digits dataset and split it into training and testing sets. It's also good practice to scale the features, typically to a [0, 1] range, which helps in training neural networks (including autoencoders) and many classifiers.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load data
digits = load_digits()
X, y = digits.data, digits.target
# Scale features to [0, 1]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42, stratify=y
)
Train Baseline Classifier: We'll use a simple Logistic Regression model as our baseline classifier.
# Train baseline Logistic Regression model
# liblinear fits one-vs-rest models per class; the deprecated multi_class
# argument is no longer needed (and has been removed in recent scikit-learn)
baseline_model = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000)
baseline_model.fit(X_train, y_train)
# Evaluate baseline model
y_pred_baseline = baseline_model.predict(X_test)
baseline_accuracy = accuracy_score(y_test, y_pred_baseline)
print(f"Baseline Logistic Regression Accuracy: {baseline_accuracy:.4f}")
Let's assume this gives us an accuracy of, say, 0.9556. This is the score we'll try to match or improve upon using autoencoder features, potentially with a more compact feature set.
Now, we'll design and train an autoencoder using PyTorch. The encoder part of this autoencoder will learn to transform the 64-dimensional input into a lower-dimensional representation.
Autoencoder Architecture: We'll construct a simple, fully-connected autoencoder. The dimensionality of the latent space is a key hyperparameter. Let's try reducing the 64 dimensions to, for example, 32.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
# Convert numpy arrays to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train_tensor, X_train_tensor) # Autoencoder takes input as target
test_dataset = TensorDataset(X_test_tensor, X_test_tensor)
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
input_dim = X_train.shape[1] # Should be 64
latent_dim = 32 # Our chosen latent space dimensionality
# Define the Autoencoder model
class Autoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(Autoencoder, self).__init__()
        # Encoder: input_dim -> 128 -> 64 -> latent_dim
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim),  # Bottleneck layer
            nn.ReLU()  # ReLU keeps latent features non-negative, common in some AE uses
        )
        # Decoder mirrors the encoder: latent_dim -> 64 -> 128 -> input_dim
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()  # Sigmoid maps outputs back to the [0, 1] input range
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
autoencoder = Autoencoder(input_dim, latent_dim)
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
autoencoder.to(device)
# Define loss function and optimizer
criterion = nn.MSELoss() # MSE for reconstruction task
optimizer = optim.Adam(autoencoder.parameters(), lr=1e-3)
print(autoencoder)
Here, nn.Sigmoid() is used in the final decoder layer because our input data X_scaled is normalized to the [0, 1] range. nn.MSELoss() (mean squared error) is a common loss function for reconstruction tasks with continuous-valued inputs. We also define the encoder as a separate sequential module within the Autoencoder class so it can be extracted easily later.
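Because the inputs are scaled to [0, 1], binary cross-entropy is another commonly used reconstruction loss. If you want to experiment, this optional one-line swap (an alternative, not used in the run below) replaces the MSE criterion:
# Optional alternative: per-pixel binary cross-entropy for [0, 1] intensities
criterion = nn.BCELoss()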
Train the Autoencoder: We train the autoencoder to reconstruct the input data.
epochs = 50
history = {'loss': [], 'val_loss': []}
for epoch in range(epochs):
    # Training
    autoencoder.train()
    train_loss = 0
    for batch_X, _ in train_loader:  # the target is identical to the input for an autoencoder
        batch_X = batch_X.to(device)
        optimizer.zero_grad()
        reconstruction = autoencoder(batch_X)
        loss = criterion(reconstruction, batch_X)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * batch_X.size(0)  # undo the batch mean to accumulate a dataset-wide sum
    avg_train_loss = train_loss / len(train_loader.dataset)
    history['loss'].append(avg_train_loss)

    # Validation
    autoencoder.eval()
    val_loss = 0
    with torch.no_grad():
        for batch_X_test, _ in test_loader:
            batch_X_test = batch_X_test.to(device)
            reconstruction = autoencoder(batch_X_test)
            loss = criterion(reconstruction, batch_X_test)
            val_loss += loss.item() * batch_X_test.size(0)
    avg_val_loss = val_loss / len(test_loader.dataset)
    history['val_loss'].append(avg_val_loss)

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}")
print("Autoencoder training complete.")
Monitoring the validation loss is important to prevent overfitting. If val_loss starts increasing while the training loss keeps decreasing, it's a sign of overfitting.
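Since we stored both curves in history, plotting them makes any divergence easy to spot. A minimal sketch, assuming matplotlib is installed:
import matplotlib.pyplot as plt
# Compare training and validation reconstruction loss per epoch
plt.plot(history['loss'], label='train loss')
plt.plot(history['val_loss'], label='val loss')
plt.xlabel('Epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()
If the curves diverge, you can reduce the number of epochs, add regularization, or stop training at the epoch where the validation loss bottoms out.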
With the autoencoder trained, we can now use its encoder part to transform our original training and testing datasets into their latent representations.
Extract Features: Use the encoder module from our trained autoencoder to get the compressed features.
autoencoder.eval() # Set autoencoder to evaluation mode
with torch.no_grad():
    X_train_encoded = autoencoder.encoder(X_train_tensor.to(device)).cpu().numpy()
    X_test_encoded = autoencoder.encoder(X_test_tensor.to(device)).cpu().numpy()
print(f"Original feature shape: {X_train.shape}")
print(f"Encoded feature shape: {X_train_encoded.shape}")
This should show that the number of features has been reduced from 64 to latent_dim (32 in our example).
Train Classifier on Extracted Features: Now we train the same Logistic Regression classifier, but this time using X_train_encoded and X_test_encoded.
# Train Logistic Regression model on encoded features
ae_feature_model = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000)
ae_feature_model.fit(X_train_encoded, y_train)
# Evaluate model on encoded features
y_pred_ae_features = ae_feature_model.predict(X_test_encoded)
ae_features_accuracy = accuracy_score(y_test, y_pred_ae_features)
print(f"Logistic Regression with Autoencoder Features Accuracy: {ae_features_accuracy:.4f}")
Let's say our classifier using autoencoder features achieves an accuracy of 0.9611.
Here's a simple comparison of classifier accuracy using original features versus features extracted by the autoencoder (values from the hypothetical runs above):

Feature set                        Dimensions    Test accuracy
Original pixel features            64            0.9556
Autoencoder-extracted features     32            0.9611
In this hypothetical scenario, we achieved a slight improvement in accuracy while halving the number of features. This is a positive outcome. The autoencoder might have learned a more discriminative or noise-reduced representation of the data, beneficial for the classifier.
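A quick way to sanity-check what the autoencoder has learned is to compare a few test digits against their reconstructions. A minimal sketch, assuming matplotlib is available:
import matplotlib.pyplot as plt
autoencoder.eval()
with torch.no_grad():
    reconstructions = autoencoder(X_test_tensor[:8].to(device)).cpu().numpy()
# Top row: original test digits; bottom row: their reconstructions
fig, axes = plt.subplots(2, 8, figsize=(12, 3))
for i in range(8):
    axes[0, i].imshow(X_test[i].reshape(8, 8), cmap='gray')
    axes[1, i].imshow(reconstructions[i].reshape(8, 8), cmap='gray')
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()
If the reconstructions are recognizable digits, the 32-dimensional bottleneck is retaining most of the class-relevant structure.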
Potential Outcomes and Considerations:

- Accuracy may improve if the autoencoder learns a more discriminative or noise-reduced representation than the raw pixels.
- Accuracy may also drop: latent_dim might be too small, causing critical information for classification to be lost during compression.

Further Steps You Can Take:
- Experiment with different values of latent_dim. A very small latent dimension might lead to information loss, while a very large one might not offer much compression or feature learning benefit. Plotting classifier performance against latent dimension size can be very insightful; a sketch of such a sweep follows the closing paragraph below.
- Visualize the latent space (when latent_dim is 2 or 3) to see if digits of the same class cluster together.

This practice session demonstrates the end-to-end workflow of employing autoencoders for feature extraction in a supervised learning context. The key is to remember that the autoencoder is a tool; its effectiveness depends on careful design, training, and evaluation in the context of your specific problem and data. The features it extracts are not guaranteed to be better, but they offer a powerful way to transform your data, often leading to more compact and informative representations.
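As a starting point for the first suggestion above, here is a minimal sketch of a latent-dimension sweep. The candidate dimensions are assumptions to adjust, and the sketch reuses epochs, criterion, the loaders, and the Autoencoder class defined earlier:
# Train one autoencoder per candidate latent dimension and record test accuracy
sweep_results = {}
for dim in [2, 8, 16, 32, 48]:  # assumed candidate values; adjust as needed
    ae = Autoencoder(input_dim, dim).to(device)
    opt = optim.Adam(ae.parameters(), lr=1e-3)
    for _epoch in range(epochs):
        ae.train()
        for batch_X, _ in train_loader:
            batch_X = batch_X.to(device)
            opt.zero_grad()
            loss = criterion(ae(batch_X), batch_X)
            loss.backward()
            opt.step()
    ae.eval()
    with torch.no_grad():
        Xtr = ae.encoder(X_train_tensor.to(device)).cpu().numpy()
        Xte = ae.encoder(X_test_tensor.to(device)).cpu().numpy()
    clf = LogisticRegression(solver='liblinear', random_state=42, max_iter=1000)
    clf.fit(Xtr, y_train)
    sweep_results[dim] = accuracy_score(y_test, clf.predict(Xte))
print(sweep_results)  # plot these against the latent dimension to find the sweet spot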