Alright, you've learned about what autoencoders are and how they learn. Now, let's roll up our sleeves and get to the practical part: building one. Before we can construct our autoencoder, we need data. This is the fuel for any machine learning model, and autoencoders are no exception. In this section, we'll focus on loading a suitable dataset and getting a good feel for what it contains.
For our first autoencoder, we want a dataset that's straightforward to work with. This allows us to concentrate on the autoencoder's mechanics without getting bogged down in complex data-handling issues. The MNIST dataset of handwritten digits is a classic choice for this purpose, often considered the "hello world" of image processing in machine learning.
The MNIST dataset consists of tens of thousands of small, grayscale images of handwritten digits (0 through 9). Each image is 28 pixels wide and 28 pixels tall. It's widely used because it's simple enough for quick experiments but rich enough to demonstrate many machine learning principles.
Its small size and uniform 28x28 grayscale format are all we need to know about MNIST for our purposes.
TensorFlow, through its Keras API, provides a very convenient way to download and load the MNIST dataset directly into our Python environment. You don't need to manually download files or deal with complicated parsing.
Let's see how to load it. First, you'll need to have TensorFlow installed. If you've followed the setup instructions in the "Python Environment Setup for Deep Learning" section, you should be ready.
You can load the dataset with just a couple of lines of Python code:
from tensorflow.keras.datasets import mnist
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Let's break down what happens here:

- The `mnist` module from `tensorflow.keras.datasets` provides access to the dataset.
- `mnist.load_data()` downloads the dataset (if it's not already on your machine) and splits it into training and testing sets.
- `x_train`: These are the images that we will use to teach our autoencoder. It's an array of 60,000 images.
- `y_train`: These are the corresponding labels for the `x_train` images (the actual digits 0-9). We won't use these directly for training our basic autoencoder's reconstruction task.
- `x_test`: These are 10,000 images set aside for evaluating how well our autoencoder performs on data it hasn't seen during training.
- `y_test`: These are the labels for the `x_test` images.

Our autoencoder will learn by trying to reconstruct the images in `x_train`.
Once the data is loaded, the next step is to understand its structure. What does this data actually look like in terms of numbers and dimensions?
Let's find out the dimensions of our image data. You can do this by checking the `shape` attribute of the NumPy arrays:
print(f"x_train shape: {x_train.shape}")
print(f"Number of training samples: {x_train.shape[0]}")
print(f"Image dimensions: {x_train.shape[1]}x{x_train.shape[2]} pixels")
print(f"x_test shape: {x_test.shape}")
print(f"Number of testing samples: {x_test.shape[0]}")
If you run this, you'll likely see output similar to this:
x_train shape: (60000, 28, 28)
Number of training samples: 60000
Image dimensions: 28x28 pixels
x_test shape: (10000, 28, 28)
Number of testing samples: 10000
This tells us:
- `x_train` contains 60,000 samples (images).
- Each image in `x_train` is 28 pixels high and 28 pixels wide.
- `x_test` contains 10,000 images, each also 28x28 pixels.

This 28x28 dimension means each image is made up of 28 × 28 = 784 pixels. Our autoencoder will eventually take these 784 pixel values as input.
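To make those 784 input values concrete, here is a small sketch of how a batch of 28x28 images flattens into 784-element vectors. It uses a random NumPy array with MNIST's shape as a stand-in for `x_train`, so the snippet runs even without downloading the dataset:

```python
import numpy as np

# Stand-in for x_train: random uint8 values with MNIST's shape (60000, 28, 28)
images = np.random.randint(0, 256, size=(60000, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a single 784-element vector
flattened = images.reshape((images.shape[0], 28 * 28))

print(flattened.shape)  # (60000, 784)
```

The same `reshape` call works on the real `x_train` array once it's loaded.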
What kind of numbers are we dealing with? Let's check the data type of the pixel values:
print(f"Data type of x_train: {x_train.dtype}")
The output will typically be:
Data type of x_train: uint8
`uint8` means "unsigned 8-bit integer." This is a common data type for images where pixel values range from 0 to 255.
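You can ask NumPy directly for the representable range of the `uint8` type, which matches the 0 to 255 range mentioned above:

```python
import numpy as np

# Query the representable range of the uint8 data type
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255
```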
You can confirm the range of pixel values by checking the minimum and maximum values in the dataset:
import numpy as np
print(f"Minimum pixel value: {np.min(x_train)}")
print(f"Maximum pixel value: {np.max(x_train)}")
This should output:
Minimum pixel value: 0
Maximum pixel value: 255
This range (0-255) is important. We'll often need to normalize these values (e.g., scale them to a 0-1 range) before feeding them into a neural network, which we'll cover in the "Data Preprocessing for Autoencoders" section.
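As a small preview of that normalization step, here is a minimal sketch (using a tiny hand-made array rather than the full dataset) of the usual scaling from 0-255 integers to 0-1 floats:

```python
import numpy as np

# Stand-in for pixel data: uint8 values spanning the 0-255 range
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Convert to float32 and scale into the [0, 1] range
normalized = pixels.astype('float32') / 255.0

print(normalized)  # values between 0.0 and 1.0
```

Applied to `x_train`, the same two operations produce inputs a neural network handles much more gracefully than raw integers.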
Numbers and shapes are informative, but there's no substitute for actually looking at your data, especially when it's image data. Let's display a few of these handwritten digits. We can use a popular Python library called Matplotlib for this.
import matplotlib.pyplot as plt

# Display the first 10 images from x_train
plt.figure(figsize=(10, 2))  # Adjust figure size for better layout
for i in range(10):
    plt.subplot(1, 10, i + 1)  # Create a subplot for each image
    plt.imshow(x_train[i], cmap='gray')  # Display the image in grayscale
    plt.title(f"Label: {y_train[i]}")  # Show the label as title
    plt.axis('off')  # Hide the axes ticks and labels
plt.show()
When you run this code, you'll see a window pop up (or an image embedded in your Jupyter Notebook) showing something like this:
A row of ten small images, each displaying a handwritten digit (e.g., 5, 0, 4, 1, 9, 2, 1, 3, 1, 4). Each image has its corresponding label (the digit it represents) shown above it. The images are in grayscale.
This visualization confirms that our data looks as expected: images of digits. You can see the variation in handwriting, which is what makes this a non-trivial task for a machine to learn. Notice the `cmap='gray'` argument in `plt.imshow()`. This tells Matplotlib to render the image in grayscale, as the pixel values represent intensities rather than colors. Without it, Matplotlib might use a default colormap that could be misleading.
Using MNIST for your first autoencoder project offers several advantages: the images are small and uniformly sized, the dataset loads with a single function call, and it's simple enough to keep the focus on the autoencoder itself rather than on data handling.
Our autoencoder will learn to take one of these 28x28 pixel images, compress it down to a much smaller representation in its bottleneck layer, and then try to reconstruct the original 28x28 image from this compressed form. By looking at the data, you now have a clearer picture of what the input to our autoencoder will be.
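To put a rough number on that compression, here is a quick back-of-the-envelope calculation. The bottleneck size of 32 is purely an assumed example for illustration; the actual size is chosen when we build the model:

```python
input_dim = 28 * 28       # 784 pixel values per image
bottleneck_dim = 32       # hypothetical bottleneck size, assumed for illustration
ratio = input_dim / bottleneck_dim

print(f"Each image is compressed by a factor of {ratio:.1f}x")  # 24.5x
```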
With the MNIST dataset loaded and understood, we're ready to prepare it for our autoencoder. The next step, "Data Preprocessing for Autoencoders," will cover the necessary transformations to get this raw data into the perfect shape and format for our neural network.
© 2025 ApX Machine Learning