Once your autoencoder's architecture is defined, meaning you've stacked up the layers for the encoder and decoder as discussed in the previous section, "Constructing a Simple Autoencoder Model", the next step is to prepare it for the learning process. In Keras, this preparation is done through a crucial step called "compiling" the model. Think of compiling as giving your model its instructions: what it should try to achieve (the loss function), how it should try to achieve it (the optimizer), and how you want to measure its progress (metrics).
You'll use the model.compile() method, providing it with these three important pieces of information. The diagram below illustrates how these components fit into the model configuration stage.
This diagram shows that after defining the model's layers, you compile it by specifying an optimizer, a loss function, and optional metrics. This prepares the model for training.
Let's look at each of these components.
The loss function (also known as a cost function or objective function) is fundamental to training any neural network. It quantifies how "far off" the model's output is from the actual target. During training, the model tries to minimize this loss value.
For an autoencoder, the goal is to reconstruct the input as accurately as possible. So, the output of the autoencoder, $\hat{x}$, should be very similar to the original input, $x$. The loss function, therefore, measures the difference, or "reconstruction error," between $x$ and $\hat{x}$.
Mean Squared Error (MSE)
A very common and effective loss function for autoencoders, especially when dealing with image data or other continuous values, is the Mean Squared Error (MSE). As mentioned in this chapter's introduction, MSE is calculated as:
$$L(x, \hat{x}) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{x}_i)^2$$
Let's break this down:
- $x_i$ is the $i$-th value of the original input (for an image, a single pixel).
- $\hat{x}_i$ is the corresponding value in the reconstructed output.
- $N$ is the total number of values being compared (for an image, the number of pixels).
So, MSE gives you the average squared difference between the original input and the reconstructed output. A lower MSE means the reconstruction is closer to the original. When you train your autoencoder with MSE, you're essentially telling it: "Try to make the squared difference between each input pixel and its corresponding output pixel as small as possible, on average."
In Keras, you can specify MSE using the string identifier 'mean_squared_error' or simply 'mse'.
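To see the formula in action, here is a minimal sketch (the pixel values are arbitrary, chosen only for illustration) that computes MSE by hand and compares it with Keras' built-in loss:

import numpy as np
import tensorflow as tf

# A tiny, hypothetical "image" of 4 pixel values and its reconstruction
x = np.array([0.0, 0.5, 1.0, 0.25], dtype=np.float32)      # original input
x_hat = np.array([0.1, 0.4, 0.9, 0.35], dtype=np.float32)  # reconstructed output

# MSE computed directly from the formula: average of the squared differences
mse_manual = np.mean((x - x_hat) ** 2)

# The same value from the Keras loss that the string 'mse' refers to
mse_keras = tf.keras.losses.MeanSquaredError()(x, x_hat).numpy()

print(mse_manual, mse_keras)  # both are approximately 0.01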
Binary Cross-Entropy (BCE)
Another popular choice, especially if your input data is normalized to a range of [0, 1] (like MNIST digits often are) and your decoder's final layer uses a sigmoid activation function, is Binary Cross-Entropy (BCE). The sigmoid function squashes its output into the [0, 1] range, which can be interpreted as a probability.
BCE is typically used when you're comparing probability distributions. In the context of an autoencoder with sigmoid outputs, you can think of each pixel value in the input $x$ (which should lie between 0 and 1) as a target probability, and each corresponding pixel in the reconstruction $\hat{x}$ as the model's predicted probability.
The formula for BCE for N individual values (e.g., pixels) is:
$$L(x, \hat{x}) = -\frac{1}{N}\sum_{i=1}^{N}\left[x_i \log(\hat{x}_i) + (1 - x_i)\log(1 - \hat{x}_i)\right]$$
This formula might look a bit complex, but its effect is to heavily penalize the model if it's very confident (e.g., $\hat{x}_i$ is close to 0 or 1) but wrong (e.g., $x_i$ was 1 but $\hat{x}_i$ was close to 0).
In Keras, BCE is specified using 'binary_crossentropy'.
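As with MSE, you can check the BCE formula against Keras' implementation. The sketch below uses arbitrary, made-up pixel values purely for illustration (Keras clips predictions slightly for numerical stability, so tiny differences are expected):

import numpy as np
import tensorflow as tf

# Hypothetical pixel targets in [0, 1] and sigmoid-style reconstructions
x = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)
x_hat = np.array([0.9, 0.2, 0.6, 0.1], dtype=np.float32)

# BCE computed directly from the formula
bce_manual = -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

# The same quantity from the Keras loss that 'binary_crossentropy' refers to
bce_keras = tf.keras.losses.BinaryCrossentropy()(x, x_hat).numpy()

print(bce_manual, bce_keras)  # the two values agree closely (about 0.24)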
Which to choose? For your first autoencoder dealing with image data like MNIST (where pixel values are often scaled between 0 and 1), both MSE and BCE (if using a sigmoid output layer) can work well. MSE is often a more direct measure of pixel intensity differences and is a great starting point.
Once you have a way to measure error (the loss function), you need a mechanism to adjust the autoencoder's internal parameters (its weights and biases) to reduce this error. This is the job of the optimizer.
Think of the learning process as trying to find the bottom of a valley, where the bottom represents the lowest possible loss. The optimizer is like your guide, deciding which direction to take and how large a step to make at each point to reach that bottom. It uses the calculated loss to make these decisions, typically employing techniques related to calculus (like gradients, which indicate the slope or direction of steepest ascent/descent).
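To make the "step downhill" idea concrete, here is a minimal sketch of plain gradient descent on a single made-up weight. The toy loss L(w) = (w - 3)**2 and its gradient are assumptions chosen only so the example has an obvious minimum at w = 3:

# Toy loss: L(w) = (w - 3)**2, whose gradient is dL/dw = 2 * (w - 3)
w = 0.0              # starting weight value
learning_rate = 0.1  # step size

for step in range(5):
    grad = 2 * (w - 3)            # slope of the loss at the current weight
    w = w - learning_rate * grad  # step in the opposite direction of the gradient
    print(step, round(w, 4))      # w moves steadily toward 3, where the loss is lowest

Real optimizers perform this same kind of update, just for millions of weights at once and with refinements layered on top of the basic rule.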
Adam Optimizer
The Adam (Adaptive Moment Estimation) optimizer is a very popular choice in deep learning and often a good default. It adapts the learning rate for each parameter individually and generally converges quickly. It combines ideas from other optimizers like RMSprop and AdaGrad. For beginners, Adam often works well out-of-the-box without needing much tuning.
In Keras, you can use it with the string 'adam'.
Stochastic Gradient Descent (SGD)
SGD is a more foundational optimizer. It's simpler in principle: it calculates the gradient of the loss function with respect to the model parameters for a small batch of data and updates the parameters in the opposite direction of the gradient. While effective, SGD can sometimes be slower to converge than Adam and might require more careful tuning of its parameters, such as the learning rate and momentum.
In Keras, it's 'sgd'.
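If you later want more control than the plain 'sgd' string gives you, the optimizer can be instantiated explicitly. The numbers below are purely illustrative, not recommendations:

import tensorflow as tf

# Equivalent in spirit to optimizer='sgd', but with the learning rate and momentum spelled out
sgd_optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# The object can then be passed to compile(), for example:
# autoencoder_model.compile(optimizer=sgd_optimizer, loss='mse')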
Learning Rate
A significant parameter for most optimizers is the learning rate. It determines the size of the steps the optimizer takes when updating the model's weights. A learning rate that is too large can overshoot good solutions, while one that is too small can make training very slow.
Optimizers like Adam have adaptive learning rates, which helps manage this. When you specify an optimizer in Keras by its string name (e.g., 'adam'), it uses default, generally well-performing, parameter values, including the learning rate. For more advanced control, you can instantiate the optimizer object and set these parameters yourself (e.g., tf.keras.optimizers.Adam(learning_rate=0.001)), but for now, the defaults are fine.
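For reference, here is a minimal sketch of the two forms side by side. It assumes autoencoder_model is the model constructed in the previous section, and the 0.001 simply repeats Adam's usual default value:

import tensorflow as tf

# Form 1: string shortcut, Adam with all default parameter values
# autoencoder_model.compile(optimizer='adam', loss='mse')

# Form 2: explicit optimizer object; same behavior here, but the learning rate is visible and adjustable
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# autoencoder_model.compile(optimizer=adam_optimizer, loss='mse')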
While the loss function is what the optimizer actively tries to minimize, metrics are used to monitor the model's performance during training and evaluation. They give you (the human) a way to understand how well your model is doing.
Often, the loss function itself is a perfectly good metric to track. If you're using MSE as your loss, you'll likely want to see the MSE value at each stage of training.
You can also specify other metrics. For example, Mean Absolute Error (MAE) is another way to measure reconstruction error:

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}|x_i - \hat{x}_i|$$

MAE calculates the average absolute difference between inputs and outputs. Unlike MSE, it doesn't square the differences, so it's less sensitive to very large, outlier errors.
For autoencoders, classification-style metrics like 'accuracy' are generally not very informative because the task is reconstruction, not classifying inputs into categories. Stick to metrics that measure the difference between the input and the reconstruction, like 'mse' or 'mae'.
In Keras, you provide metrics as a list of strings, for example, metrics=['mse', 'mae'].
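To see what the 'mae' string tracks, here is a small sketch (arbitrary values, for illustration only) comparing the MAE formula with the corresponding Keras metric:

import numpy as np
import tensorflow as tf

# Hypothetical input and reconstruction values
x = np.array([0.0, 0.5, 1.0, 0.25], dtype=np.float32)
x_hat = np.array([0.1, 0.4, 0.9, 0.35], dtype=np.float32)

# MAE computed directly from the formula
mae_manual = np.mean(np.abs(x - x_hat))

# The same value from the Keras metric that the string 'mae' refers to
mae_metric = tf.keras.metrics.MeanAbsoluteError()
mae_metric.update_state(x, x_hat)
print(mae_manual, mae_metric.result().numpy())  # both are approximately 0.1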
The compile() Call
Now that you understand the loss function, optimizer, and metrics, you can put them together in the model.compile() call. This call configures your model for training.
Here’s how you would typically compile a simple autoencoder model in Keras:
# Assume 'autoencoder_model' is your constructed Keras Sequential model
# For example, if your input data (and thus output data) is normalized between 0 and 1

# Option 1: Using Mean Squared Error (a common choice)
autoencoder_model.compile(optimizer='adam',
                          loss='mean_squared_error',
                          metrics=['mse', 'mae'])

# Option 2: Using Binary Cross-Entropy
# (good if output layer uses sigmoid and inputs are 0-1)
# autoencoder_model.compile(optimizer='adam',
#                           loss='binary_crossentropy',
#                           metrics=['mse', 'mae'])  # You can still monitor MSE/MAE
In this example:
- optimizer='adam': We're telling Keras to use the Adam optimizer with its default settings.
- loss='mean_squared_error': We're specifying that the model should try to minimize the Mean Squared Error between the input and the reconstructed output (or 'binary_crossentropy' in the alternative).
- metrics=['mse', 'mae']: We're asking Keras to keep track of both MSE and MAE during training and evaluation so we can see these values.

Once this compile() step is done, your autoencoder model is fully configured and ready for the next stage: feeding it data and starting the training process, which you'll learn about in the section "Executing the Training Process."
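If you would like a self-contained snippet to experiment with, the sketch below builds a deliberately tiny stand-in autoencoder (the 784/32 layer sizes are placeholders suited to flattened 28x28 images, not the exact architecture from the previous section) and compiles it exactly as described here:

import tensorflow as tf

# A small stand-in autoencoder; layer sizes are illustrative placeholders
autoencoder_model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                      # e.g., flattened 28x28 images
    tf.keras.layers.Dense(32, activation='relu'),      # encoder: compress to 32 values
    tf.keras.layers.Dense(784, activation='sigmoid')   # decoder: reconstruct 784 values in [0, 1]
])

# Configure the model for training, as covered in this section
autoencoder_model.compile(optimizer='adam',
                          loss='mean_squared_error',
                          metrics=['mse', 'mae'])

autoencoder_model.summary()  # confirms the model is built and ready for training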