Before a Keras model can be trained, it needs to be configured, or "compiled". The compile() method sets up the model for training. One of the most significant arguments you'll provide to compile() is the loss function.
The loss function, also known as the objective function or cost function, quantifies how well the model is performing. During training, the goal is to minimize this function's value. It calculates a measure of difference between the model's predictions ($y_{\text{pred}}$) and the actual target values ($y_{\text{true}}$). A smaller loss value indicates that the model's predictions are closer to the true values.
The choice of loss function is directly tied to the type of machine learning problem you are solving. Let's look at the standard choices for common tasks.
Regression problems involve predicting continuous values, like house prices or temperature.
Mean Squared Error (MSE): This is perhaps the most common loss function for regression. It calculates the average of the squared differences between predictions and true values. Squaring the difference penalizes larger errors more heavily and ensures the result is always positive.
$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{\text{true},i} - y_{\text{pred},i}\right)^2$$
where $N$ is the number of samples. You typically specify it in Keras using the string identifier 'mean_squared_error' or 'mse'.
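To make the formula concrete, here is a minimal sketch (the sample values are illustrative) that evaluates the built-in Keras MSE loss and the definition above side by side:

import tensorflow as tf

y_true = tf.constant([3.0, -0.5, 2.0, 7.0])
y_pred = tf.constant([2.5, 0.0, 2.0, 8.0])

# Built-in Keras MSE loss object
print(tf.keras.losses.MeanSquaredError()(y_true, y_pred).numpy())  # 0.375

# Same value computed directly from the formula
print(tf.reduce_mean(tf.square(y_true - y_pred)).numpy())  # 0.375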
Mean Absolute Error (MAE): This loss function calculates the average of the absolute differences between predictions and true values. Unlike MSE, MAE doesn't square the errors, making it less sensitive to outliers. If your dataset contains significant outliers that you don't want to dominate the loss, MAE might be a better choice.
$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{\text{true},i} - y_{\text{pred},i}\right|$$
You can specify it using 'mean_absolute_error' or 'mae'.
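The difference in outlier sensitivity is easy to demonstrate. In this illustrative sketch, a single large outlier inflates MSE far more than MAE, because its error is squared:

import tensorflow as tf

# Identical predictions everywhere except the last sample, which is an outlier
y_true = tf.constant([1.0, 2.0, 3.0, 100.0])
y_pred = tf.constant([1.0, 2.0, 3.0, 4.0])

mse = tf.keras.losses.MeanSquaredError()(y_true, y_pred)
mae = tf.keras.losses.MeanAbsoluteError()(y_true, y_pred)
print(mse.numpy())  # 2304.0 -- the squared outlier error dominates
print(mae.numpy())  # 24.0   -- the outlier contributes only linearly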
Classification problems involve predicting a discrete class label, like identifying spam emails or classifying images.
Binary Crossentropy: Use this loss function for binary (two-class) classification problems. It measures the distance between the true probability distribution (e.g., [0, 1] or [1, 0] over the two classes) and the predicted probability distribution. It expects the model's final layer to have a single output unit with a sigmoid activation function (outputting a probability between 0 and 1), and the target values should be 0 or 1. The formula for binary crossentropy for a single prediction is:
$$L = -\left[y_{\text{true}} \log(y_{\text{pred}}) + (1 - y_{\text{true}}) \log(1 - y_{\text{pred}})\right]$$
The final loss is averaged over all samples. Specify it using the string 'binary_crossentropy'.
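As a sanity check, this sketch (with made-up probabilities) evaluates the built-in loss and the formula side by side; the results agree up to Keras's internal numerical clipping:

import tensorflow as tf

y_true = tf.constant([[1.0], [0.0], [1.0]])
y_pred = tf.constant([[0.9], [0.2], [0.6]])  # sigmoid-style outputs

# Built-in Keras binary crossentropy
print(tf.keras.losses.BinaryCrossentropy()(y_true, y_pred).numpy())  # ~0.28

# Same value from the formula, averaged over samples
manual = -tf.reduce_mean(
    y_true * tf.math.log(y_pred) + (1.0 - y_true) * tf.math.log(1.0 - y_pred)
)
print(manual.numpy())  # ~0.28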
Categorical Crossentropy: This is the standard loss function for multi-class classification when your target labels are one-hot encoded. For example, if you have three classes, the targets might look like [1, 0, 0], [0, 1, 0], or [0, 0, 1]. It expects the model's final layer to have C output units (where C is the number of classes) and use a softmax activation function, which outputs a probability distribution across the classes. The formula for a single sample is:
$$L = -\sum_{c=1}^{C} y_{\text{true},c} \log(y_{\text{pred},c})$$
where $C$ is the number of classes. The final loss is averaged over all samples. Specify it using the string 'categorical_crossentropy'.
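The following sketch (with illustrative values) shows that with one-hot targets, only the predicted probability of the true class contributes to the loss:

import tensorflow as tf

# One-hot targets for 3 classes, softmax-style predictions
y_true = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])

# Built-in Keras categorical crossentropy
print(tf.keras.losses.CategoricalCrossentropy()(y_true, y_pred).numpy())  # ~0.29

# Formula: the one-hot target zeroes out every term except the true class
manual = -tf.reduce_mean(tf.reduce_sum(y_true * tf.math.log(y_pred), axis=1))
print(manual.numpy())  # ~0.29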
Sparse Categorical Crossentropy: This loss function serves the same purpose as categorical crossentropy but is used when your target labels are provided as integers (e.g., 0, 1, 2 for three classes) rather than one-hot encoded vectors. This is often more convenient, as it avoids explicitly converting integer labels to one-hot vectors, and it saves memory and computation compared to using 'categorical_crossentropy' with one-hot encoded labels, especially when the number of classes is large. The model output requirements (C units, softmax activation) remain the same as for categorical crossentropy. Specify it using 'sparse_categorical_crossentropy'.
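This short sketch confirms the equivalence: integer labels with sparse categorical crossentropy produce the same loss as one-hot labels with categorical crossentropy (values are illustrative):

import tensorflow as tf

y_pred = tf.constant([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])

# Integer labels: class 0 for the first sample, class 1 for the second
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(tf.constant([0, 1]), y_pred).numpy())  # ~0.29

# Same result as categorical crossentropy with one-hot labels
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(tf.one_hot([0, 1], depth=3), y_pred).numpy())  # ~0.29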
You provide the chosen loss function to the loss argument within the model.compile() method. You can typically do this in two ways:
Using String Identifiers: Pass the string name of the desired loss function. Keras recognizes common ones like 'mean_squared_error', 'binary_crossentropy', 'categorical_crossentropy', and 'sparse_categorical_crossentropy'. This is the most frequent approach for standard losses.
# For regression
model.compile(optimizer='adam', loss='mean_squared_error')
# For binary classification
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
# For multi-class classification (one-hot labels)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# For multi-class classification (integer labels)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
Using Loss Function Objects: Instantiate a loss function object from the tf.keras.losses module. This allows for customization when the loss function accepts arguments, although standard usage often doesn't require it.
import tensorflow as tf
mse_loss = tf.keras.losses.MeanSquaredError()
bce_loss = tf.keras.losses.BinaryCrossentropy()
cce_loss = tf.keras.losses.CategoricalCrossentropy()
scce_loss = tf.keras.losses.SparseCategoricalCrossentropy()
# Example usage:
model.compile(optimizer='adam', loss=mse_loss)
# or directly pass the class
model.compile(optimizer='adam', loss=tf.keras.losses.MeanSquaredError())
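As one example of such customization, the crossentropy loss objects accept a from_logits argument. Setting it to True tells the loss to apply the sigmoid (or softmax) internally, so the model's final layer can output raw logits, which is often more numerically stable:

# The loss applies the sigmoid itself, so the final Dense layer
# can omit its activation and output raw logits
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)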
While TensorFlow provides a wide range of built-in loss functions, you can also define your own custom loss function if your problem requires a specific objective not covered by the standard options. This typically involves creating a Python function that takes y_true and y_pred as arguments and returns the calculated loss value as a tensor.
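As an illustration, here is a minimal sketch of a hypothetical custom loss; the asymmetric weighting scheme is invented for this example and is not a Keras built-in:

import tensorflow as tf

def asymmetric_mse(y_true, y_pred):
    # Hypothetical example: weight under-predictions (y_pred < y_true)
    # twice as heavily as over-predictions
    error = y_true - y_pred
    weight = tf.where(error > 0.0, 2.0, 1.0)
    return tf.reduce_mean(weight * tf.square(error))

# Pass the function directly to compile(), just like a built-in loss
model.compile(optimizer='adam', loss=asymmetric_mse)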
Choosing the correct loss function is fundamental for successful model training. It directly defines the objective the model attempts to achieve during the optimization process. After defining the loss, the next step in compiling the model is selecting an optimizer, which dictates how the model's weights are updated to minimize this loss.