As we discussed, training a neural network involves minimizing the difference between its predictions and the actual target values. For regression problems, where the goal is to predict a continuous numerical value (like predicting house prices or stock values), we need specific loss functions tailored for this task. Let's examine two of the most common ones: Mean Squared Error (MSE) and Mean Absolute Error (MAE).
Mean Squared Error, often called MSE or L2 loss, is perhaps the most widely used loss function for regression. It measures the average of the squares of the errors between the predicted values and the true values.
Mathematically, for $n$ data points, where $y_i$ is the true target value and $\hat{y}_i$ is the predicted value for the $i$-th data point, MSE is defined as:
$$L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Why square the difference? Squaring makes every error positive, so positive and negative errors cannot cancel each other out, and it penalizes large errors disproportionately: a prediction that is off by 2 contributes four times as much loss as one that is off by 1. Squaring also keeps the loss smooth and differentiable everywhere, which suits gradient-based optimization.
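This quadratic form also has a convenient gradient. Differentiating the loss with respect to a single prediction gives a term that scales linearly with the error, so larger mistakes produce proportionally larger updates:

$$\frac{\partial L_{MSE}}{\partial \hat{y}_i} = -\frac{2}{n} (y_i - \hat{y}_i)$$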
The units of MSE are the square of the units of the target variable (e.g., dollars squared if predicting prices). While this might seem unintuitive, the Root Mean Squared Error (RMSE), simply $\sqrt{MSE}$, is often reported for interpretability because it has the same units as the target variable. However, MSE itself is typically minimized during training due to its favorable mathematical properties.
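PyTorch has no built-in RMSE loss, but since RMSE is just the square root of MSE, a minimal sketch is to wrap nn.MSELoss with torch.sqrt (the sample values here match the worked example later in this section):

import torch
import torch.nn as nn
predictions = torch.tensor([2.5, 0.0, 1.8, 9.1])
targets = torch.tensor([3.0, -0.5, 2.0, 8.0])
# RMSE = sqrt(MSE); useful for reporting, since it shares the target's units
mse = nn.MSELoss()(predictions, targets)  # 0.4375
rmse = torch.sqrt(mse)                    # sqrt(0.4375) ≈ 0.6614
print(f"RMSE: {rmse.item():.4f}")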
Mean Absolute Error, also known as MAE or L1 loss, offers an alternative perspective on measuring regression error. Instead of squaring the differences, MAE calculates the average of the absolute differences between predictions and true values.
The formula for MAE is:
$$L_{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

Key characteristics of MAE:

- The penalty grows linearly with the error, so MAE is less sensitive to outliers than MSE: a single wildly wrong prediction cannot dominate the loss.
- The loss has the same units as the target variable, making it directly interpretable.
- The absolute value is not differentiable at zero error, and its gradient has constant magnitude elsewhere (see the gradient below), which can make convergence near the minimum less smooth than with MSE.
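For comparison with the MSE gradient above, differentiating MAE away from zero error gives a gradient whose magnitude is constant no matter how large the error is:

$$\frac{\partial L_{MAE}}{\partial \hat{y}_i} = -\frac{1}{n}\,\mathrm{sign}(y_i - \hat{y}_i), \quad y_i \neq \hat{y}_i$$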
The choice between MSE and MAE depends on the problem and the data. If large errors are especially costly and the data is relatively free of outliers, MSE's quadratic penalty is appropriate; if the data contains outliers that should not dominate training, MAE's linear penalty is the safer choice.
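To make the outlier point concrete, here is a small sketch (with made-up values, where one target is deliberately corrupted) comparing how the two losses react:

import torch
import torch.nn as nn
predictions = torch.tensor([2.5, 0.0, 1.8, 9.1])
clean_targets = torch.tensor([3.0, -0.5, 2.0, 8.0])
# Same targets, but the last one is replaced by an outlier
outlier_targets = torch.tensor([3.0, -0.5, 2.0, 30.0])
mse, mae = nn.MSELoss(), nn.L1Loss()
print(f"MSE clean:   {mse(predictions, clean_targets).item():.4f}")    # 0.4375
print(f"MSE outlier: {mse(predictions, outlier_targets).item():.4f}")  # 109.3375
print(f"MAE clean:   {mae(predictions, clean_targets).item():.4f}")    # 0.5750
print(f"MAE outlier: {mae(predictions, outlier_targets).item():.4f}")  # 5.5250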
Here's a visual comparison of how MSE and MAE penalize errors:
Comparison of MSE (blue) and MAE (green) loss values based on the prediction error. Note the quadratic increase for MSE versus the linear increase for MAE.
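You can reproduce these curves numerically; the grid of error values below is arbitrary:

import torch
# Per-sample penalty as a function of the raw prediction error
errors = torch.linspace(-3.0, 3.0, steps=7)
mse_penalty = errors ** 2    # quadratic growth
mae_penalty = errors.abs()   # linear growth
for e, sq, ab in zip(errors.tolist(), mse_penalty.tolist(), mae_penalty.tolist()):
    print(f"error={e:+.1f}  MSE={sq:4.1f}  MAE={ab:3.1f}")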
In practice, MSE is often the default choice due to its mathematical properties aligning well with gradient descent. However, if outliers are a significant concern, experimenting with MAE (or other robust loss functions like Huber loss, which combines aspects of both) is worthwhile.
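If you want to experiment with that middle ground, PyTorch provides nn.HuberLoss, which applies a quadratic penalty to small errors and a linear penalty to large ones; delta controls where the switch happens (1.0 is the default):

import torch
import torch.nn as nn
predictions = torch.tensor([2.5, 0.0, 1.8, 9.1])
targets = torch.tensor([3.0, -0.5, 2.0, 8.0])
# Quadratic for |error| <= delta, linear beyond it
huber_loss_fn = nn.HuberLoss(delta=1.0)
huber_loss = huber_loss_fn(predictions, targets)
print(f"Huber Loss: {huber_loss.item():.4f}")  # 0.2175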
Here's a quick example using PyTorch to calculate both losses:
import torch
import torch.nn as nn
# Sample predictions and targets
predictions = torch.tensor([2.5, 0.0, 1.8, 9.1])
targets = torch.tensor([3.0, -0.5, 2.0, 8.0])
# Calculate MSE Loss
mse_loss_fn = nn.MSELoss()
mse_loss = mse_loss_fn(predictions, targets)
# Calculate MAE Loss (L1Loss in PyTorch)
mae_loss_fn = nn.L1Loss()
mae_loss = mae_loss_fn(predictions, targets)
print(f"Predictions: {predictions.numpy()}")
print(f"Targets: {targets.numpy()}")
print(f"Errors: {(targets - predictions).numpy()}")
print(f"Squared Errors: {((targets - predictions)**2).numpy()}")
print(f"Absolute Errors: {torch.abs(targets - predictions).numpy()}")
print("-" * 30)
print(f"MSE Loss: {mse_loss.item():.4f}") # Average of [0.25, 0.25, 0.04, 1.21] = 1.75 / 4 = 0.4375
print(f"MAE Loss: {mae_loss.item():.4f}") # Average of [0.5, 0.5, 0.2, 1.1] = 2.3 / 4 = 0.5750
This simple calculation demonstrates how the squaring in MSE gives more weight to the largest error (1.1) than MAE does. Understanding these differences helps you select the loss function that best guides your model's training toward accurate predictions.