Okay, let's start by thinking about what happens when a regression model makes a prediction. Unlike classification models that predict categories (like 'spam' or 'not spam'), regression models predict continuous numbers. Imagine you've built a model to predict tomorrow's temperature, or the price of a used car based on its features. The model gives you a specific number as its prediction.
The fundamental question we need to answer is: How good are these numerical predictions? To do that, we first need to quantify the error for each individual prediction.
For every single data point you use to test your model, you have two values: the actual (observed) value, $y_i$, and the value the model predicted for it, $\hat{y}_i$.
The prediction error, often called the residual, is simply the difference between the actual value and the predicted value for a single observation.
Mathematically, we calculate the error $e_i$ for the $i$-th data point as:
$$e_i = y_i - \hat{y}_i$$
This formula tells us how far off the prediction was from the actual value.
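As a minimal sketch of this formula in Python: the helper name `prediction_error` and the numbers passed to it are assumptions made up for illustration, not part of the original material.

```python
def prediction_error(actual: float, predicted: float) -> float:
    """Return the residual e_i = y_i - y_hat_i for a single observation."""
    return actual - predicted


# Hypothetical values, chosen only to illustrate the arithmetic.
error = prediction_error(actual=20.0, predicted=18.0)
print(error)  # 2.0
```

Here the actual value is 2.0 units above the prediction, so the error is positive.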
Let's say we're testing a model that predicts the maximum daily temperature (in degrees Celsius). We have the actual temperatures for four different days, and we also have the predictions our model made for those days:
Now, let's calculate the prediction error for each day using $e_i = y_i - \hat{y}_i$:
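Notice the sign of the error: a positive error means the actual value was higher than the prediction (the model underestimated), a negative error means the actual value was lower than the prediction (the model overestimated), and an error of zero means the prediction was exactly right.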
Calculating these individual errors is the essential first step. Each error tells us the magnitude and direction of the inaccuracy for one specific prediction.
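The per-observation calculation could be sketched in Python roughly as follows. The temperature values here are hypothetical placeholders rather than the figures from the example above, and `direction` is an illustrative helper name, not an established function.

```python
# Hypothetical actual and predicted maximum temperatures in degrees Celsius.
# These numbers are made up for illustration only.
actual = [22.0, 25.0, 19.0, 21.0]
predicted = [20.0, 26.5, 19.0, 23.0]


def direction(error: float) -> str:
    """Describe what the sign of e_i = y_i - y_hat_i says about the prediction."""
    if error > 0:
        return "underestimate"  # actual value was higher than predicted
    if error < 0:
        return "overestimate"   # actual value was lower than predicted
    return "exact"


# One error per observation: actual minus predicted.
errors = [y - y_hat for y, y_hat in zip(actual, predicted)]

for day, e in enumerate(errors, start=1):
    print(f"Day {day}: error = {e:+.1f} ({direction(e)})")
```

Keeping the sign, rather than taking the absolute value right away, preserves the direction of each miss, which is useful when checking whether a model tends to predict too high or too low.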
Scatter plot showing actual vs. predicted temperatures. The dashed line represents perfect predictions ($y = \hat{y}$). Arrows indicate the prediction error (vertical distance) for three data points.
While looking at individual errors is informative, especially for understanding specific failures of the model, it doesn't give us a single, overall measure of the model's performance across all predictions. If we have thousands or millions of predictions, we need a way to summarize these individual errors into one or more metrics.
These individual errors, $e_1, e_2, e_3, \dots$, are the foundational components we'll use to calculate the aggregate regression metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), which we will cover next.
© 2025 ApX Machine Learning