While metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) tell us about the average magnitude of our prediction errors, they don't always give a complete picture of model performance in relation to the data's inherent variability. For instance, an RMSE of 10 might be excellent for predicting house prices ranging in the millions, but terrible for predicting temperatures ranging from 0 to 40.
This is where the Coefficient of Determination, commonly known as R-squared ($R^2$), comes in. Instead of focusing solely on the error size, R-squared provides a measure of how much of the variability in the target variable (the value you are trying to predict) is explained by your model. It essentially compares your model's performance to a very simple baseline model.
Imagine the simplest possible "model" for predicting a continuous value: always predicting the average (mean) of all the actual target values in your dataset. This baseline model doesn't use any input features; it just makes the same guess every time. For example, if you're predicting house prices and the average price in your dataset is $300,000, this baseline model would predict $300,000 for every house, regardless of its size, location, or condition.
R-squared tells you how much better your actual regression model performs compared to this naive mean-prediction baseline.
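The baseline is simple to sketch in code. Here is a minimal illustration with NumPy, using made-up house prices (in thousands) purely for demonstration:

```python
import numpy as np

# Hypothetical house prices in thousands; values are illustrative only
y_actual = np.array([250.0, 310.0, 275.0, 400.0, 265.0])

# The naive baseline predicts the mean of the targets for every example,
# ignoring all input features
baseline_prediction = y_actual.mean()
baseline_predictions = np.full_like(y_actual, baseline_prediction)

print(baseline_predictions)  # every prediction is the same mean value
```

Any useful regression model should beat this constant prediction; R-squared quantifies by how much.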
To understand how R-squared is calculated, we need two components:
Total Sum of Squares (SST): This measures the total variability in the actual target values ($y_i$). It's calculated by summing the squared differences between each actual value ($y_i$) and the overall mean of the actual values ($\bar{y}$). It represents the variability inherent in the data if you were just using the mean as your prediction.
$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$$
Here, $n$ is the number of data points, $y_i$ is the actual value for the $i$-th data point, and $\bar{y}$ is the mean of all actual values.
Residual Sum of Squares (SSR) or Sum of Squared Errors (SSE): This measures the variability that your model cannot explain. It's calculated by summing the squared differences between each actual value ($y_i$) and its corresponding predicted value ($\hat{y}_i$) generated by your model. This is the same sum of squared errors used in calculating MSE, just without dividing by $n$.
$$SSR = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
Here, $\hat{y}_i$ is the value predicted by your model for the $i$-th data point.
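Both quantities are straightforward to compute. The sketch below uses small, made-up actual and predicted values; `y_pred` stands in for the output of any regression model:

```python
import numpy as np

# Illustrative values only; y_pred could come from any fitted model
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.5, 8.5])

y_mean = y_actual.mean()

# Total Sum of Squares: variability of the data around its mean
sst = np.sum((y_actual - y_mean) ** 2)

# Residual Sum of Squares: variability the model fails to explain
ssr = np.sum((y_actual - y_pred) ** 2)

print(sst, ssr)
```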
Now, the R-squared formula combines these two:
$$R^2 = 1 - \frac{SSR}{SST}$$
Think about this formula: if the model's predictions are perfect, SSR is 0 and $R^2$ equals 1. If the model is no better than always predicting the mean, SSR equals SST and $R^2$ is 0. The smaller the unexplained variability relative to the total variability, the closer $R^2$ gets to 1.
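Putting the pieces together, here is a minimal end-to-end calculation on the same style of illustrative data:

```python
import numpy as np

# Illustrative values; y_pred stands in for any regression model's output
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.5, 8.5])

sst = np.sum((y_actual - y_actual.mean()) ** 2)  # total variability
ssr = np.sum((y_actual - y_pred) ** 2)           # unexplained variability

r2 = 1 - ssr / sst
print(r2)  # 0.95: the model explains 95% of the variance
```

You can cross-check this manual result against `sklearn.metrics.r2_score(y_actual, y_pred)`, which implements the same formula.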
R-squared values typically range from 0 to 1, although they can sometimes be negative (more on this in the "Limitations of R-squared" section). It's often expressed as a percentage.
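A negative value arises whenever SSR exceeds SST, i.e., the model's predictions are worse than just predicting the mean. A deliberately bad, made-up "model" demonstrates this:

```python
import numpy as np

y_actual = np.array([1.0, 2.0, 3.0, 4.0])

# A deliberately bad "model" that predicts the opposite trend
y_pred = np.array([4.0, 3.0, 2.0, 1.0])

sst = np.sum((y_actual - y_actual.mean()) ** 2)  # 5.0
ssr = np.sum((y_actual - y_pred) ** 2)           # 20.0

r2 = 1 - ssr / sst
print(r2)  # -3.0: worse than the mean baseline
```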
Whether a specific R-squared value is "good" depends heavily on the context of the problem. In some fields, like physics experiments, you might expect very high R-squared values (above 0.95). In others, like social sciences or predicting stock prices, explaining even a small portion of the variance (e.g., $R^2 = 0.1$ or $0.2$) might be considered significant because the underlying processes are very complex or noisy.
Consider two simple regression scenarios:
In the first scenario, the data points are tightly clustered around the fitted regression line. The model's predictions are close to the actual values, resulting in a small SSR relative to SST, and therefore a high R-squared value (e.g., R² ≈ 0.99).
In the second scenario, the data points are much more scattered around the regression line. While the line captures a general trend, the model's predictions have larger errors on average. The SSR is a larger fraction of SST, leading to a low R-squared value (e.g., R² ≈ 0.3).
R-squared complements metrics like MAE, MSE, and RMSE. While the error metrics tell you the typical size of the prediction error in the original units of your target variable, R-squared gives you a dimensionless measure (a ratio or percentage) of how much of the data's variance your model has captured. It helps answer the question: "How well does my model fit the data compared to simply using the average?"
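The two views can be computed side by side. This sketch (with illustrative data) reports RMSE in the target's original units alongside the dimensionless R-squared:

```python
import numpy as np

# Illustrative actual and predicted values
y_actual = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_pred = np.array([12.0, 18.0, 33.0, 38.0, 52.0])

# RMSE: typical error size, in the same units as the target
rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))

# R-squared: fraction of the data's variance the model captures
sst = np.sum((y_actual - y_actual.mean()) ** 2)
ssr = np.sum((y_actual - y_pred) ** 2)
r2 = 1 - ssr / sst

print(rmse, r2)
```

Reporting both gives a reader the error magnitude in familiar units and the model's improvement over the mean baseline.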
© 2025 ApX Machine Learning