Calculating MAE, MSE, RMSE, and R-squared with actual numbers helps solidify understanding and prepares you to evaluate your own regression models.

## Example Scenario: Predicting House Prices

Imagine we've built a simple regression model to predict house prices based on their size (in square feet). We trained this model on some data, and now we want to evaluate its performance on a separate set of 5 houses it hasn't seen before (our test set).

Here are the actual prices and the prices predicted by our model (in thousands of dollars):

| House ID | Size (sq ft) | Actual Price ($1000s) $(y_i)$ | Predicted Price ($1000s) $(\hat{y}_i)$ |
| :------- | :----------- | :---------------------------- | :------------------------------------- |
| 1        | 1500         | 300                           | 310                                     |
| 2        | 1200         | 250                           | 240                                     |
| 3        | 1800         | 380                           | 350                                     |
| 4        | 1400         | 290                           | 300                                     |
| 5        | 1600         | 330                           | 340                                     |

Our goal is to use the metrics we've learned to quantify how well our model's predictions match the actual prices. We have $n=5$ data points in our test set.

## Step 1: Calculate Individual Errors

The first step is always to find the difference between the actual value ($y_i$) and the predicted value ($\hat{y}_i$) for each data point. This difference is called the error or residual.

$$ Error_i = y_i - \hat{y}_i $$

Let's add an 'Error' column to our table:

| House ID | Actual Price $(y_i)$ | Predicted Price $(\hat{y}_i)$ | Error $(y_i - \hat{y}_i)$ |
| :------- | :------------------- | :----------------------------- | :------------------------ |
| 1        | 300                  | 310                            | -10                       |
| 2        | 250                  | 240                            | 10                        |
| 3        | 380                  | 350                            | 30                        |
| 4        | 290                  | 300                            | -10                       |
| 5        | 330                  | 340                            | -10                       |

## Step 2: Calculate Mean Absolute Error (MAE)

MAE gives us the average size of the errors, ignoring whether they are positive or negative.

The formula is:

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$

1. **Calculate Absolute Errors:** Take the absolute value of each error.
2. **Sum Absolute Errors:** Add them up.
3. **Calculate Average:** Divide the sum by the number of data points ($n=5$).

| House ID | Error $(y_i - \hat{y}_i)$ | Absolute Error $\lvert y_i - \hat{y}_i \rvert$ |
| :------- | :------------------------ | :--------------------------------------------- |
| 1        | -10                       | 10                                              |
| 2        | 10                        | 10                                              |
| 3        | 30                        | 30                                              |
| 4        | -10                       | 10                                              |
| 5        | -10                       | 10                                              |
| **Sum**  |                           | **70**                                          |

Now, calculate the average:

$$ MAE = \frac{70}{5} = 14 $$

**Interpretation:** On average, our model's price prediction is off by $14,000. The units of MAE are the same as the target variable (thousands of dollars in this case).

## Step 3: Calculate Mean Squared Error (MSE)

MSE calculates the average of the squared errors. Squaring the errors makes them all positive and gives much more weight to larger errors.

The formula is:

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

1. **Calculate Squared Errors:** Square each individual error.
2. **Sum Squared Errors:** Add them up.
3. **Calculate Average:** Divide the sum by the number of data points ($n=5$).

| House ID | Error $(y_i - \hat{y}_i)$ | Squared Error $(y_i - \hat{y}_i)^2$ |
| :------- | :------------------------ | :----------------------------------- |
| 1        | -10                       | 100                                   |
| 2        | 10                        | 100                                   |
| 3        | 30                        | 900                                   |
| 4        | -10                       | 100                                   |
| 5        | -10                       | 100                                   |
| **Sum**  |                           | **1300**                              |

Now, calculate the average:

$$ MSE = \frac{1300}{5} = 260 $$

**Interpretation:** The MSE is 260. Notice how the single large error of 30 for House 3 contributed 900 to the sum, significantly impacting the MSE. The units here are (thousands of dollars) squared, which isn't very intuitive. This leads us to RMSE.

## Step 4: Calculate Root Mean Squared Error (RMSE)

RMSE is simply the square root of the MSE. This brings the metric back into the original units of the target variable, making it easier to interpret.

The formula is:

$$ RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$

1. **Calculate MSE:** We already did this: MSE = 260.
2. **Take the Square Root:** Calculate $\sqrt{260}$.

$$ RMSE = \sqrt{260} \approx 16.12 $$

**Interpretation:** The RMSE is approximately $16,120. Like MAE, it represents a typical magnitude of error in our predictions, measured in thousands of dollars. Because it's derived from MSE, RMSE is more sensitive to large errors than MAE (notice RMSE 16.12 > MAE 14).
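To double-check the hand calculations so far, here is a minimal NumPy sketch. The variable names (`y_true`, `y_pred`) are just illustrative containers for this example's five data points, not part of any particular library API:

```python
import numpy as np

# Actual and predicted prices (in $1000s) for the 5 test houses
y_true = np.array([300, 250, 380, 290, 330])
y_pred = np.array([310, 240, 350, 300, 340])

errors = y_true - y_pred            # residuals: [-10, 10, 30, -10, -10]

mae = np.mean(np.abs(errors))       # mean absolute error: 14.0
mse = np.mean(errors ** 2)          # mean squared error: 260.0
rmse = np.sqrt(mse)                 # root mean squared error: ~16.12

print(f"MAE:  {mae:.2f}")
print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Running this should reproduce the values worked out above: 14.0, 260.0, and roughly 16.12.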
## Step 5: Calculate Coefficient of Determination (R-squared, $R^2$)

$R^2$ tells us the proportion of the variance in the actual prices that our model is able to explain. It compares our model's errors to the errors of a very simple baseline model that just predicts the average price for all houses.

The formula is:

$$ R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

Where:

- $SSR$ (Sum of Squared Residuals) is the sum of our model's squared errors. We calculated this for MSE: $SSR = 1300$.
- $SST$ (Total Sum of Squares) is the sum of squared differences between the actual prices and the average actual price ($\bar{y}$). This represents the total variance in the actual prices.

Let's calculate $SST$:

1. **Calculate the Mean Actual Price ($\bar{y}$):**
   $$ \bar{y} = \frac{300 + 250 + 380 + 290 + 330}{5} = \frac{1550}{5} = 310 $$
   The average actual price is $310,000.
2. **Calculate Deviations from the Mean:** Find $(y_i - \bar{y})$ for each house.
3. **Calculate Squared Deviations:** Square these deviations.
4. **Sum Squared Deviations:** Add them up to get $SST$.

| House ID | Actual Price $(y_i)$ | Mean Price $(\bar{y})$ | Deviation $(y_i - \bar{y})$ | Squared Deviation $(y_i - \bar{y})^2$ |
| :------- | :------------------- | :---------------------- | :--------------------------- | :------------------------------------- |
| 1        | 300                  | 310                      | -10                           | 100                                     |
| 2        | 250                  | 310                      | -60                           | 3600                                    |
| 3        | 380                  | 310                      | 70                            | 4900                                    |
| 4        | 290                  | 310                      | -20                           | 400                                     |
| 5        | 330                  | 310                      | 20                            | 400                                     |
| **Sum**  |                      |                          |                               | **9400** ($SST$)                        |

So, $SST = 9400$.

Now we can calculate $R^2$:

$$ R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{1300}{9400} $$

$$ R^2 = 1 - 0.1383 \approx 0.8617 $$

**Interpretation:** Our model's $R^2$ is approximately 0.86, or 86%. This means that our model (using house size) explains about 86% of the variability observed in the actual house prices in our test set. This is generally considered a good fit for this simple example. An $R^2$ of 1 would mean a perfect fit, while an $R^2$ of 0 would mean our model is no better than just predicting the average price for every house.
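The same arithmetic is easy to verify in code. Below is a small sketch, reusing the `y_true` and `y_pred` arrays from the earlier snippet, that computes SSR, SST, and $R^2$ straight from their definitions:

```python
import numpy as np

y_true = np.array([300, 250, 380, 290, 330])
y_pred = np.array([310, 240, 350, 300, 340])

ssr = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals: 1300
sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 9400

r2 = 1 - ssr / sst                           # ~0.8617
print(f"SSR: {ssr}, SST: {sst}, R^2: {r2:.4f}")
```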
## Summary of Results

Here's a summary of the calculated metrics for our house price prediction model:

| Metric | Value  | Interpretation                                                              |
| :----- | :----- | :-------------------------------------------------------------------------- |
| MAE    | 14.00  | Average absolute prediction error is $14,000.                                |
| MSE    | 260.00 | Average squared prediction error (units are $ squared).                      |
| RMSE   | 16.12  | Typical prediction error magnitude is $16,120 (sensitive to large errors).   |
| $R^2$  | 0.8617 | Model explains ~86% of the variance in house prices.                         |

## Visualizing Predictions vs. Actuals

A scatter plot is often helpful for visualizing regression performance. We plot actual values on one axis and predicted values on the other. Points falling close to the diagonal line ($y = x$) indicate accurate predictions.

*Scatter plot of actual vs. predicted house prices ($1000s). The dashed gray line represents a perfect prediction ($Predicted = Actual$); points close to this line indicate good predictions by the model.*

This practice exercise demonstrates how to compute the standard regression metrics from a set of actual and predicted values. Calculating these yourself builds intuition about what each metric represents and how they respond differently to prediction errors. When evaluating your own models, you'll typically use libraries that compute these for you, but understanding the underlying calculations is fundamental.
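For reference, here is one way the same metrics could be obtained with scikit-learn (assuming it is installed); `mean_absolute_error`, `mean_squared_error`, and `r2_score` are all in `sklearn.metrics`, and RMSE is taken as the square root of MSE:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([300, 250, 380, 290, 330])
y_pred = np.array([310, 240, 350, 300, 340])

mae = mean_absolute_error(y_true, y_pred)   # 14.0
mse = mean_squared_error(y_true, y_pred)    # 260.0
rmse = np.sqrt(mse)                         # ~16.12
r2 = r2_score(y_true, y_pred)               # ~0.8617

print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  RMSE: {rmse:.2f}  R^2: {r2:.4f}")
```

The library results should match the hand calculations above, which is a useful sanity check whenever you implement or interpret these metrics yourself.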