A simplified, concrete example illustrates a basic evaluation workflow. A model aims to predict house prices based on their size in square feet. This is a regression problem because the price is a continuous numerical value.
Scenario: Predicting House Prices
We have collected data on 10 houses, noting their size and sale price:
Size (sq ft)    Price ($1000s)
1500            300
1600            320
1700            350
1800            380
1900            400
2000            410
2100            430
2200            450
1400            280
2300            460
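If you'd like to follow along in code, the dataset is small enough to type out directly. A minimal sketch in Python (the variable names are our own):

```python
# House sizes (sq ft) and sale prices ($1000s), exactly as in the table above.
sizes = [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 1400, 2300]
prices = [300, 320, 350, 380, 400, 410, 430, 450, 280, 460]
```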
Our goal is to train a model on some of this data and then evaluate how well it predicts prices on data it hasn't seen before.
The Evaluation Workflow Steps
Here's how we apply the standard workflow:
A visual representation of the evaluation workflow for our house price prediction example.
Choose Metrics: Since this is a regression problem, we'll use metrics suited for continuous values: Mean Absolute Error (MAE), Mean Squared Error (MSE) and its square root (RMSE), and the Coefficient of Determination (R-squared, or R²). These tell us, respectively, the average error magnitude, a measure that penalizes larger errors more heavily, and the proportion of price variance explained by our model.
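To make these definitions concrete, here is a minimal from-scratch sketch of the three metrics in plain Python (the function names are our own; in practice you would typically use a library implementation):

```python
def mae(actual, predicted):
    # Mean Absolute Error: the average magnitude of the errors.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    # Mean Squared Error: the average squared error; penalizes large misses.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    # R²: 1 minus the ratio of residual error (SSE) to total variance (SST).
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst
```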
Split Data: We need to separate our data into a training set and a test set. A common split is 80% for training and 20% for testing. We'll randomly select 8 houses for training and reserve the remaining 2 for testing. It's important that the model never sees the test data during training.
Training Set (8 houses): Let's say these are randomly selected: (1500, 300), (1700, 350), (1800, 380), (1900, 400), (2000, 410), (2100, 430), (1400, 280), (2300, 460).
Test Set (2 houses): The remaining houses: (1600, 320), (2200, 450).
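To reproduce an 80/20 split in code, scikit-learn's train_test_split is the usual tool. A sketch (note that a random split depends on the seed, so the two test houses it selects may differ from the ones listed above):

```python
from sklearn.model_selection import train_test_split

# One row per house; scikit-learn expects a 2D feature array.
X = [[1500], [1600], [1700], [1800], [1900], [2000], [2100], [2200], [1400], [2300]]
y = [300, 320, 350, 380, 400, 410, 430, 450, 280, 460]

# test_size=0.2 holds out 2 of the 10 houses; random_state pins the shuffle
# so the split is reproducible run to run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```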
Train Model: We use the training set (the 8 houses) to train our machine learning model. Let's imagine we use a simple linear regression model. The training process finds the best line that fits the training data points (size vs. price). For this example, let's assume the trained model learns the relationship:
Predicted Price = 0.2 × Size + 50
(Note: This is a simplified model equation for illustration purposes).
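Fitting an actual line to the 8 training houses takes only a few lines with scikit-learn. A sketch (the fitted coefficients need not match the illustrative equation above, which is rounded for easy arithmetic):

```python
from sklearn.linear_model import LinearRegression

# The 8 training houses from the split above (size in sq ft, price in $1000s).
X_train = [[1500], [1700], [1800], [1900], [2000], [2100], [1400], [2300]]
y_train = [300, 350, 380, 400, 410, 430, 280, 460]

model = LinearRegression().fit(X_train, y_train)

# The learned slope and intercept; these approximate the data's trend but
# will not exactly equal the illustrative 0.2 and 50 used in the text.
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```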
Generate Predictions: Now, we use our trained model to predict the prices for the houses in the test set. We only give the model the sizes from the test set (1600 sq ft and 2200 sq ft) and see what prices it predicts.
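With the assumed equation, prediction is just substitution. A tiny sketch reproducing the numbers used below:

```python
def predict_price(size_sqft):
    # The illustrative model from above: price ($1000s) = 0.2 * size + 50.
    return 0.2 * size_sqft + 50

print(predict_price(1600))  # 370.0, i.e. $370k
print(predict_price(2200))  # 490.0, i.e. $490k
```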
Calculate Performance Metrics: We compare the model's predictions ($370k and $490k) with the actual prices in the test set ($320k and $450k).
Errors:
House 1: Error = Predicted − Actual = 370 − 320 = 50
House 2: Error = Predicted − Actual = 490 − 450 = 40
MAE (Mean Absolute Error): Average of the absolute errors.
MAE = (|50| + |40|) / 2 = (50 + 40) / 2 = 90 / 2 = 45
MSE (Mean Squared Error): Average of the squared errors.
MSE = (50² + 40²) / 2 = (2500 + 1600) / 2 = 4100 / 2 = 2050
RMSE (Root Mean Squared Error): Square root of MSE.
RMSE = √MSE = √2050 ≈ 45.28
R-squared (R²): Measures the proportion of variance explained. It compares the model's errors against the variance of the actual target values in the test set.
Mean of actual test prices: (320 + 450) / 2 = 385
Total Sum of Squares (SST): Sum of squared differences from the mean actual price.
SST = (320 − 385)² + (450 − 385)² = (−65)² + 65² = 4225 + 4225 = 8450
Residual Sum of Squares (SSE): Sum of squared prediction errors (the same squared errors used for the MSE).
SSE = 50² + 40² = 2500 + 1600 = 4100
R² = 1 − SSE / SST = 1 − 4100 / 8450 ≈ 0.515
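These hand calculations are easy to verify with scikit-learn's metric functions (scikit-learn returns the MSE; take its square root yourself for the RMSE):

```python
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual = [320, 450]     # test-set prices ($1000s)
predicted = [370, 490]  # model predictions ($1000s)

print(mean_absolute_error(actual, predicted))       # 45.0
print(mean_squared_error(actual, predicted))        # 2050.0
print(sqrt(mean_squared_error(actual, predicted)))  # ~45.28
print(r2_score(actual, predicted))                  # ~0.515
```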
The MAE is 45. This means, on average, our model's price predictions on the test set were off by $45,000.
The RMSE is approximately 45.28. This also gives an idea of the typical error magnitude in the original units ($ thousands), penalizing larger errors slightly more than MAE does. Here, it's quite close to the MAE because the errors (40 and 50) are similar.
The R² is 0.515. This suggests that our simple model (based only on size) explains about 51.5% of the variation in house prices within our small test set. The remaining 48.5% is unexplained by this model (perhaps due to factors like location, number of bedrooms, age, etc., or model inaccuracies).
Comparison of predicted prices versus actual prices for the two houses in the test set. Points on the dashed line represent perfect predictions. Our model predicted higher than the actual prices for both test houses.
"This example, though simplified with a tiny dataset and a model, demonstrates the fundamental steps: split data, train on one part, predict on the other, and calculate metrics to assess performance on unseen data. This structured process helps you understand how well your model is likely to perform when faced with new examples."