You've now learned how to construct powerful time series models such as ARIMA and SARIMA, fitting them to historical data to capture patterns like trend, seasonality, and autocorrelation. However, simply building a model doesn't guarantee its usefulness for forecasting. How do we know if the SARIMA(1,1,1)(1,1,0,12) model you painstakingly tuned is actually better at predicting next month's sales than a simpler ARIMA(2,1,0) model, or even a naive forecast? This is where model evaluation becomes essential.
Think about it: the ultimate goal is usually to predict future values, values the model hasn't seen during its training process. A model might capture the historical data extremely well, fitting the training points almost perfectly, but fail miserably when asked to extrapolate into the future. This phenomenon, known as overfitting, occurs when a model learns the noise and specific quirks of the training data rather than the underlying signal. Evaluating a model only on the data it was trained on can be misleadingly optimistic.
Therefore, we need a structured way to assess how well our models are likely to perform on new, unseen data. Model evaluation serves several critical purposes: it estimates how well a model will predict genuinely unseen observations, it lets you compare candidate models on a fair, like-for-like basis, and it helps you catch overfitting before a model is deployed.
Without a formal evaluation process, selecting and deploying a forecasting model becomes guesswork. You might choose a complex model that performs worse than a simple one, or deploy a model that is unreliable for making future predictions. The subsequent sections in this chapter will equip you with the standard techniques and metrics used in time series forecasting to perform this essential evaluation, starting with how to properly split your data and then calculating metrics like MAE, RMSE, and using criteria like AIC to guide your model selection process.
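As a preview of what the rest of the chapter covers, the sketch below shows the basic workflow on a synthetic monthly series: split the data chronologically (never shuffle time series), fit a candidate model on the training portion only, inspect its AIC, then forecast the held-out horizon and compute MAE and RMSE against the unseen values. The series, the 12-month holdout, and the ARIMA(2,1,0) order are illustrative assumptions, not a recommendation; it assumes statsmodels and NumPy are installed.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with trend and noise (stand-in for real sales data)
rng = np.random.default_rng(42)
n = 120
y = 100 + 0.5 * np.arange(n) + rng.normal(scale=5, size=n)

# Chronological split: the last 12 observations play the role of the "future"
train, test = y[:-12], y[-12:]

# Fit a candidate model on the training portion only
result = ARIMA(train, order=(2, 1, 0)).fit()
print(f"AIC: {result.aic:.1f}")  # in-sample criterion for comparing candidates

# Forecast the held-out horizon and score against the unseen values
forecast = result.forecast(steps=len(test))
mae = np.mean(np.abs(test - forecast))
rmse = np.sqrt(np.mean((test - forecast) ** 2))
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

The chronological split is the important detail here: the model only ever sees the past, which mirrors the real forecasting task and is what the next section formalizes.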