Time series models such as ARIMA and SARIMA are fitted to historical data to capture patterns like trend, seasonality, and autocorrelation, and they can be powerful forecasting tools. However, merely building a model doesn't guarantee its usefulness. Determining whether a complex SARIMA(1,1,1)(1,1,0,12) model predicts next month's sales better than a simpler ARIMA(2,1,0) model, or even a naive forecast, requires careful assessment. This is where model evaluation becomes essential.
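As a concrete starting point, here is a minimal sketch that fits the two candidate models mentioned above using statsmodels' `SARIMAX` class. The `sales` series is simulated purely so the snippet runs end to end; it stands in for your own monthly data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# `sales` is a stand-in for your own monthly series; it is simulated here
# (trend + yearly seasonality + noise) only so the example is runnable.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
sales = pd.Series(
    100 + np.arange(60)
    + 10 * np.sin(np.arange(60) * 2 * np.pi / 12)
    + rng.normal(0, 2, 60),
    index=idx,
)

# Candidate 1: SARIMA(1,1,1)(1,1,0)[12], a seasonal model with a yearly cycle.
sarima_fit = SARIMAX(sales, order=(1, 1, 1), seasonal_order=(1, 1, 0, 12)).fit(disp=False)

# Candidate 2: ARIMA(2,1,0), a simpler non-seasonal alternative.
arima_fit = SARIMAX(sales, order=(2, 1, 0)).fit(disp=False)

print(sarima_fit.summary())
```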
Think about it: the ultimate goal is usually to predict future values, values the model hasn't seen during its training process. A model might capture the historical data extremely well, fitting the training points almost perfectly, but fail miserably when asked to extrapolate into the future. This phenomenon, known as overfitting, occurs when a model learns the noise and specific quirks of the training data rather than the underlying signal. Evaluating a model only on the data it was trained on can be misleadingly optimistic.
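To see how optimistic in-sample error can be, the sketch below (reusing the assumed `sales` series from the previous snippet) holds out the last 12 months, fits a model on the rest, and compares the error on the training data with the error on data the model never saw.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# `sales` is the assumed monthly Series from the previous snippet.
train, test = sales.iloc[:-12], sales.iloc[-12:]

result = SARIMAX(train, order=(2, 1, 0)).fit(disp=False)

# One-step-ahead errors on the training data (skip the first value, which is
# distorted by the differencing initialisation)...
resid_in = (train - result.fittedvalues).iloc[1:]
train_mae = np.mean(np.abs(resid_in))

# ...versus errors on the 12 months the model has never seen.
forecast = result.forecast(steps=len(test))
test_mae = np.mean(np.abs(test.values - forecast.values))

print(f"training MAE: {train_mae:.2f} | holdout MAE: {test_mae:.2f}")
```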
Therefore, we need a structured way to assess how well our models are likely to perform on new, unseen data. Model evaluation serves several critical purposes: it estimates how accurately a model will forecast values it was not trained on, it allows fair comparison between candidate models (including simple baselines), and it guards against selecting a model that merely memorizes the training history.
Without a formal evaluation process, selecting and deploying a forecasting model becomes guesswork. You might choose a complex model that performs worse than a simple one, or deploy a model that is unreliable for making future predictions. The remaining sections in this chapter cover the standard techniques and metrics used to perform this evaluation: how to split time series data properly, how to calculate error metrics such as MAE and RMSE, and how to use criteria such as AIC to guide model selection, as sketched below.
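Putting these pieces together, the following sketch (again assuming the `sales` series from the first snippet) fits both candidate models on the training portion, forecasts the held-out 12 months, and reports MAE, RMSE, and AIC for each; the exact numbers depend on the data used.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# `sales` is the assumed monthly Series from the first snippet.
train, test = sales.iloc[:-12], sales.iloc[-12:]

candidates = {
    "SARIMA(1,1,1)(1,1,0)[12]": dict(order=(1, 1, 1), seasonal_order=(1, 1, 0, 12)),
    "ARIMA(2,1,0)": dict(order=(2, 1, 0)),
}

for name, spec in candidates.items():
    result = SARIMAX(train, **spec).fit(disp=False)
    forecast = result.forecast(steps=len(test))

    errors = test.values - forecast.values
    mae = np.mean(np.abs(errors))          # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

    print(f"{name:26s}  MAE={mae:6.2f}  RMSE={rmse:6.2f}  AIC={result.aic:8.1f}")
```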