After fitting a SARIMA model, the process isn't complete. Just like with the simpler ARIMA models, it's important to perform diagnostic checks to assess whether the chosen model provides an adequate fit to the data. This step helps determine if the model's assumptions are reasonably met and if it has captured the essential patterns, including the seasonality, present in the time series. Failure to diagnose the model can lead to unreliable forecasts and misguided conclusions.
The primary tool for diagnosing time series models, including SARIMA, is the analysis of the model's residuals. Residuals are the differences between the actual observed values and the values predicted by the model within the training data:
$$\text{Residual}_t = \text{Actual}_t - \text{Predicted}_t$$
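If the SARIMA model is a good fit, the residuals should resemble white noise. This means the residuals should have a mean of zero, a constant variance over time, and no autocorrelation at any lag. Approximate normality is also desirable, mainly because prediction intervals rely on it.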
Let's look at the common diagnostic techniques.
A simple first step is to plot the residuals against time. This plot helps visually inspect for any remaining patterns, outliers, non-constant variance, or trends that the model failed to capture. Ideally, the plot should show points randomly scattered around zero with no discernible structure.
Residuals plotted over time. Ideally, they should fluctuate randomly around the zero line without showing obvious patterns or trends.
If you observe clear patterns, like lingering seasonality or a trend, it suggests the model order (non-seasonal or seasonal) might need adjustment. If the variance of the residuals changes significantly over time (heteroscedasticity), data transformations (like logarithmic or Box-Cox) might be considered before modeling, or more advanced models might be needed.
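As a rough numeric companion to the residual plot, the sketch below computes residuals from the definition above and compares the variance of the two halves of the series. The helper names (`residuals`, `variance_ratio`) and the toy numbers are made up for illustration, and the half-versus-half ratio is only a crude rule of thumb, not a formal heteroscedasticity test.

```python
from statistics import mean, pvariance


def residuals(actual, predicted):
    """Element-wise residuals: actual minus in-sample prediction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def variance_ratio(resid):
    """Variance of the second half divided by variance of the first half."""
    half = len(resid) // 2
    return pvariance(resid[half:]) / pvariance(resid[:half])


# Toy values, invented for this example (not real data).
actual = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]
predicted = [110.0, 120.0, 130.0, 131.0, 120.0, 136.0, 146.0, 150.0]

resid = residuals(actual, predicted)
print("residuals:", resid)
print("mean (should be near 0):", mean(resid))
print("variance ratio (should be near 1):", variance_ratio(resid))
```

A mean far from zero points to systematic bias, and a ratio far from 1 is a hint that the spread of the residuals is changing over time and deserves a closer look in the plot.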
While not strictly required for the model coefficients to be meaningful, the assumption that residuals are normally distributed is often important for constructing accurate prediction intervals. Two common plots used to assess normality are the histogram and the Quantile-Quantile (QQ) plot.
Histogram (left) and QQ-Plot (right) of model residuals. The histogram approximates a bell curve, and points in the QQ-plot lie close to the diagonal line, suggesting the residuals are reasonably close to a normal distribution.
Deviations from normality, especially heavy tails (points deviating substantially from the line at the ends in the QQ-plot), might indicate that prediction intervals could be inaccurate.
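The visual checks can be complemented by a numeric one. The sketch below computes skewness, kurtosis, and the Jarque-Bera statistic by hand; this test is swapped in here as one common choice rather than something the plots above require. The function name `jarque_bera` and the simulated samples are assumptions for illustration, and the p-value uses the asymptotic chi-square distribution with 2 degrees of freedom, which can be inaccurate for short series.

```python
import math
import random


def jarque_bera(resid):
    """Return (JB statistic, asymptotic p-value, skewness, kurtosis)."""
    n = len(resid)
    m = sum(resid) / n
    m2 = sum((r - m) ** 2 for r in resid) / n
    m3 = sum((r - m) ** 3 for r in resid) / n
    m4 = sum((r - m) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value, skew, kurt


# Simulated residuals, for illustration only.
rng = random.Random(0)
normal_like = [rng.gauss(0.0, 1.0) for _ in range(500)]
heavy_tailed = [r ** 3 for r in normal_like]  # cubing fattens the tails

for name, sample in [("normal-like", normal_like), ("heavy-tailed", heavy_tailed)]:
    jb, p, skew, kurt = jarque_bera(sample)
    print(f"{name}: JB={jb:.2f}, p={p:.4f}, skew={skew:.2f}, kurtosis={kurt:.2f}")
```

A small p-value suggests the residuals depart from normality. A kurtosis well above 3 corresponds to the heavy tails that show up as points leaving the line at the ends of the QQ-plot.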
Perhaps the most critical diagnostic for time series models is checking for autocorrelation in the residuals. If the model has successfully captured the temporal dependencies (both non-seasonal and seasonal), the residuals should be uncorrelated. We use the ACF plot of the residuals for this check.
Significant spikes in the ACF plot of the residuals, particularly at lower lags (like 1, 2, 3, ...) or at seasonal lags ($m, 2m, 3m, \dots$, where $m$ is the seasonal period), indicate that the model hasn't fully captured the correlation structure.
ACF plot of residuals. Most spikes are within the confidence bounds (dashed lines), suggesting no significant autocorrelation remains. A spike significantly outside the bounds would indicate a potential issue. For a SARIMA model with monthly seasonality ($m = 12$), pay close attention to lags 12 and 24.
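To make the reading of the ACF plot concrete, the sketch below computes the sample autocorrelations directly and flags lags that fall outside the approximate 95% bounds of $\pm 1.96/\sqrt{n}$. The function names and the toy "residual" series (a spike every 12th observation, standing in for seasonality the model missed) are assumptions for illustration.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series x."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% white-noise bound."""
    bound = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Toy residuals with leftover monthly seasonality: a spike every 12 steps.
leftover_seasonal = [1.0 if t % 12 == 0 else 0.0 for t in range(120)]

print(significant_lags(leftover_seasonal, max_lag=24))  # -> [12, 24]
```

Only the seasonal lags are flagged here, which is the pattern to watch for when a seasonal component is under-specified. Keep in mind that with 95% bounds, roughly one lag in twenty can fall outside by chance even for true white noise.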
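If significant autocorrelation is found, the location of the spikes hints at the fix: spikes at low lags suggest revisiting the non-seasonal orders ($p$, $q$), while spikes at seasonal lags suggest revisiting the seasonal orders ($P$, $Q$) or the seasonal differencing.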
Besides visual inspection, formal statistical tests can check for autocorrelation in the residuals. The Ljung-Box test is commonly used. It tests the null hypothesis that the first k autocorrelation coefficients of the residuals are jointly zero.
$H_0$: The residuals are independently distributed (no autocorrelation).
$H_a$: The residuals exhibit autocorrelation.
The test produces a statistic and a p-value. If the p-value is small (typically less than a significance level like 0.05), we reject the null hypothesis and conclude that there is significant autocorrelation remaining in the residuals, indicating the model might be misspecified.
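The test statistic is $Q = n(n+2)\sum_{k=1}^{h} \hat{r}_k^2/(n-k)$, compared against a chi-square distribution with $h$ degrees of freedom. The sketch below implements it by hand under two simplifying assumptions: the number of lags `h` must be even (so the chi-square tail has a simple closed form), and no degrees-of-freedom adjustment is made for the number of estimated model parameters. The function name and the toy series are illustrative only.

```python
import math
import random


def ljung_box(resid, h):
    """Ljung-Box Q statistic and p-value for lags 1..h (h must be even here)."""
    if h <= 0 or h % 2 != 0:
        raise ValueError("this sketch needs a positive, even number of lags")
    n = len(resid)
    m = sum(resid) / n
    dev = [r - m for r in resid]
    c0 = sum(d * d for d in dev)

    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)

    # Chi-square survival function for even df = h:
    # exp(-q/2) * sum_{i < h/2} (q/2)^i / i!
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, h // 2):
        term *= half / i
        total += term
    return q, math.exp(-half) * total


# Toy series, for illustration only.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(120)]
seasonal = [1.0 if t % 12 == 0 else 0.0 for t in range(120)]

for name, series in [("noise", noise), ("leftover seasonality", seasonal)]:
    q, p = ljung_box(series, h=12)
    print(f"{name}: Q={q:.2f}, p={p:.4f}")
```

For seasonal models it usually makes sense to choose `h` large enough to include at least the first seasonal lag, otherwise leftover seasonal correlation cannot influence the statistic.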
In Python's statsmodels, the summary() method of a fitted SARIMA model includes Ljung-Box test results, with the p-value reported as "Prob(Q)". A p-value greater than 0.05 means the test found no evidence of autocorrelation at the lags it examined; it does not prove that the residuals are independent.
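The summary() output provided by libraries like statsmodels after fitting a SARIMA model is a rich source of diagnostic information. Beyond the Ljung-Box test, it typically includes the coefficient estimates with their standard errors, p-values, and confidence intervals; the log-likelihood and information criteria (AIC, BIC, HQIC); the Jarque-Bera normality test ("Prob(JB)"); a heteroskedasticity test ("Prob(H)"); and the skewness and kurtosis of the residuals.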
Carefully reviewing this summary table provides a quantitative complement to the visual diagnostics.
By systematically applying these diagnostic checks - examining residual plots, testing for normality, checking the ACF/PACF of residuals, and interpreting formal tests like Ljung-Box - you can gain confidence in your fitted SARIMA model. If the diagnostics reveal problems, revisit the model identification and order selection steps (Chapters 3 and 5) to refine the model structure until the residuals approximate white noise. Only after thorough diagnostics should you proceed to use the model for forecasting.
© 2025 ApX Machine Learning