In the previous chapter, you learned how to build Autoregressive Integrated Moving Average (ARIMA) models, denoted as ARIMA(p,d,q). These models are powerful tools for capturing serial correlation and trends in time series data. They work by relating the current value of a series to its own past values (the AR part) and to past forecast errors (the MA part), often after applying differencing to achieve stationarity (the I part).
However, many real-world time series exhibit seasonality, a pattern that repeats over a fixed, known period. Think of monthly retail sales that peak before holidays, quarterly company earnings, or daily website traffic that differs between weekdays and weekends. These cycles occur at regular intervals (e.g., every 12 months, 4 quarters, or 7 days).
Standard ARIMA(p,d,q) models face challenges when dealing with strong seasonality. Here's why:
Consider a time series representing monthly sales of air conditioners, which likely peaks in the summer months each year.
Illustrative monthly sales data showing a clear yearly seasonal pattern. Sales peak in months 6-8 (summer) each year.
An ARIMA model trying to forecast sales for next July would primarily look at sales in June, May, April, etc. (lags 1, 2, 3...). While these recent values provide some information, the sales figure from the previous July (lag 12) is often a much stronger predictor due to the seasonal nature of demand. A standard ARIMA model doesn't explicitly use this lag-12 information unless p or q is set to 12 or higher. Doing so forces the model to estimate a coefficient for every intermediate lag as well, which creates an inefficient model structure.
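The gap between the two kinds of lag can be checked with a small sketch. The series below is synthetic: the baseline of 100, the seasonal amplitude of 50, and the noise level are illustrative assumptions, not real sales data. It compares two naive forecasts, one that reuses last month's value and one that reuses the value from the same month a year earlier.

```python
import math
import random

random.seed(0)  # reproducible noise

# Hypothetical monthly air-conditioner sales: 6 years, peaking in summer.
# Baseline, amplitude, and noise level are illustrative assumptions.
n_years = 6
sales = []
for t in range(12 * n_years):
    month = t % 12 + 1  # 1 = January ... 12 = December
    seasonal = 50 * math.cos(2 * math.pi * (month - 7) / 12)  # peak in July
    sales.append(100 + seasonal + random.gauss(0, 5))


def mean_abs_error(series, lag):
    """Mean absolute error of forecasting series[t] with series[t - lag].

    Evaluation starts at t = 12 so both lags are scored on the same months.
    """
    errors = [abs(series[t] - series[t - lag]) for t in range(12, len(series))]
    return sum(errors) / len(errors)


mae_last_month = mean_abs_error(sales, lag=1)   # forecast = previous month
mae_last_year = mean_abs_error(sales, lag=12)   # forecast = same month last year

print(f"MAE using lag 1:  {mae_last_month:.1f}")
print(f"MAE using lag 12: {mae_last_year:.1f}")
```

On this synthetic series the lag-12 forecast has the smaller error, because the same month a year earlier sits at the same point of the seasonal cycle. Real data with trend or changing seasonality may behave less cleanly.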
Furthermore, if we look at the Autocorrelation Function (ACF) plot for such seasonal data (after potentially making it stationary), we often see significant correlations not just at short lags but also at multiples of the seasonal period (e.g., lags 12, 24, 36 for monthly data).
Example ACF plot for data with monthly seasonality. Note the significant spikes at lag 12 and lag 24, indicating strong correlation at the seasonal frequency, in addition to shorter-lag correlations. The dashed lines represent approximate confidence bounds.
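A similar pattern can be reproduced numerically. The following sketch computes the sample ACF by hand on a synthetic series with a 12-month cycle; the series, its amplitude, and its noise level are assumptions chosen for illustration. The bound of 1.96/sqrt(n) is the usual approximate 95% band under a white-noise assumption.

```python
import math
import random

random.seed(1)  # reproducible noise

# Hypothetical monthly series with a 12-month cycle (illustrative values).
n = 120
series = [10 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 3) for t in range(n)]


def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, len(x))) / denom
        for k in range(max_lag + 1)
    ]


rho = acf(series, max_lag=24)
bound = 1.96 / math.sqrt(n)  # approximate 95% bounds under white noise

for k in (1, 9, 12, 24):
    flag = "significant" if abs(rho[k]) > bound else "within bounds"
    print(f"lag {k:2d}: {rho[k]:+.2f} ({flag})")
```

In this sketch the autocorrelation at lags 12 and 24 lies well outside the bounds and is larger than at nearby lags such as 9, which is the numerical counterpart of the seasonal spikes in the plot.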
Standard ARIMA models are not structured to efficiently capture these distinct seasonal spikes in the ACF. They try to model the decaying pattern from lag 1 onwards but don't have specific terms dedicated to lags m, 2m, 3m, etc., where m is the seasonal period (for example, m = 12 for monthly data with a yearly cycle).
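To make this concrete, consider only the autoregressive part with no differencing, a simplification assumed here to keep the notation short. The symbols c, φ_i, and ε_t are the usual intercept, AR coefficients, and white-noise error; they are introduced for illustration and do not come from this section.

$$
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t
$$

The seasonal value y_{t-m} enters this equation only when p ≥ m. For monthly data with m = 12, that means estimating all twelve coefficients φ_1 through φ_12 just to reach last year's value, and reaching lag 24 would take twenty-four.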
Because of these limitations, relying solely on ARIMA(p,d,q) for strongly seasonal data often leads to suboptimal forecasts. The models might fail to capture the recurring peaks and troughs accurately or require an excessive number of parameters. This necessitates an extension that explicitly incorporates seasonal components, leading us to the Seasonal ARIMA (SARIMA) model.
© 2025 ApX Machine Learning