As discussed previously, while ARIMA models are powerful tools for modeling time series, they have limitations when faced with data exhibiting strong, repeating seasonal patterns. Trying to capture seasonality solely through non-seasonal AR or MA terms often requires very high orders, leading to overly complex models that might not accurately reflect the underlying seasonal structure.
To address this, we introduce the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. SARIMA extends the basic ARIMA framework by explicitly incorporating seasonal components into the model structure. This makes it particularly well-suited for time series data where patterns repeat over a fixed period, such as monthly sales figures showing yearly peaks or daily website traffic exhibiting weekly cycles.
Understanding SARIMA Notation
A SARIMA model is typically denoted as SARIMA$(p,d,q)(P,D,Q)_m$. Let's break down this notation:
- (p,d,q): These are the non-seasonal parameters, exactly the same as in the standard ARIMA model you learned about in the previous chapter.
  - p: Non-seasonal Autoregressive (AR) order.
  - d: Non-seasonal Differencing order.
  - q: Non-seasonal Moving Average (MA) order.
- (P,D,Q): These represent the seasonal components of the model. They are analogous to their non-seasonal counterparts but operate at the seasonal lag.
  - P: Seasonal Autoregressive (AR) order. It captures the relationship between the current observation and observations from previous seasons.
  - D: Seasonal Differencing order. It accounts for seasonal trends by subtracting observations separated by one full season.
  - Q: Seasonal Moving Average (MA) order. It models the relationship between the current error and errors from previous seasons.
- m: This is a critical parameter representing the seasonal period or frequency. It's the number of time steps in one full seasonal cycle. For example:
  - m=12 for monthly data with an annual seasonality.
  - m=4 for quarterly data with an annual seasonality.
  - m=7 for daily data with a weekly seasonality.
  - m=52 for weekly data with an annual seasonality.
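To make the notation concrete, here is a minimal sketch of a container that holds the seven numbers of a SARIMA specification. The class `SarimaSpec` and its method names are hypothetical and exist only for illustration. The two tuples it exposes mirror the `order=(p, d, q)` and `seasonal_order=(P, D, Q, m)` arguments that statsmodels' `SARIMAX` accepts, where the seasonal period is the fourth element of the seasonal tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SarimaSpec:
    """Hypothetical holder for SARIMA(p,d,q)(P,D,Q)m orders (illustration only)."""

    p: int  # non-seasonal AR order
    d: int  # non-seasonal differencing order
    q: int  # non-seasonal MA order
    P: int  # seasonal AR order
    D: int  # seasonal differencing order
    Q: int  # seasonal MA order
    m: int  # seasonal period (time steps per full cycle)

    def __post_init__(self):
        if any(o < 0 for o in (self.p, self.d, self.q, self.P, self.D, self.Q)):
            raise ValueError("all orders must be non-negative")
        # A seasonal term only makes sense with a period of at least 2 steps.
        if any((self.P, self.D, self.Q)) and self.m < 2:
            raise ValueError("a seasonal component needs a period m >= 2")

    @property
    def order(self):
        """Non-seasonal part as a (p, d, q) tuple."""
        return (self.p, self.d, self.q)

    @property
    def seasonal_order(self):
        """Seasonal part as a (P, D, Q, m) tuple."""
        return (self.P, self.D, self.Q, self.m)

    def lost_to_differencing(self):
        """Observations dropped from the start of the series by differencing."""
        return self.d + self.D * self.m


spec = SarimaSpec(p=1, d=1, q=1, P=1, D=1, Q=1, m=12)
print(spec.order, spec.seasonal_order, spec.lost_to_differencing())
```

The `lost_to_differencing` helper shows one practical consequence of m: each order of seasonal differencing removes a full season of observations, so a monthly series with d=1 and D=1 loses its first 13 points.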
How SARIMA Works Conceptually
Think of a SARIMA model as combining two processes: one that models the non-seasonal dynamics and another that models the seasonal dynamics. Conceptually, it works in three steps:
- Seasonal Differencing: If D>0, the model first applies seasonal differencing ($y_t - y_{t-m}$, repeated D times) to remove or reduce the seasonal trend. This helps stabilize the seasonal component of the series.
- Non-Seasonal Differencing: If d>0, the model then applies non-seasonal differencing ($y'_t - y'_{t-1}$, repeated d times, where $y'$ is the seasonally differenced series if D>0, otherwise $y' = y$) to handle non-seasonal trends and make the series stationary in the usual sense.
- ARMA Modeling: Finally, an ARMA-like structure is applied to the differenced series. This structure includes both:
  - Non-seasonal AR (p) and MA (q) terms that capture correlations at short lags (e.g., lag 1, lag 2).
  - Seasonal AR (P) and MA (Q) terms that capture correlations at seasonal lags (e.g., lag m, lag 2m).
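The two differencing steps above can be sketched in plain Python. This is a minimal illustration, not how a fitting library does it internally: the helper names `seasonal_difference` and `difference` are hypothetical, and the quarterly series is synthetic (a linear trend plus a fixed seasonal pattern, with no noise) so that the effect of each step is easy to see.

```python
def seasonal_difference(series, m):
    """Return y_t - y_{t-m} for every t that has a value m steps earlier."""
    if m < 1:
        raise ValueError("the lag m must be at least 1")
    return [series[t] - series[t - m] for t in range(m, len(series))]


def difference(series):
    """Ordinary first difference: y_t - y_{t-1}."""
    return seasonal_difference(series, 1)


# Synthetic quarterly data (m = 4): slope of 2 per step plus a repeating pattern.
m = 4
pattern = [10, -5, 3, -8]
y = [2 * t + pattern[t % m] for t in range(16)]

# Step 1 (D = 1): the repeating pattern cancels, leaving the trend's
# contribution over one full season (2 per step * 4 steps).
seasonally_differenced = seasonal_difference(y, m)

# Step 2 (d = 1): differencing the result removes the remaining constant level.
fully_differenced = difference(seasonally_differenced)

print(seasonally_differenced)
print(fully_differenced)
```

Because the synthetic series contains no noise, nothing is left after both steps. With real data, the differenced series still carries short-lag and seasonal-lag correlations, and those are what the AR and MA terms are meant to capture.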
For instance, a SARIMA$(1,1,1)(1,1,1)_{12}$ model for monthly data (m=12) specifies:
- Non-seasonal: An AR(1) term, one order of regular differencing (d=1), and an MA(1) term.
- Seasonal: A seasonal AR(1) term (dependence on the observation at lag 12), one order of seasonal differencing (D=1, $y_t - y_{t-12}$), and a seasonal MA(1) term (dependence on the error at lag 12).
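For readers who prefer a formula, the same example can be written compactly with the backshift operator. The symbols here are assumptions introduced for illustration and do not appear elsewhere in this section: $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_1$ and $\Phi_1$ are the non-seasonal and seasonal AR coefficients, $\theta_1$ and $\Theta_1$ are the non-seasonal and seasonal MA coefficients, and $\varepsilon_t$ is the error term. The MA factors use the plus-sign convention; some texts write them with minus signs instead.

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

Reading the left-hand side from right to left gives seasonal differencing, regular differencing, the seasonal AR(1) factor, and the non-seasonal AR(1) factor. Multiplying out the right-hand side gives $\varepsilon_t + \theta_1 \varepsilon_{t-1} + \Theta_1 \varepsilon_{t-12} + \theta_1 \Theta_1 \varepsilon_{t-13}$, which shows that the seasonal and non-seasonal parts combine multiplicatively: an extra term appears at lag 13 without an extra parameter.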
By incorporating these distinct seasonal parameters $(P,D,Q)_m$, SARIMA provides a more structured and interpretable way to model time series that are influenced by predictable seasonal patterns. The following sections will guide you through identifying appropriate seasonal orders using ACF/PACF plots and implementing these models in Python using the statsmodels library.