Selecting the right orders for a SARIMA model, written SARIMA$(p,d,q)(P,D,Q)_m$, means determining seven parameters. The process combines analyzing the structure of your time series, inspecting ACF and PACF plots, and usually some iteration. Let's break down how to approach each parameter.
1. Determine the Seasonal Period (m)
The seasonal period, m, represents the number of time steps in one full seasonal cycle. This parameter is typically dictated by the nature of the data and domain knowledge:
- Monthly Data: Likely has a yearly seasonality, so m=12.
- Quarterly Data: Likely has a yearly seasonality, so m=4.
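- Daily Data: Might have weekly seasonality (m=7), monthly seasonality (roughly m=30, which is tricky because month lengths vary), or yearly seasonality (m=365). Weekly is the most common choice for standard SARIMA, since very long periods such as m=365 make the model slow to fit.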
Visual inspection of the time series plot is usually the best way to confirm the suspected period. Look for repeating patterns and measure the time distance between peaks or troughs.
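As a numeric complement to reading the plot, the distance between repeating peaks can be estimated from the sample autocorrelation. The sketch below is illustrative only: the series is synthetic, and `sample_acf` and `estimated_m` are hypothetical names introduced here, not part of any library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

# Hypothetical monthly series: 20 years with a yearly cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(240)
y = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size)

acf_vals = sample_acf(y, max_lag=36)
# Skip lag 0 (always 1) and take the lag with the largest autocorrelation.
estimated_m = int(np.argmax(acf_vals[1:]) + 1)
print("Estimated seasonal period:", estimated_m)
```

This shortcut assumes the series has no strong trend. On trending data the ACF is dominated by the trend, so detrend or difference first and treat the result as a hint to check against domain knowledge.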
2. Determine the Seasonal Differencing Order (D)
Seasonal differencing addresses seasonality by subtracting observations separated by one seasonal period (m). The goal is to remove the seasonal pattern to help achieve stationarity.
- Inspect ACF at Seasonal Lags: Plot the ACF of your original time series. Look specifically at lags that are multiples of m (i.e., m, 2m, 3m, ...). If there are significant spikes at these seasonal lags and they decay very slowly, this indicates strong seasonality and suggests that seasonal differencing is needed.
- Apply Seasonal Differencing: If seasonality is strong, try setting D=1. Calculate the seasonally differenced series: $y'_t = y_t - y_{t-m}$.
- Re-check ACF: Plot the ACF of the seasonally differenced series ($y'_t$). Check if the significant spikes at seasonal lags have disappeared or significantly reduced.
- When is D>1? It's rare to need D>1. If the ACF of the once-differenced series still shows strong, non-decaying spikes at seasonal lags, you might consider D=2, but this is uncommon. Usually, D is either 0 (no seasonal differencing needed) or 1.
- Formal Tests: Formal tests for seasonal unit roots exist, such as the Canova-Hansen test or the Osborn-Chui-Smith-Birchenhall (OCSB) test, but interpreting the ACF/PACF is often sufficient in practice for deciding on D=1.
3. Determine the Non-Seasonal Differencing Order (d)
After addressing seasonality with D (if needed), examine the resulting series (original or seasonally differenced) for non-seasonal stationarity (i.e., check for trends or random walks).
- Inspect ACF/PACF and Series Plot: Look at the plot of the (potentially seasonally differenced) series. Does it still show a clear upward or downward trend? Examine the ACF plot again. Does it decay very slowly at the non-seasonal lags (lags 1, 2, 3...)?
- Apply Non-Seasonal Differencing: If a trend is evident or the ACF decays slowly, apply first-order non-seasonal differencing (d=1). Calculate the new series: $y''_t = y'_t - y'_{t-1}$ (where $y'_t$ is the series after seasonal differencing, or the original series if D=0).
- Re-check Stationarity: Use visual inspection and the Augmented Dickey-Fuller (ADF) test on the resulting series ($y''_t$). If the series is now stationary (ADF test rejects the null hypothesis of a unit root), then d=1 is likely appropriate.
- When is d=2? If the series is still non-stationary after one round of differencing (d=1), try second-order differencing (d=2). This is usually needed for series with changing trends (e.g., acceleration). It's less common than d=0 or d=1. Over-differencing can introduce artificial patterns, so be cautious.
Your goal is to achieve a stationary series using the minimum necessary differencing steps (D and d).
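To make the combined effect of both differencing steps concrete, they can be written with the backshift operator $B$, defined by $B y_t = y_{t-1}$ (standard notation, though not used elsewhere in this section). For the common case D=1, d=1 the result expands as:

```latex
\begin{aligned}
y''_t &= (1 - B)^d \, (1 - B^m)^D \, y_t \\
      &= (1 - B)(1 - B^m) \, y_t \qquad (d = 1,\ D = 1) \\
      &= (y_t - y_{t-m}) - (y_{t-1} - y_{t-m-1}) \\
      &= y_t - y_{t-1} - y_{t-m} + y_{t-m-1}
\end{aligned}
```

Each step shortens the series: after applying both, $d + D\,m$ observations are lost from the start (13 for monthly data with d=1, D=1), which is one practical reason to keep differencing to a minimum.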
4. Determine Seasonal AR (P) and MA (Q) Orders
Once the series is stationary after applying seasonal (D) and non-seasonal (d) differencing, inspect the ACF and PACF plots of this final differenced series to identify the seasonal AR (P) and MA (Q) orders. Focus on the lags that are multiples of m (m,2m,3m,...).
- Seasonal MA (Q): Look at the ACF plot. If there's a significant spike only at lag m and the ACF cuts off immediately after (i.e., spikes at 2m,3m,... are not significant), this suggests a Seasonal MA(1) model. Set Q=1 and P=0. If spikes occur at lags m and 2m but cut off after that, consider Q=2.
- Seasonal AR (P): Look at the PACF plot. If there's a significant spike only at lag m and the PACF cuts off immediately after (spikes at 2m,3m,... are not significant), this suggests a Seasonal AR(1) model. Set P=1 and Q=0. If spikes occur at lags m and 2m but cut off after that, consider P=2.
- Seasonal ARMA (P>0,Q>0): If both the ACF and PACF show significant spikes that tail off slowly at the seasonal lags (e.g., decay gradually at m,2m,3m,...), it might indicate a need for both seasonal AR and MA terms (e.g., P=1,Q=1).
- Common Values: Often, P and Q are 0 or 1. Higher seasonal orders are less frequent. Start simple.
Consider a hypothetical ACF/PACF plot for a monthly series (m=12) after seasonal (D=1) and non-seasonal (d=1) differencing, with a pattern suggesting P=1,Q=0 and p=1,q=0.
Figure: The ACF tails off at non-seasonal lags (1, 2, 3...) and seasonal lags (12, 24...). The PACF cuts off after lag 1 (non-seasonal) and lag 12 (seasonal). This suggests p=1,q=0 and P=1,Q=0. Confidence intervals (blue dashed lines) help judge significance.
5. Determine Non-Seasonal AR (p) and MA (q) Orders
Simultaneously, examine the first few non-seasonal lags (lags 1, 2, 3, ...) in the same ACF and PACF plots of the differenced stationary series. Apply the standard ARIMA identification rules:
- MA (q): ACF cuts off sharply after lag q, while PACF tails off. Set p=0.
- AR (p): PACF cuts off sharply after lag p, while ACF tails off. Set q=0.
- ARMA (p>0,q>0): Both ACF and PACF tail off gradually.
Look for the simplest model that fits the patterns. Often p and q are small numbers like 0, 1, or 2.
6. Iteration and Refinement
Identifying SARIMA orders from ACF/PACF plots is often more of an art than an exact science. The plots provide suggestions for candidate models.
- Start Simple: Begin with lower order models suggested by the plots (e.g., SARIMA$(1,1,1)(1,1,1)_{12}$).
- Fit and Diagnose: Fit your candidate model(s) using statsmodels. Examine the model summary and, most importantly, perform residual diagnostics (covered in detail later). Residuals should ideally resemble white noise (no autocorrelation, zero mean, constant variance). Check the ACF/PACF plots of the residuals. If significant spikes remain, your model may be missing some structure, suggesting adjustments to the orders (p,q,P, or Q).
- Information Criteria: Use metrics like AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to compare different candidate models that seem plausible after diagnostics. Lower AIC/BIC values generally indicate a better balance between model fit and complexity. Libraries like pmdarima can automate searching over a grid of potential orders using these criteria, which can be a useful supplement to manual inspection.
Selecting the optimal SARIMA order is an iterative process involving examining plots, fitting models, checking residuals, and comparing model fit statistics. The goal is to find a parsimonious model (simplest model with good performance) that captures both the non-seasonal and seasonal dynamics of your stationary time series.