Fitting SARIMA Models in Python

Having identified candidate non-seasonal orders $(p, d, q)$ and seasonal orders $(P, D, Q)_m$, the next step is to estimate the parameters of the SARIMA model from your time series data. Python's statsmodels library provides a convenient and powerful implementation for this purpose.

The primary tool we'll use is the SARIMAX class in the statsmodels.tsa.statespace.sarimax module. Its name lists what the model supports: seasonality (S), autoregressive terms (AR), integration, i.e. differencing (I), moving-average terms (MA), and exogenous regressors (X), although we won't use the X part in this section.

To fit a SARIMA model, you first instantiate the SARIMAX class, providing your time series data and the chosen model orders.

The main arguments you'll use are:

endog: This is your time series data, typically a Pandas Series or a NumPy array. endog stands for endogenous variable, meaning the variable you are trying to model and forecast.
order: A tuple representing the non-seasonal order $(p, d, q)$.
seasonal_order: A tuple representing the seasonal order $(P, D, Q, m)$, where $m$ is the number of time steps in a single seasonal period (e.g., 12 for monthly data with an annual cycle, 4 for quarterly data).

Let's assume you have determined the appropriate orders for your data. For instance, suppose you're working with monthly data ($m=12$) and decided on a $\text{SARIMA}(1, 1, 1)(1, 1, 0)_{12}$ model. Here's how you would instantiate and fit it in Python:

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Assume 'series_train' is your training time series data (Pandas Series)
# Example: Monthly data, so m = 12
# Let's assume these orders were identified:
p, d, q = 1, 1, 1       # Non-seasonal order
P, D, Q, m = 1, 1, 0, 12 # Seasonal order

# 1. Instantiate the SARIMAX model
model = SARIMAX(series_train,
                order=(p, d, q),
                seasonal_order=(P, D, Q, m),
                enforce_stationarity=False,   # Do not constrain AR parameters to the stationary region (default: True)
                enforce_invertibility=False)  # Do not constrain MA parameters to the invertible region (default: True)

# 2. Fit the model to the data
# This step estimates the model parameters using Maximum Likelihood Estimation (MLE)
model_fit = model.fit(disp=False) # disp=False hides convergence messages

# 3. Print the model summary
print(model_fit.summary())

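The enforce_stationarity and enforce_invertibility arguments control whether estimation constrains the AR parameters to the stationary region and the MA parameters to the invertible region, respectively. Both default to True. They are separate from differencing: the orders $d$ and $D$ remove trend and seasonal non-stationarity, while these flags govern the AR and MA coefficients fitted to the differenced series. Setting them to False, as in the example above, can sometimes prevent estimation issues, but the resulting estimates are then no longer guaranteed to be stationary or invertible, so check them before relying on the model.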

The .fit() method performs the core estimation process. It uses Maximum Likelihood Estimation (MLE): a numerical optimizer searches for the values of the AR, MA, seasonal AR, and seasonal MA coefficients (together with the error variance) that maximize the likelihood of the observed data. This can be computationally intensive for complex models or large datasets. Setting disp=False suppresses the optimizer's convergence messages.

Once the model is fitted, the model_fit object contains the results, including the estimated parameters, standard errors, goodness-of-fit statistics, and more. The .summary() method provides a comprehensive overview, formatted nicely for inspection.

A typical summary output looks something like this (values are illustrative):

                                     SARIMAX Results
==========================================================================================
Dep. Variable:                              y   No. Observations:                  132
Model:             SARIMAX(1, 1, 1)x(1, 1, 0, 12)   Log Likelihood                -150.450
Date:                            Thu, 15 Aug 2024   AIC                            308.900
Time:                                    10:30:00   BIC                            319.500
Sample:                                01-01-2010   HQIC                           313.200
                                     - 12-01-2020
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.3500      0.080      4.375      0.000       0.193       0.507
ma.L1         -0.8500      0.060    -14.167      0.000      -0.968      -0.732
ar.S.L12       0.1500      0.090      1.667      0.096      -0.026       0.326
sigma2         0.9500      0.150      6.333      0.000       0.656       1.244
===================================================================================
Ljung-Box (L1) (Q):                   0.02   Jarque-Bera (JB):                 2.50
Prob(Q):                              0.88   Prob(JB):                         0.29
Heteroskedasticity (H):               1.10   Skew:                             0.20
Prob(H) (two-sided):                  0.75   Kurtosis:                         3.50
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

In this summary:

coef: Shows the estimated values for the parameters: the non-seasonal AR(1) term ar.L1, the non-seasonal MA(1) term ma.L1, and the seasonal AR(1) term ar.S.L12 (a first-order seasonal term, which acts at lag 12).
std err: The standard error of the coefficient estimates.
z: The z-statistic (coef / std err).
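P>|z|: The p-value associated with the z-statistic. Lower p-values (typically < 0.05) suggest the coefficient is statistically significant. In the illustrative output, ar.S.L12 has a p-value of 0.096, so it would not be significant at the 5% level.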
[0.025 0.975]: The 95% confidence interval for the coefficient.
sigma2: The estimated variance of the residual errors.
Log Likelihood, AIC, BIC, HQIC: The maximized log-likelihood and three information criteria derived from it, used for model comparison (discussed further in Chapter 6).
The bottom table provides results from diagnostic tests on the residuals (which we will cover next).
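The columns of the coefficient table are linked by simple arithmetic, and so are the log-likelihood and the AIC. As a quick check, the sketch below recomputes them from the illustrative values in the summary above. It assumes a standard normal reference distribution for the p-values and confidence intervals, and it assumes the AIC counts four estimated parameters (the three coefficients plus sigma2). wald_row is a hypothetical helper name.

```python
import math

Z_CRIT = 1.959964  # two-sided 95% critical value of the standard normal

# (coef, std err) pairs copied from the illustrative summary table
rows = {
    "ar.L1": (0.35, 0.08),
    "ma.L1": (-0.85, 0.06),
    "ar.S.L12": (0.15, 0.09),
    "sigma2": (0.95, 0.15),
}

def wald_row(coef, se):
    """Return z-statistic, two-sided p-value, and 95% interval bounds."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for Z ~ N(0, 1)
    return z, p, coef - Z_CRIT * se, coef + Z_CRIT * se

for name, (coef, se) in rows.items():
    z, p, lower, upper = wald_row(coef, se)
    print(f"{name:9s} z={z:8.3f}  p={p:.3f}  CI=[{lower:.3f}, {upper:.3f}]")

# AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
log_likelihood, n_params = -150.45, 4
aic = -2 * log_likelihood + 2 * n_params
print(f"AIC = {aic:.3f}")
```

The printed values should agree with the table to three decimals, including the AIC of 308.900.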

Fitting the model is a fundamental step. After obtaining model_fit, you can proceed to diagnose the model's adequacy by examining its residuals and then use it to generate forecasts for future time points.
