Okay, you've prepared your time series data, made it stationary by differencing (determining the d parameter), and used ACF/PACF plots to get initial estimates for the autoregressive (p) and moving average (q) orders. Now it's time to put these pieces together and estimate the parameters of your chosen ARIMA(p, d, q) model using Python's statsmodels library.
The primary tool for this in statsmodels is the ARIMA class located in the statsmodels.tsa.arima.model module. This implementation is based on state space methods, offering a flexible and efficient way to handle ARIMA modeling.
First, ensure you have the necessary libraries imported and your data loaded into a pandas Series with a DatetimeIndex. While statsmodels can sometimes work with simple numerical indices, using a DatetimeIndex is best practice for time series analysis and essential for interpreting results and forecasting effectively.
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import statsmodels.api as sm # Often useful for datasets or other tools
# Assume 'series_original' is your pandas Series containing the time series data
# It should have a DatetimeIndex. For example:
# rng = pd.date_range('2020-01-01', periods=100, freq='D')
# series_original = pd.Series(np.random.randn(100).cumsum() + 50, index=rng)
# Let's assume based on previous analysis (stationarity tests, ACF/PACF)
# we decided on an ARIMA(1, 1, 1) model.
p = 1
d = 1
q = 1
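In practice your data will usually come from a file rather than a NumPy simulation. A minimal sketch of loading a CSV into a Series with a proper DatetimeIndex might look like this (the file name and column names are hypothetical placeholders):
# Hypothetical CSV with a 'date' column and a 'value' column
df = pd.read_csv(
    'my_series.csv',        # hypothetical file name
    parse_dates=['date'],   # parse the date column as datetimes
    index_col='date'        # use it as the index
)
series_original = df['value'].asfreq('D')  # enforce an explicit daily frequency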
To fit an ARIMA model, you first create an instance of the ARIMA class, passing your time series data (endog, for endogenous variable) and the chosen order (p, d, q). The order parameter takes a tuple containing these three integers.
# 1. Instantiate the ARIMA model
# Provide the original series and the order (p, d, q)
# The model internally handles the differencing based on d
model = ARIMA(series_original, order=(p, d, q))
# 2. Fit the model
# This step performs the parameter estimation, typically using
# Maximum Likelihood Estimation (MLE).
results = model.fit()
The fit() method does the heavy lifting. It estimates the AR coefficients (ϕ1, ..., ϕp), the MA coefficients (θ1, ..., θq), and the variance of the error term (σ²). It uses numerical optimization techniques to find the parameter values that maximize the likelihood of observing your actual data given the model structure.
Once the model is fitted, the results object (an instance of ARIMAResultsWrapper) contains a wealth of information about the estimated model. The most convenient way to see the main findings is the summary() method.
# 3. Print the summary of the fitted model
print(results.summary())
The output of summary() typically includes general information about the model and the fit (the dependent variable, sample size, the fitted order, the log likelihood, and information criteria such as AIC and BIC), followed by a table of the estimated coefficients (labeled ar.L1 for the first AR lag coefficient ϕ1, ma.L1 for the first MA lag coefficient θ1, and sigma2 for the error variance σ²). Alongside each coefficient, you'll find:
- std err: The standard error of the coefficient estimate, indicating its precision.
- z: The z-statistic (coefficient divided by standard error), used for hypothesis testing.
- P>|z|: The p-value associated with the z-statistic. A small p-value (typically < 0.05) suggests the coefficient is statistically significantly different from zero.
- [0.025 0.975]: The 95% confidence interval for the coefficient.
Here's an example interpreting a coefficient line:
coef std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------
ar.L1 0.650 0.080 8.125 0.000 0.493 0.807
This suggests the estimated coefficient for the first AR term, ϕ̂1, is 0.650. The p-value is 0.000, strongly indicating this term is statistically significant. The 95% confidence interval ranges from 0.493 to 0.807.
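You don't need to parse the printed summary to work with these numbers programmatically; the results object exposes them directly. A quick sketch using standard attributes of the statsmodels results object:
# Point estimates, labeled ar.L1, ma.L1, sigma2
print(results.params)
# Standard errors and p-values, with the same labels
print(results.bse)
print(results.pvalues)
# 95% confidence intervals as a DataFrame
print(results.conf_int())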
The results object also gives you access to the model's fitted values on the training data. These are the one-step-ahead predictions the model would have made within the sample, and comparing them to the actual data gives a visual sense of the model's fit. Note that with this state space implementation, the fitted values are returned on the scale of the original series even when d > 0; however, the first d fitted values are heavily influenced by how the differencing is initialized and are usually discarded when inspecting the fit.
# Access fitted values (one-step-ahead in-sample predictions)
fitted_values = results.fittedvalues
# The fitted values are on the original scale of the series, even when d > 0.
# The first d values are distorted by the differencing initialization and are
# usually dropped before plotting or computing fit statistics.
Let's visualize how the fitted values compare to the actual data. Because the levels of an integrated series almost always look like they track well one step ahead, a more informative check for our ARIMA(1, 1, 1) example is to compare the actual first differences with the model's one-step-ahead predicted changes (each fitted value minus the previous observation).
# Generate some sample data: start from a stationary ARMA(1, 1) process
np.random.seed(42)
n_points = 100
rng = pd.date_range('2020-01-01', periods=n_points, freq='D')
true_ar = [0.7]
true_ma = [-0.4]
errors = np.random.normal(0, 1, n_points)
y = np.zeros(n_points)
for t in range(1, n_points):
    y[t] = true_ar[0] * y[t-1] + errors[t] + true_ma[0] * errors[t-1]
series_arma = pd.Series(y, index=rng)  # stationary ARMA(1, 1) series

# Integrate (cumulatively sum) the series so that it follows an ARIMA(1, 1, 1)
series_original_example = series_arma.cumsum() + 50
series_original_example = series_original_example.asfreq('D')  # ensure the frequency is set

# Fit ARIMA(1, 1, 1) to this example data
model_example = ARIMA(series_original_example, order=(1, 1, 1))
results_example = model_example.fit()

# One-step-ahead in-sample predictions, on the original (undifferenced) scale
fitted_vals_example = results_example.fittedvalues

# Actual first differences of the series
actual_diff_example = series_original_example.diff()

# The model's one-step-ahead predicted change: fitted value minus previous observation
predicted_diff_example = fitted_vals_example - series_original_example.shift(1)

# Align the two series for comparison (the first value is lost to differencing)
comparison_df = pd.DataFrame({
    'Actual Differenced': actual_diff_example,
    'Predicted Change': predicted_diff_example
}).dropna()
A comparison between the actual first-differenced values and the one-step-ahead predicted changes from the ARIMA(1, 1, 1) model for a short segment of the example data.
This plot helps visually assess how well the model captures the dynamics of the (differenced) series within the training period. Significant deviations might suggest model misspecification.
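If you want to draw this comparison yourself, a minimal matplotlib sketch using the comparison_df built above:
import matplotlib.pyplot as plt

# Plot a short segment so the individual movements are visible
comparison_df.iloc[:50].plot(figsize=(10, 4))
plt.title('Actual first differences vs. one-step-ahead predicted changes')
plt.xlabel('Date')
plt.ylabel('Daily change')
plt.show()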
Fitting the model is a central step, but it's not the end. You've now estimated the parameters based on your chosen order. The next steps involve carefully diagnosing the model fit by analyzing the residuals and then using the validated model to generate forecasts.
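As a brief preview of those next steps, the fitted results object already exposes the in-sample residuals, standard diagnostic plots, and forecasting helpers; the 10-step horizon below is just an illustrative choice:
# Residuals for diagnostic checking (covered in the next section)
residuals = results.resid

# Built-in diagnostic plots: standardized residuals, histogram/KDE,
# normal Q-Q plot, and correlogram
results.plot_diagnostics(figsize=(10, 8))

# Out-of-sample forecast with confidence intervals
forecast = results.get_forecast(steps=10)
print(forecast.predicted_mean)
print(forecast.conf_int())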