Let's put the theory from the previous sections into practice. We'll generate Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots for a time series and analyze them to suggest potential model orders. This step is fundamental before fitting ARIMA models, as these plots provide clues about the underlying structure (the 'p' and 'q' parameters) of a stationary time series.
We assume you have a stationary time series ready. If you started with non-stationary data, you should have already applied transformations like differencing (as discussed in Chapter 2) to achieve stationarity. For this exercise, we'll work with a synthetically generated stationary dataset to clearly illustrate the expected patterns.
You'll need the following Python libraries:
- numpy for numerical operations and generating sample data.
- pandas for data handling (though less critical for this specific plotting example if working directly with NumPy arrays).
- statsmodels for ACF/PACF calculations and plotting functions.
- plotly.graph_objects for creating the plots (we will construct these manually from the statsmodels calculations).

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf, pacf
# We use Plotly for visualization.
# Note: statsmodels has its own plot_acf/plot_pacf based on matplotlib,
# but here we extract the values and build the Plotly charts ourselves.
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Set seed for reproducibility
np.random.seed(42)
# Generate a sample stationary AR(2) process:
# y_t = 0.7*y_{t-1} - 0.3*y_{t-2} + noise
ar_params = np.array([0.7, -0.3])
ma_params = np.array([]) # No MA component
ar = np.r_[1, -ar_params] # add zero-lag coeff
ma = np.r_[1, ma_params] # add zero-lag coeff
# Generate 500 data points
n_samples = 500
# Use ArmaProcess to generate data (part of statsmodels)
from statsmodels.tsa.arima_process import ArmaProcess
ar_process = ArmaProcess(ar, ma)
sample_data = ar_process.generate_sample(nsample=n_samples)
# Convert to pandas Series (optional, but common practice)
ts = pd.Series(sample_data)
print("Generated Sample Data (first 5 values):")
print(ts.head())
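# The AR(2) parameters lie within the stationary region, so the series is stationary by construction
print("\nIs the generated data stationary (based on generation process)? Yes, AR(2) process parameters are within the stationary region.")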
# You would typically run an ADF test here on real data
# from statsmodels.tsa.stattools import adfuller
# adf_result = adfuller(ts)
# print(f'ADF Statistic: {adf_result[0]}')
# print(f'p-value: {adf_result[1]}') # A low p-value suggests stationarity
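The claim that the AR(2) parameters lie within the stationary region can be checked directly: the process is stationary when both roots of the characteristic polynomial 1 - 0.7z + 0.3z^2 lie outside the unit circle. The following is a minimal sketch using only the standard library; the helper names `ar2_roots` and `is_stationary_ar2` are illustrative, not part of statsmodels, and the sketch assumes the second coefficient is non-zero.

```python
import cmath

def ar2_roots(phi1, phi2):
    """Roots of the AR(2) characteristic polynomial 1 - phi1*z - phi2*z**2.

    Assumes phi2 != 0 so the polynomial is a genuine quadratic.
    """
    # Rewrite as a*z**2 + b*z + c = 0 with a = -phi2, b = -phi1, c = 1
    a, b, c = -phi2, -phi1, 1.0
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))

def is_stationary_ar2(phi1, phi2):
    """True when both characteristic roots lie strictly outside the unit circle."""
    return all(abs(root) > 1 for root in ar2_roots(phi1, phi2))

# Coefficients of the process generated above: y_t = 0.7*y_{t-1} - 0.3*y_{t-2} + noise
root_moduli = [abs(r) for r in ar2_roots(0.7, -0.3)]
print("Root moduli:", [round(m, 3) for m in root_moduli])
print("Stationary:", is_stationary_ar2(0.7, -0.3))
```

Both roots here are complex with modulus of about 1.83, which is also why the ACF of this process oscillates as it decays. The ArmaProcess object created above should expose the same check through its isstationary property.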
The ACF measures the correlation between the time series y_t and its lagged values y_{t-k} for different values of lag k. We use the acf function from statsmodels.tsa.stattools to compute these values and the corresponding confidence intervals.
A significant spike at lag k indicates a strong correlation between observations k periods apart. For an MA(q) process, the ACF plot is expected to have significant spikes up to lag q and then abruptly cut off (fall within the confidence interval). For an AR(p) process, the ACF typically decays more slowly (often geometrically or following a sine wave pattern).
# Calculate ACF and confidence intervals
# nlags specifies how many lags to calculate; alpha specifies the confidence level (0.05 for 95%)
acf_values, confint = acf(ts, nlags=20, alpha=0.05)
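# The confidence interval array confint has shape (nlags+1, 2) and is centered on each ACF estimate.
# Subtracting acf_values re-centers the band on zero, so any value outside the band is significant:
# Lower bound = confint[:, 0] - acf_values
# Upper bound = confint[:, 1] - acf_values
# Note: acf_values[0] is always 1 (correlation with lag 0)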
lags = np.arange(len(acf_values))
conf_lower = confint[:, 0] - acf_values
conf_upper = confint[:, 1] - acf_values
# Create Plotly figure for ACF
fig_acf = go.Figure()
# Add confidence interval band (excluding lag 0)
fig_acf.add_trace(go.Scatter(
    x=np.concatenate([lags[1:], lags[1:][::-1]]), # x-coordinates for polygon shape
    y=np.concatenate([conf_upper[1:], conf_lower[1:][::-1]]), # y-coordinates for polygon shape
    fill='toself',
    fillcolor='#a5d8ff', # Light blue
    line=dict(color='rgba(255,255,255,0)'), # No border line
    hoverinfo="skip",
    showlegend=False,
    name='Confidence Interval'
))
# Add ACF bars/stems (excluding lag 0)
fig_acf.add_trace(go.Scatter(
    x=lags[1:],
    y=acf_values[1:],
    mode='markers',
    marker=dict(color='#1c7ed6', size=8), # Blue dots
    name='ACF'
))
# Add vertical lines from stems to x-axis
for i in range(1, len(acf_values)):
    fig_acf.add_shape(type='line',
                      x0=lags[i], y0=0,
                      x1=lags[i], y1=acf_values[i],
                      line=dict(color='#495057', width=1.5)) # Gray lines
# Add lag 0 point (always 1)
fig_acf.add_trace(go.Scatter(
    x=[lags[0]], y=[acf_values[0]], mode='markers', marker=dict(color='#1c7ed6', size=8), showlegend=False
))
fig_acf.add_shape(type='line', x0=lags[0], y0=0, x1=lags[0], y1=acf_values[0], line=dict(color='#495057', width=1.5))
# Update layout
fig_acf.update_layout(
    title='Autocorrelation Function (ACF)',
    xaxis_title='Lag',
    yaxis_title='Autocorrelation',
    yaxis_range=[-1, 1.1], # Ensure y-axis covers full range plus lag 0
    xaxis=dict(tickmode='linear', dtick=1), # Show integer lags
    plot_bgcolor='white',
    height=350,
    margin=dict(l=50, r=20, t=50, b=40)
)
# Display the plot (in a notebook environment) or save it
# fig_acf.show() # Uncomment to display interactively
ACF plot for the generated AR(2) data. The blue shaded area represents the 95% confidence interval. Correlations extending beyond this area are statistically significant.
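To make the quantity behind the plot concrete, here is a minimal sketch of the sample autocorrelation computed by hand with only the standard library. The function name `sample_acf`, the demo series, and the constant ±1.96/sqrt(n) band are assumptions made for illustration. The constant band is the simple white-noise approximation and may differ from the interval statsmodels returns, which by default is understood to widen with lag.

```python
import math
import random

def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags using the divide-by-n estimator."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (n times the variance)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(nlags + 1)
    ]

# Hypothetical demo data: simulate the same AR(2) recursion with Gaussian noise
random.seed(0)
y = [0.0, 0.0]
for _ in range(600):
    y.append(0.7 * y[-1] - 0.3 * y[-2] + random.gauss(0, 1))
y = y[100:]  # drop the first 100 values as burn-in

r = sample_acf(y, nlags=5)
band = 1.96 / math.sqrt(len(y))  # approximate 95% band under white noise
for k, rk in enumerate(r):
    note = "outside band" if k > 0 and abs(rk) > band else ""
    print(f"lag {k}: {rk: .3f} {note}")
```

Lag 0 is always exactly 1 because a series is perfectly correlated with itself, which is why the plot treats it separately from the other lags.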
The PACF measures the correlation between y_t and y_{t-k} after removing the effects of the intermediate lags (y_{t-1}, y_{t-2}, ..., y_{t-k+1}). We use the pacf function from statsmodels.tsa.stattools.
For an AR(p) process, the PACF plot is expected to have significant spikes up to lag p and then abruptly cut off. This is because the PACF removes the influence of shorter lags, isolating the direct relationship described by the AR parameters. For an MA(q) process, the PACF typically decays more slowly.
# Calculate PACF and confidence intervals
# method='ywm' selects Yule-Walker estimation without bias adjustment;
# we pass it explicitly rather than rely on the library default
pacf_values, confint_pacf = pacf(ts, nlags=20, alpha=0.05, method='ywm')
# Extract confidence intervals similarly to ACF
lags_pacf = np.arange(len(pacf_values))
conf_lower_pacf = confint_pacf[:, 0] - pacf_values
conf_upper_pacf = confint_pacf[:, 1] - pacf_values
# Create Plotly figure for PACF
fig_pacf = go.Figure()
# Add confidence interval band (excluding lag 0)
fig_pacf.add_trace(go.Scatter(
    x=np.concatenate([lags_pacf[1:], lags_pacf[1:][::-1]]),
    y=np.concatenate([conf_upper_pacf[1:], conf_lower_pacf[1:][::-1]]),
    fill='toself',
    fillcolor='#a5d8ff', # Light blue
    line=dict(color='rgba(255,255,255,0)'),
    hoverinfo="skip",
    showlegend=False,
    name='Confidence Interval'
))
# Add PACF bars/stems (excluding lag 0)
fig_pacf.add_trace(go.Scatter(
    x=lags_pacf[1:],
    y=pacf_values[1:],
    mode='markers',
    marker=dict(color='#7048e8', size=8), # Violet dots
    name='PACF'
))
# Add vertical lines from stems to x-axis
for i in range(1, len(pacf_values)):
    fig_pacf.add_shape(type='line',
                       x0=lags_pacf[i], y0=0,
                       x1=lags_pacf[i], y1=pacf_values[i],
                       line=dict(color='#495057', width=1.5)) # Gray lines
# Add lag 0 point (always 1 for PACF definition, though sometimes omitted in plots)
# We'll omit the line/marker at lag 0 for PACF as it's less standard than for ACF
# fig_pacf.add_trace(go.Scatter(
# x=[lags_pacf[0]], y=[pacf_values[0]], mode='markers', marker=dict(color='#7048e8', size=8), showlegend=False
# ))
# Update layout
fig_pacf.update_layout(
    title='Partial Autocorrelation Function (PACF)',
    xaxis_title='Lag',
    yaxis_title='Partial Autocorrelation',
    yaxis_range=[-1, 1.1], # Ensure y-axis covers full range
    xaxis=dict(tickmode='linear', dtick=1), # Show integer lags
    plot_bgcolor='white',
    height=350,
    margin=dict(l=50, r=20, t=50, b=40)
)
# fig_pacf.show() # Uncomment to display interactively
PACF plot for the generated AR(2) data. The blue shaded area represents the 95% confidence interval.
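As a reference point for the sample plots, the theoretical ACF and PACF of the generating AR(2) process can be worked out exactly. The sketch below builds the theoretical ACF from the Yule-Walker recursion and converts it to partial autocorrelations with the Durbin-Levinson recursion. The function names are illustrative, and this is not necessarily how statsmodels computes its estimates.

```python
def ar2_theoretical_acf(phi1, phi2, nlags):
    """Theoretical ACF of a stationary AR(2) process via the Yule-Walker recursion."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, nlags + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[:nlags + 1]

def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations rho[0..K] (rho[0] == 1)
    using the Durbin-Levinson recursion."""
    max_lag = len(rho) - 1
    pacf_vals = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        # Update the lower-order coefficients, then append the new one
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_curr.append(phi_kk)
        pacf_vals.append(phi_kk)
        phi_prev = phi_curr
    return pacf_vals

rho = ar2_theoretical_acf(0.7, -0.3, nlags=6)
partial = pacf_from_acf(rho)
for k in range(len(rho)):
    print(f"lag {k}: acf = {rho[k]: .4f}, pacf = {partial[k]: .4f}")
```

For these coefficients the theoretical PACF is about 0.538 at lag 1, exactly -0.3 at lag 2, and zero (up to floating-point error) afterwards, while the ACF keeps oscillating toward zero. The sample plots should approximate this shape, with estimation noise on top.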
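Now, let's interpret the plots generated from our sample AR(2) data:
ACF Plot Analysis: The ACF does not cut off sharply. For an AR process it is expected to decay gradually, and for these coefficients the decay takes a damped, oscillating form.
PACF Plot Analysis: The PACF is expected to show significant spikes at lags 1 and 2, with later lags falling mostly inside the confidence band. An occasional later lag may cross the band by chance, since roughly 1 in 20 will at the 95% level.
Conclusion: A tailing-off ACF combined with a PACF that cuts off after lag 2 points to an AR(2) model, i.e. ARIMA(2, 0, 0), which matches the process used to generate the data.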
Here's a quick reference guide based on ACF/PACF patterns for stationary data:
Process | ACF Pattern | PACF Pattern | Suggested Model |
---|---|---|---|
AR(p) | Tails off (geometric/sine wave) | Cuts off after lag p | ARIMA(p, 0, 0) |
MA(q) | Cuts off after lag q | Tails off (geometric/sine wave) | ARIMA(0, 0, q) |
ARMA(p,q) | Tails off | Tails off | ARIMA(p, 0, q) |
(Remember, the 'd' in ARIMA(p, d, q) relates to the differencing needed for stationarity, which is determined before analyzing ACF/PACF plots).
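The phrase "cuts off after lag p" in the table can be read mechanically: lags 1 through p fall outside the confidence band and lag p+1 falls inside it. The sketch below encodes that reading as a rough heuristic. The helper `cutoff_lag` is hypothetical, it uses the simple ±1.96/sqrt(n) white-noise band, and it cannot tell a genuine cutoff from a slow tail-off, so it is no substitute for looking at the plots.

```python
import math

def cutoff_lag(values, n_obs, z=1.96):
    """Largest k such that lags 1..k all lie outside the band +/- z/sqrt(n_obs).

    `values` includes lag 0 at index 0, which is skipped.
    """
    band = z / math.sqrt(n_obs)
    k = 0
    for v in values[1:]:
        if abs(v) <= band:
            break
        k += 1
    return k

# Illustrative values shaped like the AR(2) PACF and an MA(1) ACF (not real output)
ar2_like_pacf = [1.0, 0.538, -0.3, 0.01, -0.02]
ma1_like_acf = [1.0, 0.441, 0.03, -0.01]

print("AR order candidate p:", cutoff_lag(ar2_like_pacf, n_obs=500))
print("MA order candidate q:", cutoff_lag(ma1_like_acf, n_obs=500))
```

Apply it to the PACF when the ACF tails off (AR case) and to the ACF when the PACF tails off (MA case). When both tail off, neither cutoff is meaningful and an ARMA model is the more likely candidate.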
Now it's time to apply this yourself.
- Generate a sample MA(1) process (e.g., y_t = 0.6 * noise_{t-1} + noise_t) similar to how we generated the AR(2) data above.

Interpreting ACF and PACF plots is often more art than exact science, especially with noisy real-world data, and the patterns are not always clear. Even so, the plots are an indispensable starting point for identifying candidate models, which you will learn to fit and evaluate in the upcoming chapters.
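As a starting point for the MA(1) exercise, the series can be simulated straight from its definition. This is a minimal standard-library sketch; the seed, sample size, and helper name `lag_corr` are arbitrary choices made for illustration. With statsmodels, the equivalent setup would presumably pass ar = [1] and ma = [1, 0.6] to ArmaProcess, following the convention used for the AR(2) data.

```python
import random

random.seed(1)
n = 5000
noise = [random.gauss(0, 1) for _ in range(n + 1)]

# y_t = 0.6 * noise_{t-1} + noise_t
ma1 = [0.6 * noise[t - 1] + noise[t] for t in range(1, n + 1)]

def lag_corr(x, k):
    """Sample autocorrelation of x at lag k (divide-by-n estimator)."""
    m = sum(x) / len(x)
    dev = [v - m for v in x]
    c0 = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - k] for t in range(k, len(x))) / c0

# Theoretical lag-1 autocorrelation of an MA(1): theta / (1 + theta**2)
theoretical_r1 = 0.6 / (1 + 0.6 ** 2)
print("theoretical lag 1:", round(theoretical_r1, 3))
print("sample lag 1:", round(lag_corr(ma1, 1), 3))
print("sample lag 2:", round(lag_corr(ma1, 2), 3))
```

The sample lag-1 value should sit near the theoretical 0.441 and lag 2 near zero. Plotting the full ACF and PACF and reading the order off them is left as the exercise.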
© 2025 ApX Machine Learning