Now that we understand the concepts of Autocorrelation (ACF) and Partial Autocorrelation (PACF), let's see how to generate and visualize these functions in Python. These plots are fundamental tools for visually inspecting the correlation structure of your time series data, which is an important step towards identifying potential model parameters. The primary library we'll use for this is statsmodels.
Remember, ACF and PACF analysis is typically performed on stationary time series data. If your data exhibits trends or seasonality, you should apply transformations like differencing (as discussed in Chapter 2) to achieve stationarity before generating these plots. Operating on non-stationary data can lead to misleading ACF/PACF plots where the underlying correlations are obscured by the trend or seasonality.
The statsmodels library provides a convenient function, plot_acf, within its graphics.tsaplots module to compute and plot the ACF. Let's assume you have a stationary time series stored in a Pandas Series object called stationary_series. Here's how you can generate the ACF plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima_process import ArmaProcess # To generate sample data
# --- Generate Sample Stationary Data (e.g., AR(1) process) ---
# This simulates data you might have after differencing
np.random.seed(42)
ar_coeffs = np.array([1, -0.7]) # AR parameter = 0.7
ma_coeffs = np.array([1]) # No MA part
ar_process = ArmaProcess(ar_coeffs, ma_coeffs)
sample_data = ar_process.generate_sample(nsample=500)
stationary_series = pd.Series(sample_data)
# ------------------------------------------------------------
# Plot the ACF
fig, ax = plt.subplots(figsize=(10, 5))
plot_acf(stationary_series, lags=20, ax=ax) # Plot first 20 lags
ax.set_xlabel("Lag")
ax.set_ylabel("Autocorrelation")
ax.set_title("Autocorrelation Function (ACF)")
plt.show() # Display the plot
This code snippet first generates some sample stationary data (simulating an AR(1) process for demonstration). The core part is the call to plot_acf(stationary_series, lags=20, ax=ax). The lags parameter specifies how many lags of autocorrelation to calculate and display; an appropriate number depends on your data's frequency and expected correlation length, but 20-40 lags is often a reasonable starting point. We also pass an existing Matplotlib Axes object (ax) for better control over the plot's appearance, though plot_acf can generate its own figure if needed.
The resulting plot typically looks something like this:
The ACF plot shows the correlation coefficient on the y-axis for different time lags on the x-axis. The shaded blue area represents the confidence interval (typically 95%). Correlations extending beyond this band are considered statistically significant. Note that the ACF at lag 0 is always 1, as any series is perfectly correlated with itself. This example shows a typical pattern for an AR process: correlations decay exponentially towards zero.
Similarly, statsmodels provides the plot_pacf function to compute and plot the PACF. Remember, the PACF measures the correlation between the series and its lag k, after removing the linear effects of the intermediate lags (1, 2, ..., k−1). The usage is analogous to plot_acf:
# Assuming 'stationary_series' holds your stationary data
from statsmodels.graphics.tsaplots import plot_pacf
# Plot the PACF
fig, ax = plt.subplots(figsize=(10, 5))
plot_pacf(stationary_series, lags=20, ax=ax, method='ywm') # Plot first 20 lags
ax.set_xlabel("Lag")
ax.set_ylabel("Partial Autocorrelation")
ax.set_title("Partial Autocorrelation Function (PACF)")
plt.show() # Display the plot
We use stationary_series again, and the lags parameter works the same way. The method parameter specifies how the PACF is estimated; 'ywm' (Yule-Walker without the small-sample adjustment) is a common choice and the default for plot_pacf in recent statsmodels versions.
The resulting plot provides insight into the direct relationship between an observation and its lag, removing indirect correlations:
The PACF plot structure is similar to the ACF plot, with lags on the x-axis, partial correlation on the y-axis, and a confidence interval band. Again, values outside the band are statistically significant. This example PACF plot shows a sharp cutoff after lag 1, which is characteristic of an AR(1) process. All partial autocorrelations for lags greater than 1 are close to zero and within the confidence band.
Generating these plots is the first step. The real value comes from interpreting the patterns of significant spikes:
- AR(p) process: the ACF tails off gradually (exponential decay or damped oscillation), while the PACF cuts off sharply after lag p.
- MA(q) process: the ACF cuts off sharply after lag q, while the PACF tails off gradually.
- ARMA(p, q) process: both the ACF and the PACF tail off gradually.
The plots generated above, based on simulated AR(1) data (y_t = 0.7 y_{t-1} + ε_t), clearly show this AR pattern: the ACF decays exponentially, while the PACF has a significant spike only at lag 1 and then cuts off (falls within the confidence bands).
By generating and examining these plots for your stationary time series, you gain valuable clues about the underlying data generating process. This visual inspection, combined with quantitative measures discussed later, helps guide the selection of appropriate p and q orders for ARIMA models, which we will cover in the next chapter. The next section details how to use these patterns for model identification more systematically.
© 2025 ApX Machine Learning