In the previous chapter, we discussed how to decompose time series and ensure they are stationary. Now, we turn our attention to understanding the correlation structure within a stationary time series. Specifically, how is the value of the series at a given time t, denoted as yt, related to its past values like yt−1, yt−2, and so on? The Autocorrelation Function (ACF) is our primary tool for quantifying this relationship.
Autocorrelation simply means "self-correlation". It measures the linear relationship between lagged values of a time series. In simpler terms, it tells us how much the value of the series at time t is correlated with its value k periods ago, at time t−k. This lag k can be 1, 2, 3, etc.
The ACF value for a specific lag k, often denoted as ρk, is calculated similarly to the standard correlation coefficient but between yt and yt−k across all available t.
ρk = Cov(yt, yt−k) / Var(yt)

where Cov(yt, yt−k) is the covariance between the series and its lagged version, and Var(yt) is the variance of the series. Since we assume the series is stationary, the variance Var(yt) is constant over time, and the covariance Cov(yt, yt−k) depends only on the lag k, not on the specific time t.
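As a concrete illustration of this formula, the sketch below (using NumPy; the function name `sample_acf` is just a placeholder for this example) computes the standard sample estimate: the deviations from the full-sample mean are multiplied against their k-lagged copies and normalized by the sum of squared deviations.

```python
import numpy as np

def sample_acf(y, k):
    # Sample autocorrelation at lag k: covariance of the series with its
    # k-lagged copy, divided by the series variance. Both use the full-sample
    # mean, which is justified by the stationarity assumption.
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    var = np.sum(dev * dev)
    cov_k = np.sum(dev[k:] * dev[:-k]) if k > 0 else var
    return cov_k / var

rng = np.random.default_rng(42)
y = rng.normal(size=500)   # white noise: expect near-zero ACF for k >= 1

print(sample_acf(y, 0))    # 1.0 by definition
print(sample_acf(y, 1))    # close to zero for white noise
```

For white noise, the lag-1 estimate will typically be small (of the order of 1/√n), in line with the significance boundaries discussed below.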
The ACF values range from -1 to 1: values near 1 indicate a strong positive linear relationship with the lagged series, values near -1 indicate a strong negative relationship, and values near 0 indicate little or no linear relationship. By definition, the autocorrelation at lag 0, ρ0, is always 1, as any series is perfectly correlated with itself at no lag.
We rarely calculate these values manually. Statistical libraries like statsmodels in Python provide functions to compute and plot the ACF. The standard visualization is a "correlogram" or ACF plot, which shows the autocorrelation values ρk on the y-axis for different lags k on the x-axis (usually starting from lag 1, though sometimes lag 0 is included).
A critical feature of ACF plots generated by statistical packages is the inclusion of significance boundaries, typically drawn as a shaded area (often blue). Lags where the autocorrelation bar extends beyond this boundary are considered statistically significant (usually at the 5% significance level): under the null hypothesis that the true autocorrelation at that lag is zero, an estimate that large is unlikely to arise from sampling variation alone.
Let's look at an example ACF plot for a hypothetical stationary time series:
ACF plot showing autocorrelation values for lags 1 through 20. The blue shaded area represents the 95% confidence interval. Bars extending beyond this area indicate statistically significant autocorrelation.
In the plot above, the bar at lag 0 equals 1, bars extending beyond the shaded region mark lags with statistically significant autocorrelation, and the remaining bars are consistent with zero autocorrelation at those lags.
Analyzing the ACF plot is a fundamental step in time series analysis. It helps us understand the "memory" of the process. How far back in time do past values significantly influence the current value? The pattern of decay (e.g., sharp cutoff vs. slow decay) in the ACF provides hints about the underlying structure of the data and guides us in selecting appropriate models like Moving Average (MA) or Autoregressive (AR) models. We will explore this connection between ACF patterns and model identification in the section "Interpreting ACF/PACF for Model Selection". For now, focus on understanding what the ACF measures and how to read its plot.
© 2025 ApX Machine Learning