Analyzing causality in temporal data requires specialized approaches. One of the earliest and most widely known concepts for analyzing relationships between time series is Granger causality. While influential, especially in econometrics, it's essential for practitioners in machine learning systems to understand its precise definition and, more importantly, its significant limitations when interpreting results as evidence of genuine causal influence.
Developed by Nobel laureate Sir Clive Granger, the core idea isn't about structural causation in the way we understand it from Structural Causal Models (SCMs). Instead, Granger causality is fundamentally a statement about predictability. A time series X is said to "Granger-cause" another time series Y if past values of X contain information that helps predict Y more effectively than the information already contained in the past values of Y itself.
Let X_t and Y_t be two stationary time series. Consider two autoregressive models for predicting Y_t:
Restricted Model: Predicts Y_t using only its own past values (lags):
Y_t = α_0 + α_1 Y_{t-1} + α_2 Y_{t-2} + ... + α_p Y_{t-p} + ε_t
Here, p is the number of lags included, the α_i are coefficients, and ε_t is the prediction error (residual) for this model.
Unrestricted Model: Predicts Y_t using past values of both Y_t and X_t:
Y_t = β_0 + β_1 Y_{t-1} + ... + β_p Y_{t-p} + γ_1 X_{t-1} + ... + γ_q X_{t-q} + η_t
Here, q is the number of lags of X included, the β_i and γ_j are coefficients, and η_t is the prediction error for this model.
The definition of Granger causality rests on comparing the variances of the prediction errors, Var(ε_t) and Var(η_t), from these two models.
Definition: X Granger-causes Y if the variance of the prediction error from the unrestricted model is statistically significantly smaller than the variance of the error from the restricted model. In simpler terms, adding past X values significantly improves the prediction of Y. This is typically tested using an F-test of the joint significance of the lagged X coefficients (null hypothesis: γ_1 = γ_2 = ... = γ_q = 0).
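The F-test above can be sketched directly from the two regressions. The following is a minimal illustrative implementation (not a production routine; the function name `granger_f_test` is an assumption for this example): it fits both models by ordinary least squares on the same sample, then forms the standard F-statistic from the two residual sums of squares.

```python
import numpy as np

def granger_f_test(x, y, lags=2):
    """F-statistic for H0: no lag of x helps predict y (linear Granger test).

    Restricted model:   y_t on its own lags.
    Unrestricted model: y_t on its own lags plus lags of x.
    Large values favor rejecting H0 ("x does not Granger-cause y").
    """
    T = len(y)
    target = y[lags:]
    ones = np.ones((T - lags, 1))                                   # intercept
    own = np.column_stack([y[lags - i: T - i] for i in range(1, lags + 1)])
    cross = np.column_stack([x[lags - i: T - i] for i in range(1, lags + 1)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r = rss(np.hstack([ones, own]))           # restricted: own lags only
    rss_u = rss(np.hstack([ones, own, cross]))    # unrestricted: plus lags of x
    n, k_u = T - lags, 1 + 2 * lags               # sample size, unrestricted params
    return ((rss_r - rss_u) / lags) / (rss_u / (n - k_u))

# Synthetic example: x drives y with a one-step lag, but not vice versa.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_forward = granger_f_test(x, y)   # large: past x sharply improves prediction of y
f_reverse = granger_f_test(y, x)   # near 1: past y adds essentially nothing for x
```

On this synthetic data the forward statistic is orders of magnitude larger than the reverse one, which is the asymmetry the test is designed to detect.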
Similarly, we can test whether Y Granger-causes X by swapping the roles of X and Y in the models above.
While the concept of improved predictability is useful, equating Granger causality with structural causation (i.e., asserting that manipulating X would lead to a change in Y) is fraught with peril. For advanced practitioners building reliable systems, recognizing these limitations is absolutely necessary.
Predictability vs. Causation: This is the most fundamental limitation. Granger causality only establishes whether past X helps predict future Y. It does not imply that X exerts a physical or structural influence on Y. Correlation, even lagged correlation, does not equal causation.
Unobserved Confounding: The most common reason for spurious Granger causality is the presence of an unobserved common cause Z that influences both X and Y with different time lags. If Z affects X first and then Y, the past values of X will appear predictive of Y simply because they act as a proxy for the influence of the unobserved Z. Standard Granger tests do not account for such confounders.
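This confounding mechanism is easy to reproduce. In the sketch below (an illustrative simulation, not a diagnostic tool), a hidden series z drives x with a one-step lag and y with a two-step lag; x has no causal effect on y, yet x_{t-1} is strongly correlated with y_t because both echo z_{t-2}.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
z = rng.normal(size=T)          # unobserved common cause
x = np.zeros(T)
y = np.zeros(T)
x[1:] = z[:-1] + 0.3 * rng.normal(size=T - 1)   # x_t = z_{t-1} + noise
y[2:] = z[:-2] + 0.3 * rng.normal(size=T - 2)   # y_t = z_{t-2} + noise

# Pair x_{t-1} with y_t: both are noisy copies of z_{t-2},
# so past x looks highly "predictive" of y despite no causal link.
r = np.corrcoef(x[1:-1], y[2:])[0, 1]
```

Here r comes out close to 0.9: an analyst who never observes z would see a strong lagged relationship from x to y and could easily misread it as causal.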
An unobserved common cause Z influencing both X and Y at different lags can create spurious Granger causality from X to Y.
Instantaneous Effects: Granger causality is defined based on past values. It cannot detect contemporaneous causal relationships where X_t influences Y_t within the same time period (a lag-0 effect). These effects are absorbed into the error terms' correlation.
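A small simulation (illustrative only) makes this blind spot concrete: when x affects y purely within the same period, the contemporaneous correlation is strong, but the lagged correlation that a Granger test relies on is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4000)
y = x + 0.5 * rng.normal(size=4000)   # x affects y within the same period only

same_period = np.corrcoef(x, y)[0, 1]       # strong contemporaneous link
lagged = np.corrcoef(x[:-1], y[1:])[0, 1]   # ~0: nothing for a lag-based test to find
```

The causal effect is real and strong, yet a Granger test, which only looks at past values, has nothing to work with; the dependence shows up only as correlation between the two models' residuals.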
Omitted Variables: The test assumes that all relevant predictive information is contained within the past values of X and Y. If a third observed variable, W, influences Y and is also correlated with past X, omitting W from the models can lead to incorrect conclusions about the relationship between X and Y. This necessitates multivariate extensions (Vector Autoregression, VAR) for practical use, but even VAR-based Granger tests suffer from confounding if relevant variables are omitted.
Non-Linear Relationships: The standard formulation relies on linear autoregressive models. If the true relationship between X and Y is non-linear, the linear Granger test might fail to detect predictive power, even if a causal link exists. Non-linear extensions exist but come with their own complexities.
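As a simplified illustration (using correlation as a stand-in for what a linear regression can pick up), consider a purely quadratic lagged effect. The linear association between x_{t-1} and y_t is near zero even though x_{t-1} completely determines the signal in y_t; only a transformation matching the true functional form reveals the dependence.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=3000)
y = np.zeros(3000)
y[1:] = x[:-1] ** 2 + 0.2 * rng.normal(size=2999)  # purely quadratic lagged effect

lin = np.corrcoef(x[:-1], y[1:])[0, 1]             # ~0: a linear test sees nothing
nonlin = np.corrcoef(x[:-1] ** 2, y[1:])[0, 1]     # strong, once the form is known
```

A linear Granger F-test on (x, y) would have essentially no power here, while kernel- or neural-network-based extensions could in principle detect the link, at the cost of extra tuning and weaker statistical guarantees.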
Non-Stationarity: The statistical tests underpinning Granger causality typically assume that the time series X_t and Y_t are (covariance) stationary. Applying the tests to non-stationary data (e.g., series with trends or unit roots) can produce spurious results indicating Granger causality where none exists. Pre-processing steps like differencing are often required, but these can also alter the underlying relationships.
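The classic failure mode can be sketched in a few lines: two independent noise series that merely share a deterministic trend appear almost perfectly correlated in levels, and first-differencing (here via `np.diff`) removes the artifact. This is an illustrative simulation, not a substitute for formal stationarity tests such as the ADF test.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(2000)
x = 0.01 * t + rng.normal(size=2000)   # independent noise + shared upward trend
y = 0.02 * t + rng.normal(size=2000)

corr_levels = np.corrcoef(x, y)[0, 1]                   # large: trend-driven
corr_diffs = np.corrcoef(np.diff(x), np.diff(y))[0, 1]  # near zero after differencing
```

Any regression-based test run on the raw levels, including a Granger test, would inherit this spurious relationship, which is why stationarity checks and differencing (or cointegration-aware methods) come first in practice.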
Measurement Error: Errors in measuring X or Y can bias the coefficient estimates and affect the test outcomes, potentially masking true relationships or suggesting spurious ones.
Despite these serious limitations from a structural causal inference perspective, Granger causality tests can sometimes serve as a preliminary exploratory tool in time series analysis. They can help identify potential lagged predictive relationships that warrant further, more rigorous causal investigation using methods discussed later in this chapter, such as Structural Vector Autoregression (SVAR) or time-series causal discovery algorithms. However, interpreting a significant Granger causality result as definitive proof of a causal link is a common mistake that should be strictly avoided in sophisticated ML system development.
Understanding Granger causality provides a foundation for appreciating why more advanced techniques are needed for causal inference in temporal settings. We now turn our attention to methods like SVAR, which attempt to impose more structure to identify causal effects, albeit with their own sets of assumptions.