Our journey through identification strategies has equipped us with powerful tools like do-calculus to determine if a causal effect can be estimated from observational data, given a set of assumptions encoded in a causal graph. However, these identification assumptions, particularly the absence of unobserved confounding for a chosen adjustment set or the validity of an instrumental variable, are fundamentally untestable using the observed data alone. They rely on background knowledge, expert judgment, and hope. This raises a critical question: How confident can we be in our estimated causal effects if these assumptions are slightly, or even moderately, violated? Sensitivity analysis provides a framework for answering this.
Instead of providing a single point estimate of a causal effect, sensitivity analysis explores how this estimate would change under various hypothetical violations of our core identification assumptions. It allows us to quantify the robustness of our conclusions. If a small, plausible violation of an assumption drastically alters the result (e.g., changes the sign of the effect or makes it statistically insignificant), our findings are considered sensitive or fragile. Conversely, if the conclusion holds even under substantial hypothetical violations, we gain confidence in its robustness.
Consider the standard assumption needed for identifying the Average Treatment Effect (ATE) using the backdoor criterion: conditional ignorability, which states that, conditional on a set of observed covariates Z, the treatment assignment X is independent of the potential outcomes Y(x). Formally, Y(x) ⊥ X ∣ Z for all x. Graphically, this holds when Z blocks every backdoor path between X and Y, that is, when Z captures all common causes of X and Y.
In practice, we can never be absolutely certain that our measured covariates Z truly capture all common causes. There might always be an unobserved factor U that affects both X and Y, creating residual confounding even after adjusting for Z.
Figure: A common scenario where identification assumptions might fail. Adjusting for Z does not block the backdoor path X ← U → Y because U is unobserved.
Sensitivity analysis directly confronts this uncertainty. It doesn't solve the problem of unobserved confounding, but it measures its potential impact.
Several methods exist to assess sensitivity, primarily focusing on the potential influence of unobserved confounders.
Developed initially for matched observational studies, Rosenbaum's method provides a formal way to assess how strong an unobserved confounder would need to be to undermine the study's conclusions, typically regarding the statistical significance of the treatment effect.
The core idea is to introduce a sensitivity parameter, Γ≥1. Imagine two units, i and j, perfectly matched on all observed covariates Z. If there were no unobserved confounding, the odds of unit i receiving the treatment versus unit j receiving it would be equal. However, if an unobserved binary confounder U exists, these odds might differ. Γ represents the maximum odds ratio by which the treatment assignment probability could differ between two units with identical Z due to differences in U.
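Formally, if πᵢ denotes the probability that unit i receives treatment, Rosenbaum's model bounds the treatment-assignment odds ratio for any two units matched exactly on Z:

$$\frac{1}{\Gamma} \;\le\; \frac{\pi_i\,(1-\pi_j)}{\pi_j\,(1-\pi_i)} \;\le\; \Gamma$$

Γ = 1 recovers the no-hidden-bias case, in which matched units are equally likely to be treated; larger Γ allows a stronger unobserved confounder.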
The analysis then calculates bounds on the p-value for the treatment effect hypothesis test under different assumed values of Γ. For example, we might find that the treatment effect remains statistically significant (e.g., p < 0.05) for all Γ below 1.8, but becomes non-significant once Γ ≥ 1.8.
Interpretation: We would conclude the finding is robust to unobserved confounders that change the odds of treatment assignment by a factor less than 1.8. To gauge if this is truly robust, we often compare this Γ value to the effect sizes of observed covariates. If an observed covariate strongly associated with both treatment and outcome only changes the odds by, say, 1.5, then robustness up to Γ=1.8 seems reasonably strong.
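For intuition about how these bounds behave, here is a minimal sketch for the simplest case: matched pairs with a binary outcome (McNemar's setting), where hidden bias of magnitude Γ lets the probability that the treated unit in a discordant pair is the one with the event rise to at most Γ/(1+Γ). All counts below are hypothetical.

```python
from scipy.stats import binom

def rosenbaum_pvalue_bounds(t_obs, n_discordant, gamma):
    """Bounds on the one-sided p-value of McNemar's test for matched
    pairs, allowing hidden bias up to a treatment-odds ratio of gamma.

    t_obs: discordant pairs in which the treated unit had the event.
    n_discordant: total number of discordant pairs.
    """
    p_hi = gamma / (1.0 + gamma)   # worst-case assignment probability
    p_lo = 1.0 / (1.0 + gamma)     # best-case assignment probability
    # P(T >= t_obs) under Binomial(n_discordant, p); sf(k) gives P(T > k)
    upper = binom.sf(t_obs - 1, n_discordant, p_hi)
    lower = binom.sf(t_obs - 1, n_discordant, p_lo)
    return lower, upper

# Hypothetical study: 45 of 60 discordant pairs favor the treated unit.
for gamma in [1.0, 1.5, 1.8, 2.0]:
    _, worst_case_p = rosenbaum_pvalue_bounds(45, 60, gamma)
    print(f"Gamma = {gamma}: worst-case p = {worst_case_p:.3f}")
```

With these hypothetical counts, the worst-case p-value stays below 0.05 at Γ = 1.5 but crosses it around Γ ≈ 1.8, mirroring the example above.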
Emily Oster proposed a method more directly applicable in regression settings, linking sensitivity to the stability of the treatment coefficient when observed controls are added. The intuition is that if the estimated treatment effect changes dramatically when we add observed covariates that explain little additional variance in the outcome, it suggests the estimate might be highly sensitive to unobserved covariates as well.
The method requires specifying two parameters:

- δ (delta), the proportional-selection parameter: how strong selection on unobservables is relative to selection on the observed controls. δ = 1 means selection on unobservables is as strong as selection on observables.
- R²max, the R² of a hypothetical regression of the outcome on the treatment, the observed controls, and the unobserved confounders. R²max = 1 is maximally conservative; values below 1 allow for measurement error or irreducible noise in the outcome.

Given δ and R²max, the method calculates the "bias-adjusted" treatment effect, β∗. We can then determine the value of δ required to make β∗ equal to zero (or cross some other threshold) for a given R²max.
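In the widely cited linear-approximation version of Oster's estimator (the exact solution involves solving a cubic), the bias-adjusted effect is written in terms of the short regression of the outcome on treatment alone (coefficient β̇, fit Ṙ²) and the long regression that adds the observed controls (β̃, R̃²):

$$\beta^* \;\approx\; \tilde{\beta} \;-\; \delta\,\bigl(\dot{\beta} - \tilde{\beta}\bigr)\,\frac{R^2_{max} - \tilde{R}^2}{\tilde{R}^2 - \dot{R}^2}$$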
Interpretation: If the treatment effect remains meaningfully different from zero even for plausible values of δ (such as δ = 1) and a conservatively high R²max, the result is considered relatively robust. Conversely, if even a small δ (e.g., 0.5) drives the effect to zero with a reasonable R²max, the finding is sensitive.
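The arithmetic is simple enough to sketch directly. The function below implements the approximation above; all regression inputs are hypothetical, and dedicated packages should be preferred for real analyses since they implement the exact estimator.

```python
def oster_beta_star(beta_dot, r2_dot, beta_tilde, r2_tilde, delta, r2_max):
    """Bias-adjusted effect via the Oster (2019) linear approximation.

    beta_dot, r2_dot: coefficient and R^2 without controls (short regression).
    beta_tilde, r2_tilde: coefficient and R^2 with observed controls added.
    """
    movement = beta_dot - beta_tilde                  # shift caused by controls
    scaling = (r2_max - r2_tilde) / (r2_tilde - r2_dot)
    return beta_tilde - delta * movement * scaling

# Hypothetical example: the effect shrinks from 0.8 to 0.6 when controls
# raise R^2 from 0.10 to 0.30; probe delta = 1 with R^2_max = 0.5.
print(oster_beta_star(0.8, 0.10, 0.6, 0.30, delta=1.0, r2_max=0.5))  # 0.4

# delta needed to drive beta* to zero for this R^2_max:
delta_zero = 0.6 * (0.30 - 0.10) / ((0.8 - 0.6) * (0.5 - 0.30))
print(delta_zero)  # 3.0 -> robust by the usual delta = 1 benchmark
```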
Graphical representations are extremely helpful. A common visualization plots the estimated treatment effect (or confidence interval bounds) against the sensitivity parameter (Γ or δ). This allows practitioners to quickly see the threshold at which their conclusions might change.
Consider a hypothetical sensitivity analysis result:
Figure: Confidence interval for the estimated treatment effect as a function of a sensitivity parameter. The interval crosses zero (red dashed line) when the parameter reaches approximately 2.7.
In this plot, the blue lines represent the 95% confidence interval for the treatment effect. As the sensitivity parameter (representing the strength of potential unobserved confounding) increases along the x-axis, the confidence interval widens and shifts. The point where the lower bound crosses the zero effect line (here, around 2.7) indicates the threshold of robustness.
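A figure of this kind takes only a few lines of matplotlib to produce. The bound curves below are fabricated for illustration, with slopes chosen so that the lower bound crosses zero near 2.7 as in the described plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Fabricated, illustrative bounds: the point estimate attenuates and the
# interval widens as the hypothesized confounding strength grows.
gamma = np.linspace(1.0, 4.0, 100)
point = 1.5 - 0.43 * (gamma - 1.0)        # bias-adjusted point estimate
half_width = 0.6 + 0.10 * (gamma - 1.0)   # widening 95% CI half-width

plt.plot(gamma, point - half_width, color="blue", label="95% CI bounds")
plt.plot(gamma, point + half_width, color="blue")
plt.axhline(0.0, color="red", linestyle="--", label="zero effect")
plt.xlabel("sensitivity parameter (strength of hypothetical confounding)")
plt.ylabel("estimated treatment effect")
plt.legend()
plt.show()
```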
While unobserved confounding is the most frequent target, sensitivity analysis can, in principle, be applied to other assumptions as well, such as the validity of an instrumental variable's exclusion restriction.
Implementing sensitivity analysis requires careful thought and justification for the chosen parameters (Γ, δ, R²max).
Software packages such as sensemakr (R and Python), causalToolbox (R), and functionality within DoWhy (Python) can help implement these analyses.

Sensitivity analysis is not a magic bullet; it relies on hypothetical scenarios. However, it transforms the discussion from an untestable binary assumption ("there is no unobserved confounding") into a quantitative assessment ("how much unobserved confounding would be needed to change our conclusion?"). This shift is indispensable for responsible causal inference in complex systems, providing a necessary layer of rigor and humility to our findings.
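As a starting point for experimenting with these ideas, here is a sketch of what a simulated-confounder check might look like with DoWhy's refutation API. It assumes `model`, `identified_estimand`, and `estimate` were produced by the usual CausalModel workflow, and the effect-strength values are purely illustrative.

```python
# Assumes the standard DoWhy workflow has already run, e.g.:
#   model = CausalModel(data=df, treatment="X", outcome="Y", common_causes=["Z"])
#   identified_estimand = model.identify_effect()
#   estimate = model.estimate_effect(identified_estimand,
#                                    method_name="backdoor.linear_regression")
refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name="add_unobserved_common_cause",
    confounders_effect_on_treatment="binary_flip",  # how the simulated U perturbs X
    confounders_effect_on_outcome="linear",         # how the simulated U shifts Y
    effect_strength_on_treatment=0.05,              # illustrative strengths
    effect_strength_on_outcome=0.05,
)
print(refutation)  # reports how the estimate moves under the simulated confounder
```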