As we've established, the presence of unobserved confounders U poses a significant obstacle to estimating causal effects P(Y∣do(T=t)). Methods like Instrumental Variables (IV) rely on finding a variable that influences the treatment T, affects the outcome Y only through T, and is independent of U. Regression Discontinuity Designs (RDD) and Difference-in-Differences (DiD) exploit specific assignment mechanisms or data structures. Proximal Causal Inference (PCI) offers an alternative pathway to identification when these conditions are unmet but suitable "proxy" variables are available.
Introduced by Miao, Geng, and Tchetgen Tchetgen (2018), PCI provides a framework for identifying causal effects even when T and Y share an unobserved common cause U, provided we can observe two proxy variables, W and Z, that satisfy specific conditional independence properties.
The core idea is to find variables that act as imperfect representatives, or proxies, for the unobserved confounder U. Specifically, we need two such variables:
Treatment-confounding proxy W: a variable associated with U (and possibly with T) that affects the outcome Y only through T and U.
Outcome-confounding proxy Z: a variable associated with U (and possibly with Y) that affects the treatment T only through U.
Crucially, unlike an instrument in IV, these proxies W and Z are allowed to be confounded by U. Their utility comes from how they relate U to the observed variables T and Y.
The relationships assumed in the simplest PCI setting (with observed confounders X also present) can be visualized using a Directed Acyclic Graph (DAG):
A DAG illustrating the core relationships in Proximal Causal Inference. The unobserved confounder U affects treatment T, outcome Y, and both proxies W and Z. Crucially, W only affects Y via T (once U is considered), and Z only affects T via U. Observed confounders X can also affect T and Y.
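For readers who prefer to see the structure in code, the sketch below writes down one edge set consistent with the caption using networkx. Treat it as an assumption-laden illustration: the direct edges W→T and Z→Y are permitted by the proximal setup but the exact edge set depends on the application.

```python
import networkx as nx

# One edge set consistent with the proximal DAG described above (illustrative).
edges = [
    ("U", "T"), ("U", "Y"), ("U", "W"), ("U", "Z"),  # U confounds T and Y and drives both proxies
    ("X", "T"), ("X", "Y"),                          # observed confounders X
    ("W", "T"),                                      # assumed: W may affect T, but not Y directly
    ("Z", "Y"),                                      # assumed: Z may affect Y, but not T directly
]
dag = nx.DiGraph(edges)

assert nx.is_directed_acyclic_graph(dag)
# The absent edges are what make W and Z usable proxies:
assert not dag.has_edge("W", "Y")  # W reaches Y only via T and U
assert not dag.has_edge("Z", "T")  # Z reaches T only via U
```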
Formal identification under PCI hinges on the following conditional independence assumptions, often referred to as the "proximal conditions" or "bridge function" assumptions (assuming X represents observed confounders adjusted for):
Outcome bridge condition: Y⊥W∣T,U,X. Given the treatment T, the unobserved confounder U, and the observed confounders X, the treatment proxy W is independent of the outcome Y. In other words, W's connection to Y is fully mediated by (T,U,X).
Treatment bridge condition: T⊥Z∣U,X. Given the unobserved confounder U and the observed confounders X, the outcome proxy Z is independent of the treatment T. In other words, Z's connection to T is fully mediated by (U,X).
These assumptions essentially state that W is a "sufficient proxy" for U's influence on T (conditional on X), and Z is a "sufficient proxy" for U's influence on Y (conditional on T,X).
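To ground these conditions, here is a minimal simulation sketch with hypothetical linear structural equations satisfying the two independences above (W enters the treatment equation but not the outcome equation; Z enters the outcome equation but not the treatment equation). It illustrates the problem PCI is meant to solve: adjusting for the unobserved U would recover the effect, while the feasible regression on (T, X) alone is biased.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Hypothetical structural equations consistent with the proximal DAG.
u = rng.normal(size=n)                                   # unobserved confounder U
x = rng.normal(size=n)                                   # observed confounder X
w = 0.8 * u + rng.normal(size=n)                         # treatment proxy W (no direct effect on Y)
z = 0.9 * u + rng.normal(size=n)                         # outcome proxy Z (no direct effect on T)
t = 1.0 * u + 0.5 * x + 0.4 * w + rng.normal(size=n)     # treatment T
y = 2.0 * t + 1.5 * u + 0.7 * x + 0.3 * z + rng.normal(size=n)  # outcome Y; true effect of T is 2.0

def ols(target, covariates):
    """Coefficients of an OLS fit of target on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(target))] + covariates)
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[1:]  # drop the intercept

oracle = ols(y, [t, x, u])  # infeasible: conditions on the unobserved U
naive = ols(y, [t, x])      # feasible, but U is omitted

print(f"oracle coefficient on T: {oracle[0]:.2f}  (close to the true 2.0)")
print(f"naive  coefficient on T: {naive[0]:.2f}  (biased upward because U raises both T and Y)")
```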
How do these assumptions help identify P(Y∣do(T=t),X=x)? The intuition is that the observed conditional distributions involving the proxies contain enough information to reconstruct the influence of the unobserved U.
Consider the distribution of the outcome Y given the treatment T, the treatment proxy W, and the observed confounders X, denoted p(y∣t,w,x). It can be expressed by marginalizing over the unobserved U:
p(y∣t,w,x)=∫p(y∣t,u,w,x)p(u∣t,w,x)du
Using the conditional independence assumptions (Y⊥W∣T,U,X gives p(y∣t,u,w,x)=p(y∣t,u,x), while T⊥Z∣U,X plays the analogous role in the companion expression for p(t∣z,x)), PCI theory shows that the target causal effect
p(y∣do(t),x)=∫p(y∣t,u,x)p(u∣x)du
can be identified by solving a system of integral equations.
Specifically, identification often relies on solving two Fredholm integral equations of the first kind. Let q(y∣t,x)=p(y∣do(t),x) be the target quantity. The theory demonstrates relationships like:
p(y∣w,t,x)=∫K1(w,u,t,x)p(y∣t,u,x)du
p(t∣z,x)=∫K2(z,u,x)p(t∣u,x)du
Here p(y∣t,u,x) and p(t∣u,x) act as unknown functions, and K1, K2 are kernels involving the distribution of U (K1(w,u,t,x)=p(u∣w,t,x) and K2(z,u,x)=p(u∣z,x)). PCI shows how, under suitable completeness conditions, the observed distributions p(y∣w,t,x), p(t∣z,x), and p(z∣w,t,x) can be used to solve for the components needed to reconstruct p(y∣do(t),x).
This mathematical machinery effectively uses W and Z as "bridges" to account for the confounding effect of U without observing U directly.
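In the simplest linear special case, this bridge logic reduces to a two-stage regression often called proximal two-stage least squares: first project the outcome proxy Z onto (W, T, X), then regress Y on that projection together with (T, X). The sketch below applies it to the same hypothetical data-generating process as the previous snippet, with W and Z in the roles defined in this section; it illustrates the linear special case, not a general PCI estimator.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Same hypothetical data-generating process as in the previous sketch.
u = rng.normal(size=n)
x = rng.normal(size=n)
w = 0.8 * u + rng.normal(size=n)                         # treatment proxy W
z = 0.9 * u + rng.normal(size=n)                         # outcome proxy Z
t = 1.0 * u + 0.5 * x + 0.4 * w + rng.normal(size=n)
y = 2.0 * t + 1.5 * u + 0.7 * x + 0.3 * z + rng.normal(size=n)  # true effect of T is 2.0

def ols_fit(target, covariates):
    """Return (coefficients, fitted values) for target ~ intercept + covariates."""
    design = np.column_stack([np.ones(len(target))] + covariates)
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta, design @ beta

# Stage 1: project the outcome proxy Z onto (W, T, X).
# Z_hat stands in for the part of U that matters for Y.
_, z_hat = ols_fit(z, [w, t, x])

# Stage 2: regress Y on (Z_hat, T, X); the coefficient on T is the proximal 2SLS estimate.
beta, _ = ols_fit(y, [z_hat, t, x])
print(f"proximal 2SLS estimate of the effect of T: {beta[2]:.2f}  (true value 2.0)")
```

With nonlinear bridge functions, the same idea is carried out in the research literature with more flexible estimating equations (for example GMM-style or kernel-based bridge estimators), which is where several of the practical challenges discussed below arise.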
It's informative to contrast PCI with IV:
PCI essentially trades the IV exogeneity assumption (the instrument must be independent of U) for the proximal conditional independence assumptions. This can be advantageous in scenarios where a truly exogenous instrument is difficult to find, but variables related to U that satisfy the bridge conditions might exist.
While theoretically elegant, applying PCI presents practical challenges:
Proxy validity: the conditional independence assumptions on W and Z involve the unobserved U, so they cannot be verified from data alone and must be justified with domain knowledge.
Estimation: the bridge (Fredholm) equations define an ill-posed inverse problem, so estimators typically rely on parametric restrictions or regularization and can be sensitive to those choices.
Tooling: PCI is not yet a standard component of mainstream causal inference libraries; check CausalPy or specific research implementations for potential tools.
Proximal Causal Inference provides a valuable addition to the toolkit for causal inference in the presence of unobserved confounding. It operates under a different set of assumptions compared to IV, RDD, or DiD, relying on the existence of suitable proxy variables W and Z. While finding such proxies and performing estimation can be challenging, PCI opens up possibilities for causal effect identification in complex systems where traditional methods might not apply. Understanding its principles allows you, as an expert practitioner, to consider a wider range of strategies when confronting hidden bias in your machine learning applications.