Let's put the theory into practice. The previous sections laid out the formal machinery: structural causal models (SCMs), graphical representations such as DAGs, the rules of do-calculus, and various identification strategies. Now we'll work through applying this logic to determine whether a desired causal effect can be estimated from observed data, even when standard adjustment criteria aren't sufficient. Remember, identification precedes estimation; it tells us *what* to estimate, assuming our causal model is correct.

## Scenario 1: Dealing with Unobserved Confounding

Consider the causal structure represented by the following directed acyclic graph (DAG). We have variables $W, X, M, Y$, where $X$ is the treatment, $Y$ is the outcome, $M$ is a mediator, and $W$ is an observed covariate. Crucially, assume there is an unobserved common cause $U$ affecting both $X$ and $Y$.

```dot
digraph G {
    rankdir=LR;
    node [shape=circle, style=filled, fillcolor="#e9ecef", color="#495057"];
    edge [color="#495057"];
    U [style=dashed, fontcolor=gray, color=gray];
    W; X; M; Y;
    W -> X [color="#1c7ed6"];
    X -> M [color="#1c7ed6"];
    M -> Y [color="#1c7ed6"];
    W -> Y [color="#1c7ed6"];
    U -> X [style=dashed, color=gray];
    U -> Y [style=dashed, color=gray];
}
```

*Causal graph with observed variables W, X, M, Y and an unobserved confounder U.*

Our goal is to identify the causal effect of $X$ on $Y$, represented by the interventional distribution $P(Y | do(X=x))$.

**Analysis:**

**Backdoor criterion:** Can we find a set of observed variables $Z$ that blocks all backdoor paths from $X$ to $Y$?
The backdoor paths are:

- $X \leftarrow W \to Y$ (blocked by conditioning on $W$)
- $X \leftarrow U \to Y$ (cannot be blocked, because $U$ is unobserved)

Since we cannot block the path involving $U$, the standard backdoor criterion fails.

**Frontdoor criterion:** Can we find a set of observed variables $M$ that intercepts all directed paths from $X$ to $Y$, satisfies certain blocking conditions, and for which the effects $P(M|do(X))$ and $P(Y|do(M))$ are identifiable?

- $M$ intercepts the only directed path, $X \to M \to Y$.
- Is there an unblocked backdoor path from $X$ to $M$? No. So $P(M|do(X=x)) = P(M|X=x)$ is identifiable (Rule 2 of do-calculus, or simply the absence of confounding).
- Are all backdoor paths from $M$ to $Y$ blocked by $X$? There are two: $M \leftarrow X \leftarrow U \to Y$ and $M \leftarrow X \leftarrow W \to Y$. Conditioning on $X$ blocks both, since $X$ is a non-collider on each path. The conditions hold.
- Finally, because $X$ is a non-descendant of $M$, intervening on $M$ leaves the distribution of $X$ unchanged, so $P(X=x|do(M=m)) = P(X=x)$ and therefore $P(Y|do(M=m)) = \sum_x P(Y|M=m, X=x) P(X=x)$.

Applying the frontdoor formula:

$$ P(Y|do(X=x)) = \sum_m P(M=m|do(X=x)) P(Y|do(M=m)) $$

$$ P(Y|do(X=x)) = \sum_m P(M=m|X=x) \left[ \sum_{x'} P(Y|M=m, X=x') P(X=x') \right] $$

This expression involves only probabilities estimable from observational data.
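To make the estimand concrete, here is a small simulation sketch. The structural equations and coefficients below are purely illustrative assumptions (not part of the scenario); they are merely consistent with the Scenario 1 graph. We compute the frontdoor plug-in estimate from observational draws and compare it against the ground truth obtained by actually simulating the intervention $do(X=1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(do_x=None):
    # Hypothetical binary structural equations consistent with the Scenario 1 graph.
    u = rng.binomial(1, 0.5, n)                        # unobserved confounder
    w = rng.binomial(1, 0.5, n)                        # observed covariate
    if do_x is None:
        x = rng.binomial(1, 0.2 + 0.3 * w + 0.4 * u)   # W -> X, U -> X
    else:
        x = np.full(n, do_x)                           # intervention: edges into X cut
    m = rng.binomial(1, 0.1 + 0.7 * x)                 # X -> M (no other parents)
    y = rng.binomial(1, 0.05 + 0.4 * m + 0.2 * w + 0.3 * u)  # M, W, U -> Y
    return x, m, y

x, m, y = simulate()  # observational data; U and W stay hidden from the estimator

def frontdoor_estimate(x, m, y, x_val):
    # sum_m P(m | x) * [ sum_{x'} P(y=1 | m, x') * P(x') ]
    total = 0.0
    for mv in (0, 1):
        p_m_given_x = np.mean(m[x == x_val] == mv)
        inner = sum(np.mean(y[(m == mv) & (x == xv)]) * np.mean(x == xv)
                    for xv in (0, 1))
        total += p_m_given_x * inner
    return total

fd_est = frontdoor_estimate(x, m, y, x_val=1)
truth = simulate(do_x=1)[2].mean()   # P(Y=1 | do(X=1)) by direct intervention
print(f"frontdoor estimate: {fd_est:.3f}, interventional truth: {truth:.3f}")
```

The two numbers should agree up to sampling noise, whereas the naive conditional `y[x == 1].mean()` would be biased upward here because $U$ and $W$ raise both $X$ and $Y$.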
Thus the effect $P(Y|do(X=x))$ is identifiable via the frontdoor criterion in this specific graph.

**Takeaway:** Even with an unobserved confounder $U$, careful application of criteria like the frontdoor adjustment (or systematic application of do-calculus) can lead to identification.

## Scenario 2: Identification with Feedback

Consider a simplified system with potential feedback between $X$ and $Y$, along with an observed covariate $Z$ and an unobserved confounder $U$. We might represent this using a cyclic graph, although its interpretation requires care (it often implies an underlying temporal process or an equilibrium state).

```dot
digraph G {
    rankdir=LR;
    node [shape=circle, style=filled, fillcolor="#e9ecef", color="#495057"];
    edge [color="#495057"];
    U [style=dashed, fontcolor=gray, color=gray];
    Z; X; Y;
    Z -> X [color="#1c7ed6"];
    X -> Y [color="#f03e3e", constraint=false];  // constraint=false improves layout with cycles
    Y -> X [color="#f03e3e", constraint=false];
    U -> X [style=dashed, color=gray];
    U -> Y [style=dashed, color=gray];
}
```

*Causal graph with feedback between X and Y, an observed covariate Z, and an unobserved confounder U.*

Can we identify $P(Y | do(X=x))$?

**Analysis:**

**Challenges:** Standard DAG-based criteria (backdoor, frontdoor) and the basic do-calculus rules were developed primarily for acyclic graphs. Cycles introduce significant complications, including difficulties in defining interventions and in guaranteeing unique solutions to the structural equations.

**Do-calculus application (attempt):** Let's try applying do-calculus formally. $P(Y | do(X=x))$ involves intervening on $X$. In the graph modified by $do(X=x)$, we remove all arrows pointing into $X$. This breaks the cycle.
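This edge-removal step ("graph surgery") can be sketched directly. The following snippet, a minimal sketch using networkx with node names taken from the graph above, removes all edges into $X$ and confirms that the intervention breaks the cycle:

```python
import networkx as nx

# Scenario 2 graph, including the X <-> Y feedback cycle and the latent U.
G = nx.DiGraph([("Z", "X"), ("X", "Y"), ("Y", "X"), ("U", "X"), ("U", "Y")])

def intervene(graph, node):
    """Return a copy of the graph with every edge into `node` removed,
    mimicking the do-operator's graph mutilation."""
    g = graph.copy()
    g.remove_edges_from(list(g.in_edges(node)))
    return g

G_do_x = intervene(G, "X")
print(sorted(G_do_x.edges()))                 # [('U', 'Y'), ('X', 'Y')]
print(nx.is_directed_acyclic_graph(G))        # False: the original graph has a cycle
print(nx.is_directed_acyclic_graph(G_do_x))   # True: the intervention breaks it
```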
The modified graph $G_{\bar{X}}$ looks like:

```dot
digraph G_bar_X {
    rankdir=LR;
    node [shape=circle, style=filled, fillcolor="#e9ecef", color="#495057"];
    edge [color="#495057"];
    U [style=dashed, fontcolor=gray, color=gray];
    Z; X; Y;
    // Z -> X removed by intervention
    X -> Y [color="#f03e3e"];
    // Y -> X removed by intervention
    // U -> X removed by intervention
    U -> Y [style=dashed, color=gray];
}
```

*Graph modified by the intervention do(X=x), removing incoming edges to X.*

In $G_{\bar{X}}$, the only factor influencing $Y$ apart from the fixed $X=x$ is $U$: the path $X \to Y$ remains, the edges into $X$ (including $Y \to X$ and $U \to X$) are severed, and $Z$ is disconnected from $Y$. To identify the effect, we must express $P(Y | do(X=x))$ in terms of the original observational distribution. Can conditioning on $Z$ help? In the original graph, $Z$ does not block the backdoor path $X \leftarrow U \to Y$, and the feedback edge $Y \to X$ creates further dependence between $X$ and $Y$ that no observed variable controls.

**Non-identifiability:** In this setup, $P(Y | do(X=x))$ is generally not identifiable from observational data alone. Although the intervention severs $U \to X$ and $Y \to X$, in the observational data $U$ remains correlated with $X$ while still driving $Y$, so $P(Y | X)$, with or without conditioning on $Z$, conflates the causal effect with confounding and feedback. Without further assumptions (e.g., specific functional forms, knowledge about the equilibrium mechanism, or instrumental variables), we cannot isolate the causal effect of $X$ on $Y$.

**Takeaway:** Cycles, especially combined with unobserved confounding, often lead to non-identifiability from standard observational data. Advanced techniques or different data types (such as interventional data or panel data, explored in later chapters) may be required.
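To see the non-identifiability bite, consider a hypothetical linear-Gaussian equilibrium version of this system; all coefficients below are illustrative assumptions, not part of the scenario. With structural equations $X = 0.8Z + 0.4Y + 0.9U + \epsilon_X$ and $Y = 0.5X + 0.9U + \epsilon_Y$, the true causal effect of $X$ on $Y$ is $0.5$, yet regressing $Y$ on $X$ and the observed $Z$ recovers something quite different:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Illustrative coefficients: feedback b (Y -> X) and causal effect d (X -> Y).
b, d = 0.4, 0.5
z, u, ex, ey = rng.normal(size=(4, n))

# Solve the two structural equations for their equilibrium:
#   X = 0.8 Z + b Y + 0.9 U + ex,   Y = d X + 0.9 U + ey
x = (0.8 * z + b * (0.9 * u + ey) + 0.9 * u + ex) / (1 - b * d)
y = d * x + 0.9 * u + ey

# Naive OLS of Y on X, controlling for the observed covariate Z:
design = np.column_stack([x, z, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"true effect: {d}, naive OLS coefficient on X: {beta[0]:.3f}")
```

The OLS slope lands far from $0.5$, because the feedback and the latent $U$ make $X$ correlated with both $U$ and $\epsilon_Y$. Note that $Z$ is excluded from the $Y$ equation and independent of $U$, which is exactly the instrumental-variable structure that could restore identification under these linearity assumptions.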
Sensitivity analysis becomes particularly important here, to understand how assumptions about $U$ might influence the conclusions.

## Using Identification Tools

While manual application of do-calculus is fundamental for understanding, software libraries can automate parts of this process for complex graphs. Tools like Python's DoWhy library let you define a causal graph (often in GML or DOT format) and specify a causal query (e.g., identify $P(Y | do(X=x))$).

```python
import dowhy

# Define the graph from Scenario 1 (without U for simplicity here;
# how a latent U is handled depends on the library's features).
causal_graph = """
digraph {
    W -> X;
    X -> M;
    M -> Y;
    W -> Y;
}
"""

# Assuming observational data is loaded into a pandas DataFrame `df`
model = dowhy.CausalModel(
    data=df,
    treatment='X',
    outcome='Y',
    graph=causal_graph,
)

# Attempt identification
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
```

Running such code (potentially with adjustments to represent $U$ explicitly, if the library supports it) attempts to apply identification rules automatically. For Scenario 1, it should ideally return the frontdoor estimand we derived. For Scenario 2, it would likely report non-identifiability, given the cycle and the implied confounding (if $U$ were representable).

**Caution:** Automated tools are powerful aids, but they are no substitute for understanding. They rely on the correctness of the input graph and assumptions. Always critically evaluate a tool's output and understand why a particular estimand was returned or why identification failed.
Your grasp of do-calculus and identification logic allows you to verify these results and to troubleshoot when a tool struggles with complex or non-standard cases.

These exercises illustrate that identification is a critical reasoning step. Before fitting any machine learning model for causal effect estimation (as covered in Chapter 3), you must first determine whether the effect is estimable from your data and assumptions, and which statistical quantity corresponds to the causal effect you seek.