While Directed Acyclic Graphs (DAGs) are the workhorse for representing causal assumptions derived from Structural Causal Models (SCMs), real-world machine learning systems often present complexities that stretch the limits of basic DAGs. Understanding more sophisticated graphical representations is essential for rigorously handling issues such as hidden confounding, selection bias, and feedback mechanisms.
Standard DAGs encode conditional independence relationships through d-separation, underpinning identification criteria like the backdoor and frontdoor adjustments. However, they implicitly assume all relevant variables are observed and that the system is acyclic. When these assumptions are violated, or when we need to represent more complex mechanisms like selection processes, we turn to extended graphical frameworks.
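To make d-separation concrete, here is a minimal sketch of a checker built on networkx, using the classic moralization reduction: restrict to the ancestral subgraph of the variables involved, moralize it, remove the conditioning set, and test connectivity. The helper name and example graph are illustrative, not from this text; recent networkx releases also ship a built-in test (e.g., nx.is_d_separator), depending on the version.

```python
import networkx as nx

def d_separated(dag, xs, ys, zs):
    """Check whether xs and ys are d-separated given zs in a DAG, via the
    moralization criterion: take the subgraph induced by the ancestors of
    xs | ys | zs, moralize it (marry co-parents, drop edge directions),
    delete zs, and test whether any x-y path remains."""
    relevant = set(xs) | set(ys) | set(zs)
    ancestral = set(relevant)
    for v in relevant:
        ancestral |= nx.ancestors(dag, v)
    moral = nx.moral_graph(dag.subgraph(ancestral).copy())
    moral.remove_nodes_from(zs)
    return not any(
        nx.has_path(moral, x, y)
        for x in xs if x in moral
        for y in ys if y in moral
    )

# Fork T <- W -> Y combined with collider T -> C <- Y.
g = nx.DiGraph([("W", "T"), ("W", "Y"), ("T", "C"), ("Y", "C")])
print(d_separated(g, {"T"}, {"Y"}, set()))       # False: open backdoor path via W
print(d_separated(g, {"T"}, {"Y"}, {"W"}))       # True: conditioning on W blocks it
print(d_separated(g, {"T"}, {"Y"}, {"W", "C"}))  # False: conditioning on collider C reopens a path
```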
A common challenge is the presence of unobserved (latent) confounders. While a DAG can include latent variables (often denoted U), analyzing the graph after marginalizing out these U variables leads to structures that are no longer simple DAGs. Marginalizing over a latent common cause introduces dependencies that cannot be represented solely with directed edges among the observed variables.
Maximal Ancestral Graphs (MAGs): These graphs generalize DAGs to include bidirected edges (X↔Y). A bidirected edge indicates that neither variable is an ancestor of the other and that their dependence arises from an unobserved common cause. MAGs represent the conditional independence structure among observed variables that holds after marginalizing out latent variables from an underlying DAG. Separation in MAGs is defined by m-separation, a generalization of d-separation that also handles bidirected edges.
Partial Ancestral Graphs (PAGs): PAGs are more general still. They arise from constraint-based discovery algorithms (like FCI, covered in Chapter 2) when latent variables might be present. PAGs can contain directed edges (X→Y), bidirected edges (X↔Y), and partially oriented edges (e.g., X∘−∘Y, X∘→Y), where circle marks denote orientations that cannot be determined from the data. A PAG therefore represents an equivalence class of MAGs: the set of causal structures, possibly involving hidden variables, that imply exactly the same conditional independencies among the observed variables.
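As a brief preview of the discovery machinery in Chapter 2, the sketch below runs FCI on simulated data with a hidden confounder and prints the edges of the resulting PAG. It assumes the causal-learn package and its documented fci interface; the exact module path, arguments, and edge-printing format may differ across versions, so treat the call as an assumption to verify against the library's documentation.

```python
import numpy as np
from causallearn.search.ConstraintBased.FCI import fci  # assumed causal-learn API

# Simulate X -> M -> Y with a latent confounder U affecting both X and Y.
rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)                 # latent: never passed to FCI
x = u + rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)
y = 0.8 * m + u + rng.normal(size=n)
data = np.column_stack([x, m, y])      # observed columns only

# FCI returns a PAG over the observed variables; circle and arrow marks
# encode what can and cannot be oriented given possible hidden confounding.
pag, edges = fci(data, alpha=0.05)
for edge in edges:
    print(edge)                        # e.g. "X1 o-> X2", depending on the sample
```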
Marginalizing the latent confounder U in the DAG results in a bidirected edge between X and Y in the corresponding MAG, indicating shared hidden confounding.
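The marginalization step itself can be sketched in code. The helper below implements a deliberately simplified latent projection that covers only exogenous latents (latent variables with no parents): directed edges between observed nodes are kept, and any two observed children of the same latent receive a bidirected edge. The function name and graph are illustrative; the full projection rules for arbitrary latent structures are more involved.

```python
import networkx as nx

def project_out_exogenous_latents(dag, latents):
    """Simplified latent projection: keep directed edges among observed
    nodes and add a bidirected edge between any pair of observed children
    of the same exogenous latent. (Latent chains and latent colliders
    require the full projection rules, omitted here for brevity.)"""
    observed = set(dag.nodes) - set(latents)
    directed = [(a, b) for a, b in dag.edges if a in observed and b in observed]
    bidirected = set()
    for latent in latents:
        children = sorted(set(dag.successors(latent)) & observed)
        for i, a in enumerate(children):
            for b in children[i + 1:]:
                bidirected.add((a, b))
    return directed, sorted(bidirected)

# DAG with latent confounder U of X and Y, plus a direct effect X -> Y.
dag = nx.DiGraph([("U", "X"), ("U", "Y"), ("X", "Y")])
directed, bidirected = project_out_exogenous_latents(dag, {"U"})
print(directed)    # [('X', 'Y')]
print(bidirected)  # [('X', 'Y')]  -- read as X <-> Y in the projected graph
```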
Selection bias occurs when the inclusion of a data point in the analysis depends on variables within the system. Restricting attention to the selected sample amounts to conditioning on the selection variable, and when that variable is a collider, or a descendant of a collider, this conditioning can induce spurious associations. Standard DAGs do not explicitly represent the selection mechanism itself.
Selection diagrams with an explicit selection node: These augment the original DAG by adding a node, typically S, representing the selection event (e.g., S=1 if a unit is included in the sample, S=0 otherwise). Arrows point into S from the variables that influence selection. Analyzing the graph conditional on S=1 (which is what working with the biased data amounts to) makes the effects of selection explicit.
A selection diagram where selection S depends on a collider Z (T→Z←Y, with Z→S). Conditioning on S=1 (analyzing only selected data) means conditioning on a descendant of the collider Z, which opens the path T→Z←Y and can create a spurious association between T and Y.
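The induced association is easy to reproduce in simulation. In the sketch below, T and Y are generated independently, Z is their collider, and only units with large Z are selected; the correlation between T and Y is near zero in the full sample but clearly negative among the selected units. The variable names mirror the figure description, while the threshold and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# T and Y are marginally independent; Z is their collider (T -> Z <- Y).
t = rng.normal(size=n)
y = rng.normal(size=n)
z = t + y + rng.normal(scale=0.5, size=n)

# Selection depends on the collider: keep only units with large Z (S = 1).
selected = z > 1.0

corr_full = np.corrcoef(t, y)[0, 1]
corr_selected = np.corrcoef(t[selected], y[selected])[0, 1]
print(f"corr(T, Y) in full sample: {corr_full:+.3f}")      # ~ +0.00
print(f"corr(T, Y) given S=1:      {corr_selected:+.3f}")  # clearly negative
```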
These advanced graphical representations are not just notational conveniences. They come with their own sets of graphical criteria for determining identifiability of causal effects.
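For instance, identifiability in graphs with bidirected edges (ADMGs) can be checked algorithmically. The sketch below assumes the ananke-causal package and its one-line ID interface; the class and method names follow its documented tutorials but should be verified against the installed version.

```python
from ananke.graphs import ADMG              # assumed ananke-causal API
from ananke.identification import OneLineID

# Front-door style ADMG: T -> M -> Y with hidden confounding T <-> Y.
front_door = ADMG(vertices=['T', 'M', 'Y'],
                  di_edges=[('T', 'M'), ('M', 'Y')],
                  bi_edges=[('T', 'Y')])
print(OneLineID(graph=front_door, treatments=['T'], outcomes=['Y']).id())  # True

# Bow graph: T -> Y together with T <-> Y; P(Y | do(T)) is not identifiable.
bow = ADMG(vertices=['T', 'Y'],
           di_edges=[('T', 'Y')],
           bi_edges=[('T', 'Y')])
print(OneLineID(graph=bow, treatments=['T'], outcomes=['Y']).id())  # False
```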
Understanding these representations is the first step towards applying the advanced identification strategies discussed later in this chapter and leveraging the causal discovery and estimation techniques covered in subsequent chapters, especially when dealing with the messy, high-dimensional, and potentially biased data typical of real-world ML systems. They provide the formal language needed to articulate assumptions about complex data-generating processes.