Constraint-based algorithms represent a foundational approach to causal discovery. They operate on the principle that the observed statistical dependencies and independencies in data reflect the underlying causal structure, assuming certain conditions hold. Specifically, these methods leverage the connection between conditional independence relationships (X⊥Y∣Z) and the graphical concept of d-separation in the true causal Directed Acyclic Graph (DAG).
Two assumptions underpin most constraint-based methods. The causal Markov condition guarantees that every d-separation in the causal DAG corresponds to a conditional independence in the distribution; faithfulness adds the converse, that every conditional independence in the distribution is a consequence of d-separation in the DAG. Violations of faithfulness (e.g., exact cancellation of effects along different paths) can therefore mislead these algorithms.
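As a minimal illustration of such a violation, the following simulation (an assumed linear-Gaussian setup, not from the original text) constructs a direct effect that is exactly cancelled along an indirect path, so the marginal correlation vanishes despite a true edge:

```python
# Faithfulness violation via path cancellation.
# True structure: X -> Y -> Z plus a direct edge X -> Z whose coefficient
# (-0.4) exactly cancels the indirect effect (0.8 * 0.5 = 0.4).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)             # X -> Y
z = 0.5 * y - 0.4 * x + rng.normal(size=n)   # Y -> Z and X -> Z (cancelling)

# corr(X, Z) is ~0 even though the edge X -> Z exists, so an independence
# test would wrongly conclude X ⊥ Z and delete that edge.
print(round(np.corrcoef(x, z)[0, 1], 3))
```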
The PC Algorithm: Discovering Structure under Causal Sufficiency
The PC algorithm, named after its developers Peter Spirtes and Clark Glymour, is perhaps the most well-known constraint-based method. It operates under the significant assumption of Causal Sufficiency, meaning there are no unobserved common causes (latent confounders) influencing multiple measured variables.
The PC algorithm proceeds in two main phases (a minimal code sketch follows the list):
- Adjacency Search (Skeleton Identification):
  - Start with a complete undirected graph in which every variable is connected to every other variable.
  - Iteratively remove edges between variable pairs X and Y: an edge X−Y is removed if a conditional independence X⊥Y∣S is found for some subset S of the neighbors of X (excluding Y) or the neighbors of Y (excluding X).
  - Testing starts with conditioning sets of size 0 (∣S∣=0): if X⊥Y marginally, the edge X−Y is removed.
  - It then proceeds to conditioning sets of size 1 (∣S∣=1), then size 2 (∣S∣=2), and so on, up to a predefined maximum size or until no more edges can be removed. Because neighborhoods shrink as edges are deleted, this iterative process prunes the graph efficiently.
  - The result of this phase is an undirected graph representing the causal skeleton, where each remaining edge indicates a direct causal connection (confounding having been ruled out by the causal sufficiency assumption).
- Edge Orientation:
  - The algorithm orients edges using the conditional independence information gathered during the adjacency search.
  - Identify v-structures: search for triples X−Z−Y in which X and Y are not adjacent but both edges X−Z and Z−Y remain. If the separating set S_XY that led to removing the edge X−Y (i.e., to finding X⊥Y∣S_XY) does not contain Z, orient the edges as X→Z←Y. This forms a v-structure (or collider) at Z.
  - Apply orientation rules: repeatedly apply logical rules (Meek's rules) to orient remaining edges, ensuring that no new v-structures are created and no directed cycles are formed. For example:
    - If X→Y−Z exists and X and Z are not adjacent, orient Y→Z (otherwise a new v-structure X→Y←Z would arise).
    - If X→Y→Z exists and there is an edge X−Z, orient it as X→Z (otherwise the cycle X→Y→Z→X would arise).
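Here is a minimal sketch of both phases in Python, assuming linear-Gaussian data and a Fisher-Z conditional independence test. The helper names (fisher_z, pc_skeleton, orient_v_structures) are illustrative, the search is simplified (only subsets of one endpoint's neighborhood are tried), and the final Meek orientation rules are omitted:

```python
from itertools import combinations
import numpy as np
from scipy import stats

def fisher_z(data, i, j, S, alpha=0.05):
    """Return True if X_i ⊥ X_j | X_S is accepted (partial correlation + Fisher transform)."""
    idx = [i, j] + list(S)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)                       # precision of the submatrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = float(np.clip(r, -0.9999, 0.9999))
    z = 0.5 * np.log((1 + r) / (1 - r))
    stat = np.sqrt(data.shape[0] - len(S) - 3) * abs(z)
    p = 2 * stats.norm.sf(stat)
    return p > alpha                                  # True => treat as independent

def pc_skeleton(data, ci_test):
    """Phase 1: start complete, remove edges by CI tests with growing |S|."""
    d = data.shape[1]
    adj = {i: set(range(d)) - {i} for i in range(d)}
    sepset = {}
    size = 0
    while any(len(adj[i]) - 1 >= size for i in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                if j < i:                             # test each pair once
                    continue
                # Simplification: full PC also searches subsets of adj[j] - {i}.
                for S in combinations(sorted(adj[i] - {j}), size):
                    if ci_test(data, i, j, S):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[(i, j)] = sepset[(j, i)] = set(S)
                        break
        size += 1
    return adj, sepset

def orient_v_structures(adj, sepset):
    """Phase 2 (first step): orient X → Z ← Y when X, Y are non-adjacent and Z ∉ sepset."""
    arrows = set()                                    # (a, b) means a → b
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset.get((x, y), set()):
                arrows.add((x, z))
                arrows.add((y, z))
    return arrows

# Toy data with true structure A → B ← C (columns 0, 1, 2).
rng = np.random.default_rng(1)
a, c = rng.normal(size=(2, 5000))
b = a + c + 0.5 * rng.normal(size=5000)
X = np.column_stack([a, b, c])

adj, sepset = pc_skeleton(X, fisher_z)
print(adj)                               # expect: 0−1 and 1−2 adjacent, 0−2 removed
print(orient_v_structures(adj, sepset))  # expect arrows into node 1
```

On this toy data the skeleton keeps only A−B and B−C, the separating set for (A, C) is empty, and the v-structure step orients both edges into B, matching the A → B ← C illustration in the figure below.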
The output of the PC algorithm is typically a Completed Partially Directed Acyclic Graph (CPDAG), which represents a Markov equivalence class of DAGs. All DAGs in this class share the same skeleton and v-structures and imply the same set of conditional independencies. For example, the chain A→B→C, the reversed chain A←B←C, and the fork A←B→C form one equivalence class (each implies exactly A⊥C∣B), while the collider A→B←C sits in a class of its own.
[Figure: Simplified illustration of the PC algorithm phases on three variables A, B, C, with true structure A → B ← C and faithfulness assumed. Step 2 shows the skeleton after testing reveals that A and C are marginally independent (so the edge A−C is removed with an empty separating set). Step 3 shows the resulting v-structure orientation A → B ← C.]
Strengths and Limitations:
- Strengths: Computationally efficient for sparse graphs compared to score-based methods; provides insights into conditional independence structure.
- Limitations: Highly sensitive to errors in the conditional independence tests (statistical errors from finite samples, tests misspecified for non-linear relationships), and early errors propagate through later stages; requires the strong assumption of Causal Sufficiency; the output is an equivalence class, not necessarily the single true DAG. Order-dependency in the standard implementation can also affect results, although order-independent versions exist (e.g., PC-Stable).
The FCI Algorithm: Handling Latent Confounders
The assumption of Causal Sufficiency is often unrealistic in practice. Unobserved common causes (latent confounders) can induce spurious associations, leading PC to incorrect conclusions (e.g., adding spurious edges or misorienting existing ones).
The Fast Causal Inference (FCI) algorithm is a constraint-based method explicitly designed to handle the potential presence of latent confounders. It relaxes the Causal Sufficiency assumption.
Key differences from PC:
- Output Representation: FCI outputs a Partial Ancestral Graph (PAG). PAGs are more expressive than DAGs or CPDAGs, using edge marks to represent the uncertainty introduced by potential latent variables (a small encoding sketch follows this list). Common edge marks include:
  - X→Y: X is a cause (ancestor) of Y, and Y is not a cause of X.
  - X↔Y: Neither variable causes the other; a latent confounder L with X←L→Y explains the association.
  - X∘→Y: Y does not cause X, but the other endpoint is undetermined: either X→Y or X↔Y.
  - X∘−∘Y: Adjacency is certain, but the orientation is completely undetermined (X→Y, X←Y, or X↔Y).
  - X−Y: Arises only under selection bias, indicating that both X and Y are ancestors of a selection (conditioning) variable; absent selection bias, this mark does not appear in the final PAG.
- Conditioning Sets: With latent variables, the set that renders two non-adjacent variables independent need not lie within their neighborhoods, so FCI must search larger candidate conditioning sets (the "Possible-D-SEP" sets) in an additional pruning stage after the initial PC-style skeleton search.
- Orientation Rules: The orientation rules are significantly more involved than PC's, designed to deduce ancestral relationships (e.g., X is an ancestor of Y) and the presence of confounders, even without observing them directly.
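As a concrete data structure, the edge marks above can be encoded as endpoint symbols. The sketch below uses illustrative names (not any particular library's API) and pairs each edge with one mark per endpoint:

```python
# A minimal encoding of PAG edge marks as endpoint symbols (illustrative only).
from enum import Enum

class Mark(Enum):
    TAIL = "-"      # definite non-arrowhead: this end's variable is an ancestor
    ARROW = ">"     # definite arrowhead: this end's variable is NOT an ancestor
    CIRCLE = "o"    # undetermined: could be either

# An edge stores one mark at each endpoint, e.g. X ∘→ Y is (CIRCLE, ARROW).
EDGE_TYPES = {
    (Mark.TAIL, Mark.ARROW):    "X --> Y : X causes Y",
    (Mark.ARROW, Mark.ARROW):   "X <-> Y : latent confounder between X and Y",
    (Mark.CIRCLE, Mark.ARROW):  "X o-> Y : either X --> Y or X <-> Y",
    (Mark.CIRCLE, Mark.CIRCLE): "X o-o Y : adjacency certain, orientation unknown",
}

for marks, meaning in EDGE_TYPES.items():
    print(f"({marks[0].value}, {marks[1].value})  {meaning}")
```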
[Figure: Comparison of PC and FCI outputs given a true structure with a latent confounder L between X and Y, where both X and Y cause Z. PC may infer a spurious edge or incorrect orientations. FCI represents the uncertainty with PAG edge marks, indicating potential confounding (X∘−∘Y) and uncertain direct causation (X∘→Z, Y∘→Z).] A runnable version of this comparison appears after the strengths and limitations below.
Strengths and Limitations:
- Strengths: Provides asymptotically correct results even with latent confounders (and selection bias), under faithfulness and correct independence tests. Offers a principled way to reason about unmeasured variables.
- Limitations: Computationally more demanding than PC. The output PAG can be complex and less interpretable than a DAG, potentially leaving many relationships unresolved. Still sensitive to statistical errors in independence tests.
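To see the contrast in practice, here is a hedged usage sketch with the open-source causal-learn package (pip install causal-learn). The pc and fci entry points follow its documented basic usage, but treat the exact signatures and return types as assumptions to verify against the current documentation:

```python
# Run PC and FCI on the same data, generated with a hidden confounder L.
# L influences X and Y but is NOT given to either algorithm.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from causallearn.search.ConstraintBased.FCI import fci

rng = np.random.default_rng(0)
n = 2000
L = rng.normal(size=n)                 # latent confounder (unobserved)
X = L + rng.normal(size=n)
Y = L + rng.normal(size=n)
Z = X + Y + rng.normal(size=n)
data = np.column_stack([X, Y, Z])      # only X, Y, Z are observed

cg = pc(data, alpha=0.05)              # assumes causal sufficiency (violated here)
print(cg.G)                            # CPDAG: expect a spurious X − Y adjacency

g, edges = fci(data, alpha=0.05)       # allows latent confounders
print(g)                               # PAG: circle marks flag the uncertainty
```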
Extensions and Variants
Both PC and FCI have inspired numerous extensions and variations designed to address specific limitations or improve performance:
- PC-Stable, Conservative PC: Modifications to PC. PC-Stable makes the skeleton identification phase independent of the order in which variables are processed, enhancing stability. Conservative PC is more cautious in orienting v-structures, marking triples as ambiguous when independence tests conflict rather than committing to a collider.
- RFCI (Really Fast Causal Inference): A variant of FCI that aims for similar robustness to latent variables but with significantly improved computational speed, particularly in the adjacency search phase. It might be slightly less informative in its orientation phase compared to standard FCI.
- COmbINE: Combines constraint-based analysis of multiple datasets that measure overlapping variable sets, possibly collected under different experimental interventions, into a single causal model.
- Kernel-based Tests (KCI): Replace standard conditional independence tests (like Fisher-Z or G-squared) with kernel-based measures (e.g., building on the Hilbert-Schmidt Independence Criterion, HSIC). These can detect non-linear relationships and handle non-Gaussian data more effectively, but often at higher computational cost (a minimal HSIC sketch follows this list).
- Constraint-based Algorithms for Mixed Data: Extensions designed to handle datasets containing both continuous and discrete variables simultaneously.
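The sketch below computes the unconditional HSIC statistic with Gaussian kernels, the building block that such tests generalize. It is illustrative only: a real test adds a calibrated null distribution (permutation or gamma approximation) and a conditional variant, and the helper names are assumptions:

```python
# Unconditional HSIC statistic with Gaussian kernels (1-d inputs).
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix K[i, j] = exp(-|x_i - x_j|^2 / (2 * bandwidth^2))."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth=1.0):
    """Biased HSIC estimator: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y_dep = np.sin(3 * x) + 0.1 * rng.normal(size=500)   # non-linear dependence
y_ind = rng.normal(size=500)                          # independent of x

print(hsic(x, y_dep))   # noticeably larger ...
print(hsic(x, y_ind))   # ... than the near-zero independent case
```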
Practical Considerations for Implementation
Applying constraint-based algorithms effectively requires careful attention to several factors:
- Choice of Conditional Independence Test: Select a test appropriate for your data type (e.g., the Fisher-Z test for multivariate Gaussian data, the Chi-squared or G-squared test for discrete data, the Kernel Conditional Independence (KCI) test for non-linear continuous data). The assumptions of the chosen test must be reasonably met.
- Significance Level (α): The threshold used in the independence tests strongly shapes the resulting graph. Because an edge is removed whenever the test fails to reject independence, a smaller α removes more edges (a sparser graph, at the risk of discarding true edges), while a larger α retains more edges (a denser graph, at the risk of keeping spurious ones). Choosing α often involves domain knowledge or model selection techniques; a short demonstration follows this list.
- Computational Cost: The complexity grows rapidly with the number of variables and the density of the true graph. The maximum size of the conditioning set considered is a critical parameter affecting runtime. For high-dimensional data, these methods can become computationally prohibitive without optimizations or sparsity assumptions.
- Data Requirements: Constraint-based methods generally require substantial amounts of data to reliably estimate conditional independencies, especially when conditioning on larger sets of variables. Performance degrades significantly with insufficient data.
- Interpretation: Remember that PC outputs a Markov equivalence class (CPDAG), and FCI outputs a PAG. These represent sets of possible causal structures consistent with the data and assumptions, not necessarily the single ground truth DAG. Interpreting the specific edge types in a PAG is essential for understanding the uncertainty.
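As a quick demonstration of the α trade-off noted above (an assumed toy setup: one weak linear dependence, tested with a marginal Pearson correlation):

```python
# How the significance level alpha can flip the keep/remove decision
# for a single weak edge.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 0.15 * x + rng.normal(size=n)      # weak true dependence

r, p = stats.pearsonr(x, y)
print(f"p-value = {p:.3f}")
for alpha in (0.01, 0.05, 0.10):
    # p <= alpha: reject independence, keep the edge; otherwise remove it.
    decision = "keep edge (dependent)" if p <= alpha else "remove edge (independent)"
    print(f"alpha = {alpha:.2f}: {decision}")
```

When the p-value lands near the candidate thresholds, the decision flips across α values, which is exactly how the same data can yield graphs of different densities.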
Constraint-based methods provide a powerful framework for reasoning about causal structure from observational data. While PC offers efficiency under the strong assumption of causal sufficiency, FCI and its variants provide robustness in the more realistic setting where latent confounders may be present. Understanding their assumptions, limitations, and the nature of their output is fundamental for applying them correctly in advanced causal inference workflows.