While approximate methods like MCMC and Variational Inference are essential tools for handling large or complex Probabilistic Graphical Models (PGMs), situations arise where exact inference is both desirable and computationally feasible. The Junction Tree algorithm (also known as the Clique Tree or Join Tree algorithm) stands as a primary method for performing exact inference, particularly for computing marginal probabilities of variables in singly connected or moderately complex multiply connected graphs. It cleverly transforms the original graph into a tree structure composed of clusters of variables (cliques), enabling efficient message passing for belief propagation.
This algorithm builds upon the concepts of variable elimination but organizes the computations in a way that avoids redundant calculations, especially when multiple marginal queries are needed. The core idea is to leverage the structure of the graph to perform calculations efficiently.
The journey to a junction tree begins with preparing the graph. If starting with a directed graph like a Bayesian Network, the first step is moralization. This involves "marrying" the parents of each node by adding undirected edges between them and then making all original directed edges undirected. This process creates the moral graph, an undirected representation capturing the conditional dependencies relevant for inference.
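To make moralization concrete, here is a minimal sketch using the networkx library. The `moralize` function and the example network are illustrative, not part of any standard API:

```python
import itertools
import networkx as nx

def moralize(dag: nx.DiGraph) -> nx.Graph:
    """Moralize a DAG: marry the parents of each node, then drop edge directions."""
    moral = nx.Graph()
    moral.add_nodes_from(dag.nodes)
    moral.add_edges_from(dag.edges())  # adding directed edges to an undirected Graph drops direction
    for node in dag.nodes:
        # "Marry" every pair of parents by connecting them directly.
        for u, v in itertools.combinations(dag.predecessors(node), 2):
            moral.add_edge(u, v)
    return moral

# Example: a small Bayesian Network where Sprinkler and Rain share the child WetGrass.
bn = nx.DiGraph([("Cloudy", "Sprinkler"), ("Cloudy", "Rain"),
                 ("Sprinkler", "WetGrass"), ("Rain", "WetGrass")])
print(sorted(moralize(bn).edges()))  # includes the added Sprinkler-Rain "marriage" edge
```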
The next critical step is triangulation (or making the graph chordal). An undirected graph is triangulated if every cycle of length four or more has a chord (an edge connecting two non-adjacent nodes in the cycle). Triangulation involves adding edges to the moral graph to eliminate chordless cycles. While finding an optimal triangulation (one that minimizes the size of the resulting cliques) is NP-hard, heuristic methods are often used in practice. Why triangulate? Because triangulated graphs have a special property: their maximal cliques (subsets of nodes where every pair is connected, and which are not subsets of larger cliques) can be arranged into a tree structure satisfying the running intersection property.
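As an illustration of one such heuristic, the sketch below applies greedy min-fill elimination: it repeatedly eliminates the node whose neighborhood needs the fewest fill-in edges, adding those edges to make the neighborhood a clique. This is a simplified sketch, not an optimized implementation:

```python
import itertools
import networkx as nx

def triangulate_min_fill(graph: nx.Graph) -> nx.Graph:
    """Greedy min-fill triangulation: the returned graph is chordal."""
    g = graph.copy()        # working copy that shrinks as nodes are eliminated
    chordal = graph.copy()  # accumulates the fill-in edges

    while g.nodes:
        # Fill-in edges needed to turn a node's neighborhood into a clique.
        def fill_count(n):
            nbrs = list(g.neighbors(n))
            return sum(1 for u, v in itertools.combinations(nbrs, 2)
                       if not g.has_edge(u, v))

        node = min(g.nodes, key=fill_count)
        for u, v in itertools.combinations(list(g.neighbors(node)), 2):
            g.add_edge(u, v)
            chordal.add_edge(u, v)
        g.remove_node(node)  # eliminate the chosen node
    return chordal
```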
Once the graph is triangulated, we identify all its maximal cliques. These cliques will become the nodes of our junction tree.
A junction tree is constructed from the maximal cliques of the triangulated graph. The nodes of the junction tree are the maximal cliques $C_1, C_2, \ldots, C_k$. Edges are added between cliques $C_i$ and $C_j$ if they share variables. The set of shared variables $S_{ij} = C_i \cap C_j$ is called a separator.
The key requirement for a valid junction tree is the running intersection property (or Junction Tree Property): for any two cliques $C_i$ and $C_j$ in the tree, all cliques on the unique path between them must contain the intersection $C_i \cap C_j$. This property ensures that information about shared variables flows consistently through the tree. Constructing a tree that satisfies this property can often be done using a maximum spanning tree algorithm on a graph where nodes are cliques and edge weights are the separator sizes $|C_i \cap C_j|$.
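A sketch of this construction using networkx, assuming a connected chordal input graph: `nx.find_cliques` enumerates the maximal cliques, and a maximum-weight spanning tree of the separator-weighted clique graph yields a valid junction tree.

```python
import itertools
import networkx as nx

def build_junction_tree(chordal: nx.Graph) -> nx.Graph:
    """Nodes are maximal cliques; edge weights are separator sizes.
    For a chordal graph, a maximum-weight spanning tree of this clique
    graph satisfies the running intersection property."""
    cliques = [frozenset(c) for c in nx.find_cliques(chordal)]
    clique_graph = nx.Graph()
    clique_graph.add_nodes_from(cliques)
    for c1, c2 in itertools.combinations(cliques, 2):
        separator = c1 & c2
        if separator:
            clique_graph.add_edge(c1, c2,
                                  weight=len(separator),
                                  separator=separator)
    return nx.maximum_spanning_tree(clique_graph)
```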
*Figure: A simple example of a junction tree. Nodes represent maximal cliques from a triangulated graph; edges connect cliques that share variables, labeled with their separators. The tree satisfies the running intersection property.*
With the junction tree constructed, inference proceeds via message passing. Each clique $C_i$ is initially associated with a potential function $\psi_i$. This potential is typically formed by multiplying together the conditional probability tables (CPTs) or factors from the original PGM whose variables are contained entirely within $C_i$. Careful assignment ensures each original factor is assigned to exactly one clique.
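A sketch of this assignment step, where a factor is represented as a hypothetical `(scope, table)` pair (real libraries use richer factor objects):

```python
def assign_factors(cliques, factors):
    """Assign each factor to exactly one clique whose variables cover its scope.

    cliques: list of frozensets of variable names
    factors: iterable of (scope, table) pairs, scope being a tuple of variables
    """
    cliques = list(cliques)
    assignment = {clique: [] for clique in cliques}
    for scope, table in factors:
        # Any clique containing the full scope will do; take the first one found.
        home = next(c for c in cliques if set(scope) <= c)
        assignment[home].append((scope, table))
    return assignment
```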
The algorithm involves two phases:

- **Collect (upward pass):** a root clique is chosen, and messages are passed from the leaves toward the root; each clique sends to its neighbor on the root side once it has received messages from all its other neighbors.
- **Distribute (downward pass):** messages are passed from the root back out toward the leaves, so that every clique ends up incorporating information from the entire tree.
What is a message? A message $m_{j \to i}$ sent from clique $C_j$ to clique $C_i$ is essentially a summary of the information (beliefs or potentials) gathered in the subtree rooted at $C_j$ (when viewed from $C_i$'s perspective), marginalized down to the variables in the separator $S_{ij} = C_i \cap C_j$.
Calculating Messages: Let $\beta_j$ be the current potential associated with clique $C_j$ (initially $\psi_j$, then updated by incoming messages). The message $m_{j \to i}$ sent from $C_j$ to $C_i$ is computed as:
$$m_{j \to i}(S_{ij}) = \sum_{C_j \setminus S_{ij}} \beta_j^*(C_j)$$

where $\beta_j^*$ represents the potential $\beta_j$ potentially incorporating messages received by $C_j$ from neighbors other than $C_i$ during the current pass. Specifically, in the Collect phase, when $C_j$ sends to $C_i$, $\beta_j^*$ incorporates $\psi_j$ and the messages from all neighbors $k \in N(j) \setminus \{i\}$. The summation marginalizes out the variables present in $C_j$ but not in the separator $S_{ij}$.
When a clique $C_i$ receives a message $m_{j \to i}(S_{ij})$, it updates its own potential:
$$\beta_i(C_i) \leftarrow \beta_i(C_i) \cdot m_{j \to i}(S_{ij})$$

Note that the message $m_{j \to i}$ is a function only of the separator variables $S_{ij}$, but it is multiplied into the potential $\beta_i$, which is a function of all variables in $C_i$. This operation is sometimes referred to as "potential multiplication".
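To make both the message computation and the update concrete, here is a tiny numpy sketch for hypothetical binary cliques $C_j = \{A, B\}$ and $C_i = \{B, C\}$ with separator $S_{ij} = \{B\}$; the potentials and their values are made up for illustration:

```python
import numpy as np

# Clique potentials as arrays with one axis per variable.
beta_j = np.array([[0.3, 0.7],
                   [0.9, 0.1]])   # beta_j[a, b] over C_j = {A, B}
beta_i = np.array([[0.5, 0.5],
                   [0.2, 0.8]])   # beta_i[b, c] over C_i = {B, C}

# Message: sum out the variables in C_j \ S_ij (here just A), leaving a function of B.
m_ji = beta_j.sum(axis=0)         # shape (2,), indexed by b

# Update: multiply the message into beta_i, broadcasting over the C axis.
beta_i = beta_i * m_ji[:, np.newaxis]
print(m_ji)
print(beta_i)
```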
Final Beliefs: After both the Collect and Distribute phases complete, the potential $\beta_i(C_i)$ associated with each clique $C_i$ is proportional to the true joint marginal probability distribution over the variables in that clique, given any evidence incorporated into the initial potentials:
$$\beta_i(C_i) \propto P(C_i \mid \text{evidence})$$

From these final clique beliefs, we can easily compute the marginal probability for any single variable $X \in C_i$, or any subset of variables within $C_i$, by summing out the remaining variables in $\beta_i(C_i)$. The running intersection property guarantees that the marginals computed for the same variable from different cliques containing it will be consistent.
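Continuing the numpy sketch above, extracting a single-variable marginal is just another summation followed by normalization:

```python
# beta_i is indexed [B, C]; sum out B and normalize to get P(C | evidence).
p_c = beta_i.sum(axis=0)
p_c /= p_c.sum()
print(p_c)
```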
The efficiency of the Junction Tree algorithm hinges critically on the size of the largest clique in the junction tree. This quantity is closely tied to the treewidth of the graph: the treewidth is defined as the minimum, over all possible triangulations, of the size of the largest clique minus one.
The computational cost of message passing involves operations (multiplication, summation) over potentials defined on the cliques. If the maximum clique size (treewidth + 1) is $w_{\max}$ and variables have at most $d$ states, the complexity is roughly exponential in $w_{\max}$, e.g., $O(k \cdot d^{w_{\max}})$ where $k$ is the number of cliques. For instance, with binary variables ($d = 2$), a clique of 20 variables already requires a potential table of $2^{20} \approx 10^6$ entries.
Therefore, the Junction Tree algorithm is efficient for graphs where a triangulation with small maximal cliques can be found (i.e., graphs with low treewidth). For graphs with high treewidth (like dense grids or highly connected networks), the cliques become too large, and the computation becomes intractable. This is where approximate inference methods become necessary.
Advantages:

- Produces exact marginals rather than approximations.
- A single Collect/Distribute pass yields marginals for every clique (and hence every variable), avoiding redundant computation when multiple queries are needed.
- The running intersection property guarantees consistent marginals across cliques.
Disadvantages:

- Cost is exponential in the size of the largest clique, so it is intractable for graphs with high treewidth.
- Finding an optimal triangulation is NP-hard, so heuristics may produce larger cliques than necessary.
- Storing clique potentials can demand substantial memory even when runtime is acceptable.
In summary, the Junction Tree algorithm represents a significant theoretical and practical tool for exact inference in PGMs. It transforms the inference problem on a potentially complex graph into a more structured message-passing procedure on a tree. Its applicability is governed by the graph's structural properties, specifically its treewidth, highlighting the close relationship between graph structure and inference complexity. When the treewidth is manageable, it offers a powerful way to obtain exact answers to probabilistic queries. When it is not, we turn to the approximate methods discussed in other sections.