XGBoost utilizes a regularized learning objective to build the decision trees that constitute its ensemble. A primary component of this tree-building process is the split finding algorithm. For each potential node in a tree being constructed, the algorithm must identify the best feature and the best split point for that feature to partition the data. The "best" split maximizes the reduction in the objective function value, which is commonly referred to as the gain.
XGBoost employs several strategies for finding splits, and the most fundamental one is the Exact Greedy Algorithm. It's termed "exact" because it meticulously evaluates every possible split point for every feature among the instances currently residing in that node. It's "greedy" because it makes the locally optimal choice at each node, selecting the split with the highest gain without considering the impact on subsequent splits further down the tree.
Recall the XGBoost objective function at step $t$, incorporating the complexity penalty:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^2(\mathbf{x}_i) \right] + \Omega(f_t)$$

where $g_i = \partial_{\hat{y}_i^{(t-1)}} \, l(y_i, \hat{y}_i^{(t-1)})$ and $h_i = \partial^2_{\hat{y}_i^{(t-1)}} \, l(y_i, \hat{y}_i^{(t-1)})$ are the first and second-order gradients of the loss function with respect to the prediction for instance $i$ at step $t-1$. $f_t(\mathbf{x}_i)$ is the prediction of the new tree for instance $i$, and $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$ is the regularization term, with $T$ being the number of leaves and $w_j$ the weight of leaf $j$.
For a fixed tree structure, the optimal weight $w_j^*$ for a leaf $j$ containing the set of instances $I_j$ is:

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

And the corresponding optimal objective value contributed by this tree structure is:

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

Let's define $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$. The objective becomes:

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
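As a concrete illustration, both formulas can be computed in a few lines of Python. This is a minimal sketch with hypothetical helper names (`leaf_weight`, `leaf_objective` are ours, not part of the XGBoost library):

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal weight w* for a leaf holding instances with gradients g, Hessians h."""
    return -np.sum(g) / (np.sum(h) + lam)

def leaf_objective(g, h, lam=1.0):
    """Contribution -G^2 / (2 * (H + lambda)) of a single leaf to the objective."""
    G, H = np.sum(g), np.sum(h)
    return -0.5 * G**2 / (H + lam)

# Example: five instances land in one leaf
g = np.array([0.4, -1.2, 0.7, 0.1, -0.5])
h = np.array([0.9, 1.1, 1.0, 0.8, 1.2])
print(leaf_weight(g, h))     # optimal leaf weight
print(leaf_objective(g, h))  # objective contribution (excluding the gamma*T penalty)
```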
Now, consider splitting a leaf (node) containing the instance set $I$ into two children: a left child with instances $I_L$ and a right child with instances $I_R$, such that $I = I_L \cup I_R$. The gain from this split is the objective value for the parent node (if it remained a leaf) minus the sum of the objective values for the two new child leaves.

The objective value if the node is not split (remains a single leaf) is:

$$\mathcal{L}_{\text{leaf}} = -\frac{1}{2} \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} + \gamma$$

The objective value if the node is split into $I_L$ and $I_R$ is:

$$\mathcal{L}_{\text{split}} = -\frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} \right] + 2\gamma$$
The gain is defined as $\mathcal{L}_{\text{leaf}} - \mathcal{L}_{\text{split}}$. XGBoost literature typically writes this out directly: since splitting replaces one leaf with two, it adds a complexity penalty of $\gamma$ for the extra leaf:

$$\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$

This formula is the core calculation used by the exact greedy algorithm. A split is considered beneficial only if the resulting gain is positive (or exceeds a minimum threshold set by the `min_split_loss` parameter, also known as `gamma` in the API). The $\gamma$ term acts as a regularization parameter, controlling tree complexity by setting a minimum gain required to make a split.
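The gain formula maps directly to code. The sketch below (the `split_gain` name is ours, not XGBoost's) shows how $\lambda$ and $\gamma$ enter the calculation; raising `gamma` shifts every candidate's gain downward, so marginal splits are pruned away:

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Gain of splitting a node with child statistics (G_L, H_L) and (G_R, H_R)."""
    def score(G, H):
        return G**2 / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma
```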
The algorithm proceeds as follows for each node:
1. Initialize `max_gain = 0` and compute the node totals $G = \sum_{i \in I} g_i$ and $H = \sum_{i \in I} h_i$.
2. For each feature, sort the instances in the node by their value for that feature.
3. Scan the sorted instances from left to right, maintaining running sums $G_L$ and $H_L$. At each potential split point, set $G_R = G - G_L$ and $H_R = H - H_L$, and compute the gain.
4. If the gain exceeds `max_gain`, record this feature and split point as the current best candidate.
5. After all features have been processed, split on the best candidate if its gain is positive; otherwise, leave the node as a leaf.

The following diagram illustrates the process for finding the best split for a single feature:
Iterative process within the exact greedy algorithm for a single feature. Instances are sorted, and each potential split point is evaluated based on the gain calculation. The best split across all features determines the node's fate.
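To make the scan concrete, here is a self-contained Python sketch of the per-feature pass. It illustrates the algorithm's logic; it is not XGBoost's actual (C++) implementation:

```python
import numpy as np

def best_split_for_feature(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy scan over a single feature.

    x: feature values for the instances in this node
    g, h: first- and second-order gradients for the same instances
    Returns (best_gain, best_threshold), or (0.0, None) if no split helps.
    """
    order = np.argsort(x)                # sort instances by feature value
    x_s, g_s, h_s = x[order], g[order], h[order]
    G, H = g_s.sum(), h_s.sum()          # node totals

    best_gain, best_threshold = 0.0, None
    G_L = H_L = 0.0
    for j in range(len(x_s) - 1):
        G_L += g_s[j]                    # move instance j into the left child
        H_L += h_s[j]
        if x_s[j] == x_s[j + 1]:
            continue                     # cannot split between identical values
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam)
                      + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = (x_s[j] + x_s[j + 1]) / 2.0
    return best_gain, best_threshold

# Example: gradients flip sign around x = 0.3, so the best threshold lands nearby
rng = np.random.default_rng(0)
x = rng.normal(size=100)
g = np.where(x > 0.3, -1.0, 1.0) + rng.normal(scale=0.1, size=100)
h = np.ones(100)
print(best_split_for_feature(x, g, h))
```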
The main computational bottleneck in the exact greedy algorithm is sorting the feature values at each node. If there are $n$ instances reaching a node and $m$ features:

- Sorting the instances costs $O(n \log n)$ per feature.
- The subsequent left-to-right scan is $O(n)$ per feature.
- The total cost per node is therefore $O(m \, n \log n)$, though XGBoost amortizes the sorting cost by pre-sorting each feature once into its column block structure.
While thorough, this enumeration can become computationally demanding, especially when dealing with:

- Very large datasets, particularly when the data does not fit entirely in memory.
- Continuous features with many distinct values, each of which is a candidate split point.
- Distributed training settings, where instances are partitioned across machines.
This computational cost motivates the development of approximate split finding algorithms, which we will discuss in the next section. They trade off some accuracy in finding the absolute best split for significant gains in computational efficiency, making XGBoost practical for very large datasets.
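For reference, the XGBoost library exposes the split finding strategy through the `tree_method` parameter. A minimal usage sketch (with synthetic scikit-learn data) that requests the exact greedy algorithm:

```python
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "exact",  # exact greedy split finding
    "gamma": 1.0,            # min_split_loss: minimum gain required to split
    "reg_lambda": 1.0,       # L2 regularization (the lambda in the gain formula)
    "max_depth": 4,
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```

Switching `tree_method` to `"approx"` or `"hist"` selects the approximate strategies discussed next, which evaluate a reduced set of candidate split points.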
The exact greedy algorithm forms the foundation for tree construction in XGBoost. Its strength lies in its exhaustive search for the locally optimal split based on the regularized objective function. Understanding this process is important for appreciating how XGBoost builds highly effective trees and why alternative methods like approximate algorithms are sometimes necessary.