Having reviewed ensemble methods and the role of decision trees, let's formalize the sequential learning process that underpins boosting algorithms. This process is known as additive modeling. Unlike Bagging, where base learners are trained independently (and can be trained in parallel), additive models are built iteratively. Each new component added to the model focuses on the shortcomings of the existing ensemble.
Imagine building a predictive model step-by-step. You start with an initial, often very simple, prediction. Then, you assess where this initial prediction falls short. Based on this assessment, you add a new component, a simple model, specifically designed to compensate for the errors made so far. You repeat this process, incrementally refining the overall model by adding components that address the remaining errors.
This iterative refinement is the essence of additive modeling. The final prediction is simply the sum of the predictions from all the components built sequentially.
Mathematically, an additive model $F_M(x)$ making predictions for an input $x$ after $M$ stages (or iterations) can be represented as:

$$F_M(x) = F_0(x) + \sum_{m=1}^{M} \beta_m h_m(x)$$

Let's break down this equation:

- $F_M(x)$ is the prediction of the full ensemble after $M$ stages.
- $F_0(x)$ is the initial model, often a very simple prediction such as a constant.
- $h_m(x)$ is the base learner added at stage $m$, typically a simple model such as a shallow decision tree.
- $\beta_m$ is the weight (or coefficient) controlling how much the $m$-th base learner contributes to the ensemble.
The model construction proceeds as follows:

1. Initialize the ensemble with a simple model $F_0(x)$, for example a constant prediction such as the mean of the training targets for regression.
2. For each stage $m = 1, \dots, M$: calculate the errors of the current ensemble $F_{m-1}(x)$, fit a new base learner $h_m(x)$ to those errors, and update the ensemble as $F_m(x) = F_{m-1}(x) + \beta_m h_m(x)$.
3. After $M$ stages, return $F_M(x)$ as the final model.
This iterative process is visualized below:
The additive modeling process: start with an initial model ($F_0$), calculate errors, fit a new base learner ($h_m$) to those errors, and add it to the ensemble ($F_m = F_{m-1} + \beta_m h_m$). Repeat for $M$ steps.
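To make the loop concrete, here is a minimal sketch of additive modeling for a regression problem. It assumes squared-error loss, a constant initial model, decision stumps as base learners, and a fixed weight $\beta_m = 0.1$ at every stage; these are illustrative choices, not requirements of the framework.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: y = sin(x) plus noise (hypothetical example data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

M = 50        # number of boosting stages
beta = 0.1    # fixed weight beta_m used at every stage (illustrative simplification)

# Stage 0: the initial model F_0(x) is a constant, the mean of the targets.
f0 = y.mean()
F = np.full_like(y, f0)   # current ensemble predictions F_{m-1}(x) on the training set
stumps = []

for m in range(M):
    residuals = y - F                          # errors of the current ensemble
    h = DecisionTreeRegressor(max_depth=1)     # base learner h_m: a decision stump
    h.fit(X, residuals)                        # train h_m to predict those errors
    F += beta * h.predict(X)                   # F_m = F_{m-1} + beta_m * h_m
    stumps.append(h)

def predict(X_new):
    """Additive prediction: F_M(x) = F_0(x) + sum over m of beta_m * h_m(x)."""
    pred = np.full(X_new.shape[0], f0)
    for h in stumps:
        pred += beta * h.predict(X_new)
    return pred

print("Training MSE:", np.mean((y - F) ** 2))
```

In practice the weights $\beta_m$ can be chosen by a line search at each stage or treated as a small constant "learning rate"; the fixed value above simply keeps the sketch short.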
The power of this framework lies in its flexibility and focus. By concentrating on the errors of the preceding model, each new learner tackles the aspects of the problem that the current ensemble finds most difficult. This allows the model to gradually improve performance, potentially capturing complex patterns that a single model might miss.
Gradient Boosting is a highly successful algorithm family that operates within this additive modeling framework. It provides a specific, mathematically grounded way to determine how each new base learner $h_m(x)$ should be trained to best correct the errors of $F_{m-1}(x)$, using the concept of gradient descent in function space. We will explore this connection in detail as we move forward. Understanding this additive structure is foundational to grasping the mechanics of GBM, XGBoost, LightGBM, and CatBoost.
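As a short preview of that connection, take the squared-error loss and treat the ensemble's prediction at each training point as a free parameter. The negative gradient of the loss with respect to that prediction is exactly the residual, so fitting a base learner to residuals, as in the sketch above, can be read as taking one approximate gradient-descent step in function space:

$$L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2 \quad\Longrightarrow\quad -\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)} = y_i - F(x_i)$$

For other loss functions the same recipe yields different "pseudo-residuals", which is the generalization that Gradient Boosting exploits.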