While bagging builds an ensemble by averaging independent models, boosting takes a more collaborative, sequential approach. Instead of a single round of voting among equally skilled models, boosting builds a team of specialists, where each new member is trained to fix the mistakes made by the team so far. This iterative process is the foundation of boosting's ability to produce highly accurate models.
Imagine a group of students trying to learn a difficult subject. The first student studies the material and takes a practice test. They get some answers right and some wrong. The teacher then gives the second student the same material but emphasizes the questions the first student answered incorrectly. The second student focuses their effort on these difficult questions. This process continues, with each subsequent student concentrating on the remaining areas of weakness. The final "model" is the combined knowledge of all students, with more credit given to those who mastered the tougher concepts.
This is precisely the principle behind boosting. The algorithm builds a chain of models, typically simple ones called weak learners, where each model in the chain is trained to correct the errors of its predecessor.
A weak learner is a model that performs only slightly better than random chance. In the context of boosting, the most common weak learner is a decision stump, which is a decision tree with only a single split. On their own, decision stumps are not very powerful. However, by combining hundreds or thousands of them in a structured, sequential manner, boosting algorithms can build a highly accurate and powerful final model.
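To make the idea of a weak learner concrete, here is a minimal sketch, assuming scikit-learn is available and using an illustrative toy dataset. A decision stump is simply a decision tree restricted to a single split, and on its own it performs only modestly better than chance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# A toy binary classification problem (sizes and names are illustrative).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# A decision stump: a decision tree limited to one split (max_depth=1).
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X, y)

# A single stump is only a little better than the 50% chance level.
print("Stump training accuracy:", stump.score(X, y))
```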
The process generally follows these steps:

1. Train a weak learner on the training data.
2. Identify which examples it misclassifies.
3. Increase the importance of those misclassified examples so the next learner concentrates on them.
4. Train the next weak learner on the re-weighted data.
5. Repeat for a fixed number of rounds, then combine all learners into a single weighted prediction.
The following diagram illustrates this iterative flow.
Each weak learner is trained on a version of the dataset where points misclassified by previous learners are given more importance. The final model is a weighted sum of all learners.
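The sketch below illustrates this loop under some simplifying assumptions: labels are encoded as -1/+1, each weak learner is a decision stump, and the learner weights follow the familiar AdaBoost-style formula (covered in detail in the next section). The function and variable names (`boost`, `sample_weights`, `alphas`) are illustrative, not a reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=50):
    """Sequentially train decision stumps, re-weighting misclassified points.

    Simplified sketch; assumes binary labels encoded as -1 / +1.
    """
    y = np.asarray(y)
    n = len(y)
    sample_weights = np.full(n, 1.0 / n)   # start with equal importance
    learners, alphas = [], []

    for _ in range(n_rounds):
        # 1. Fit a weak learner on the current weighting of the data.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=sample_weights)
        pred = stump.predict(X)

        # 2. Measure its weighted error rate.
        err = np.sum(sample_weights * (pred != y)) / np.sum(sample_weights)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against division by zero

        # 3. Better learners (lower error) receive a larger weight.
        alpha = 0.5 * np.log((1 - err) / err)

        # 4. Increase the importance of points this learner got wrong.
        sample_weights *= np.exp(-alpha * y * pred)
        sample_weights /= sample_weights.sum()

        learners.append(stump)
        alphas.append(alpha)

    return learners, alphas
```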
The method for "focusing on errors" is what distinguishes different boosting algorithms. At a high level, there are two primary approaches:

- Re-weighting the data: misclassified examples receive larger weights, so the next learner pays more attention to them. This is the approach used in the training loop sketched above and is the idea behind AdaBoost.
- Fitting the residuals: each new learner is trained to predict the errors that the current ensemble still makes. This is the idea behind gradient boosting; a minimal sketch follows this list.
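To contrast with the re-weighting loop shown earlier, here is a minimal sketch of the residual-fitting approach on a regression problem. Each new stump models what the ensemble still gets wrong; the `learning_rate` and `n_rounds` values are illustrative choices, not prescribed settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_on_residuals(X, y, n_rounds=50, learning_rate=0.1):
    """Sketch of residual fitting: each stump models the ensemble's remaining error."""
    y = np.asarray(y, dtype=float)
    prediction = np.full(len(y), y.mean())          # start from a constant baseline
    learners = []

    for _ in range(n_rounds):
        residuals = y - prediction                   # what is still unexplained
        stump = DecisionTreeRegressor(max_depth=1)
        stump.fit(X, residuals)                      # fit the current errors
        prediction += learning_rate * stump.predict(X)  # nudge the ensemble toward y
        learners.append(stump)

    return learners
```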
Regardless of the specific technique, the final prediction is not made by the last model alone. Instead, it is a weighted combination of all the weak learners trained during the process. A strong learner, $F(x)$, is formed by summing the outputs of all weak learners, $h_m(x)$, each scaled by a weight, $\alpha_m$:

$$F(x) = \sum_{m=1}^{M} \alpha_m h_m(x)$$

The weight $\alpha_m$ typically reflects the performance of the weak learner $h_m$; better-performing learners get a larger say in the final outcome. This weighted aggregation turns a sequence of simple, weak models into a single, powerful predictor.
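Continuing the re-weighting sketch from above, the final prediction applies this formula directly: each stump votes, its vote is scaled by its weight $\alpha_m$, and the sign of the weighted sum gives the class (again assuming -1/+1 labels).

```python
import numpy as np

def predict(learners, alphas, X):
    """Weighted vote of all weak learners: F(x) = sign(sum_m alpha_m * h_m(x))."""
    weighted_sum = sum(alpha * stump.predict(X)
                       for stump, alpha in zip(learners, alphas))
    return np.sign(weighted_sum)
```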
This sequential, error-correcting process is the defining characteristic of boosting. In the next section, we will examine AdaBoost, the first practical and highly successful boosting algorithm, to see a concrete implementation of these ideas.