Ensemble methods combine multiple machine learning models to produce a single, superior predictive model. While there are many variations, most techniques fall into one of two fundamental strategies: building models independently in parallel or building them sequentially where each model learns from the last. Bagging and boosting are the classic illustrations of these two distinct approaches.
Bagging, which stands for Bootstrap Aggregating, is an ensemble technique designed to reduce the variance of a predictive model. It is particularly effective when used with base models that are prone to overfitting, such as fully grown decision trees. The process works by introducing randomness through data sampling, training multiple models independently, and then combining their outputs.
The procedure can be broken down into two main steps, illustrated in the short sketch that follows them:
Bootstrap Sampling: From the original training dataset of size $N$, multiple new datasets are created, also of size $N$. These new datasets, called bootstrap samples, are generated by sampling with replacement. This means that any given data point from the original set might appear multiple times in a sample, while others might not appear at all. On average, a bootstrap sample contains about 63% of the distinct data points from the original set, since the expected fraction of points included is $1 - (1 - 1/N)^N \approx 1 - 1/e \approx 0.632$ for large $N$.
Aggregation: A base model is trained independently on each bootstrap sample. Because each model sees a slightly different subset of the data, each one learns a slightly different decision boundary. For a regression task, the predictions from all models are averaged to produce the final result. For a classification task, the final prediction is determined by a majority vote among the models.
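Below is a minimal sketch of both steps, assuming NumPy and a scikit-learn decision tree regressor as the base model; the synthetic sine-wave data and all sizes are arbitrary choices made for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression data: a noisy sine wave.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

n_models = 50
models = []
unique_fractions = []

# Step 1: bootstrap sampling -- draw N indices with replacement for each model.
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))
    unique_fractions.append(len(np.unique(idx)) / len(X))
    tree = DecisionTreeRegressor()   # fully grown tree: low bias, high variance
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Each bootstrap sample covers roughly 63% of the distinct original points.
print(f"mean fraction of unique points per sample: {np.mean(unique_fractions):.2f}")

# Step 2: aggregation -- average the individual predictions (regression task).
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(np.mean([m.predict(X_test) for m in models], axis=0))
```

For a classification task, the final averaging line would be replaced by a majority vote over the models' predicted labels.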
By averaging the outputs of these decorrelated models, the errors and instabilities of individual models tend to cancel each other out. This leads to a final model with lower variance and better generalization performance than any single model trained on the original data. A well-known algorithm that builds upon this idea is the Random Forest, which adds another layer of randomness by considering only a random subset of the features at each split of each tree.
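As a rough illustration of this effect, the sketch below uses scikit-learn to compare a single decision tree, a bagged ensemble of trees, and a Random Forest by cross-validated accuracy on a synthetic dataset. The dataset and hyperparameters are arbitrary and the exact scores will vary, so treat the comparison as indicative rather than a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Averaging many decorrelated trees typically lifts the ensembles above the single tree.
for name, model in models.items():
    print(f"{name:13s} mean CV accuracy: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```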
The bagging process trains multiple models in parallel, each on a different random sample of the data. The final prediction is an aggregation of their individual outputs.
In contrast to bagging's parallel approach, boosting builds an ensemble of models sequentially. The core idea is to construct a "strong" learner from a series of "weak" learners, where each new learner is trained to correct the errors made by its predecessors. A weak learner is a model that performs only slightly better than random chance, with a simple decision stump (a one-level decision tree) being a common choice.
The boosting process is iterative:
Initialization: All training examples start with equal weights, and a first weak learner is fit to the data.
Reweighting: The weights of the examples the current learner got wrong are increased, so the next learner concentrates on the cases the ensemble still struggles with.
Repetition: A new weak learner is fit to the reweighted data, and this cycle continues for a chosen number of rounds.
Combination: All weak learners are merged into a single strong learner through a weighted sum or vote, with more accurate learners receiving larger weights.
Because boosting focuses on difficult-to-predict examples, it is very effective at reducing the overall bias of the model. It incrementally builds a complex decision boundary by combining many simple ones, turning a collection of high-bias weak learners into a single low-bias strong learner.
The boosting process trains models sequentially. Each model is trained to correct the errors of the preceding one, and their predictions are combined in a weighted sum.
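To make the iterative process concrete, here is a minimal from-scratch sketch of an AdaBoost-style weight-update loop (the algorithm itself is covered formally in the next section). It assumes scikit-learn decision stumps as the weak learners and a synthetic binary dataset with labels recoded to -1 and +1; it is meant to illustrate the mechanics, not to serve as a reference implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)                     # recode labels to {-1, +1}

n_rounds = 50
sample_weights = np.full(len(X), 1.0 / len(X))  # start with uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a weak learner (a one-level decision stump) on the weighted data.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weights)
    pred = stump.predict(X)

    # Weighted error of this learner and its weight (alpha) in the final vote.
    err = np.sum(sample_weights[pred != y])
    err = np.clip(err, 1e-10, 1 - 1e-10)        # guard against division by zero
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weights of misclassified examples, decrease the others.
    sample_weights *= np.exp(-alpha * y * pred)
    sample_weights /= sample_weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: the sign of the alpha-weighted sum of the stumps' votes.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print(f"training accuracy: {np.mean(np.sign(scores) == y):.3f}")
```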
While both bagging and boosting are powerful ensemble techniques, their underlying philosophies and primary objectives are quite different.
| Feature | Bagging | Boosting |
|---|---|---|
| Model Building | Parallel and independent. | Sequential and dependent. |
| Primary Goal | Reduce model variance. | Reduce model bias. |
| Base Learners | Complex models (low bias, high variance). | Simple models (high bias, low variance). |
| Data Focus | Each model trained on a random data sample. | Each model focuses on errors of the previous one. |
| Final Combination | Simple average or majority vote. | Weighted sum or vote. |
| Overfitting | Generally resistant to overfitting. | Can overfit without careful tuning of parameters. |
Understanding the distinction between these two strategies provides the necessary foundation for the rest of this course. Bagging builds a diverse group of independent decision-makers, while boosting builds a highly specialized team where each member improves upon the work of the last. Our focus will now shift entirely to boosting, starting with its earliest formalization in the AdaBoost algorithm.