Ensemble methods combine multiple machine learning models to produce a single, superior predictive model. While there are many variations, most techniques fall into one of two fundamental strategies: building models independently in parallel or building them sequentially where each model learns from the last. Bagging and boosting are the classic illustrations of these two distinct approaches.
Bagging, which stands for Bootstrap Aggregating, is an ensemble technique designed to reduce the variance of a predictive model. It is particularly effective when used with base models that are prone to overfitting, such as fully grown decision trees. The process works by introducing randomness through data sampling, training multiple models independently, and then combining their outputs.
The procedure can be broken down into two main steps, illustrated in the short sketch that follows them:
Bootstrap Sampling: From the original training dataset of size $N$, multiple new datasets are created, also of size $N$. These new datasets, called bootstrap samples, are generated by sampling with replacement. This means that any given data point from the original set might appear multiple times in a sample, while others might not appear at all. On average, a bootstrap sample contains about 63% of the distinct data points from the original set, since the expected fraction of points included is $1 - (1 - 1/N)^N \approx 1 - 1/e \approx 0.632$ for large $N$.
Aggregation: A base model is trained independently on each bootstrap sample. Because each model sees a slightly different subset of the data, each one learns a slightly different decision boundary. For a regression task, the predictions from all models are averaged to produce the final result. For a classification task, the final prediction is determined by a majority vote among the models.
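Below is a minimal sketch of both steps, assuming NumPy and a scikit-learn decision tree regressor as the base model; the synthetic sine-wave data and all sizes are arbitrary choices made for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression data: a noisy sine wave.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

n_models = 50
models = []
unique_fractions = []

# Step 1: bootstrap sampling -- draw N indices with replacement for each model.
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))
    unique_fractions.append(len(np.unique(idx)) / len(X))
    tree = DecisionTreeRegressor()   # fully grown tree: low bias, high variance
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Each bootstrap sample covers roughly 63% of the distinct original points.
print(f"mean fraction of unique points per sample: {np.mean(unique_fractions):.2f}")

# Step 2: aggregation -- average the individual predictions (regression task).
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(np.mean([m.predict(X_test) for m in models], axis=0))
```

For a classification task, the final averaging line would be replaced by a majority vote over the models' predicted labels.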
By averaging the outputs of these decorrelated models, the errors and instabilities of individual models tend to cancel each other out. This leads to a final model with lower variance and better generalization performance than any single model trained on the original data. A well-known algorithm that builds upon this idea is the Random Forest, which adds another layer of randomness by considering only a random subset of the features at each split of each tree.
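As a rough illustration of this effect, the sketch below uses scikit-learn to compare a single decision tree, a bagged ensemble of trees, and a Random Forest by cross-validated accuracy on a synthetic dataset. The dataset and hyperparameters are arbitrary and the exact scores will vary, so treat the comparison as indicative rather than a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Averaging many decorrelated trees typically lifts the ensembles above the single tree.
for name, model in models.items():
    print(f"{name:13s} mean CV accuracy: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```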
The bagging process trains multiple models in parallel, each on a different random sample of the data. The final prediction is an aggregation of their individual outputs.
In contrast to bagging's parallel approach, boosting builds an ensemble of models sequentially. The core idea is to construct a "strong" learner from a series of "weak" learners, where each new learner is trained to correct the errors made by its predecessors. A weak learner is a model that performs only slightly better than random chance, with a simple decision stump (a one-level decision tree) being a common choice.
The boosting process is iterative:
Initialization: All training examples start with equal weights, and a first weak learner is fit to the data.
Reweighting: The weights of the examples the current learner got wrong are increased, so the next learner concentrates on the cases the ensemble still struggles with.
Repetition: A new weak learner is fit to the reweighted data, and this cycle continues for a chosen number of rounds.
Combination: All weak learners are merged into a single strong learner through a weighted sum or vote, with more accurate learners receiving larger weights.
Because boosting focuses on difficult-to-predict examples, it is very effective at reducing the overall bias of the model. It incrementally builds a complex decision boundary by combining many simple ones, turning a collection of high-bias weak learners into a single low-bias strong learner.
The boosting process trains models sequentially. Each model is trained to correct the errors of the preceding one, and their predictions are combined in a weighted sum.
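To make the iterative process concrete, here is a minimal from-scratch sketch of an AdaBoost-style weight-update loop (the algorithm itself is covered formally in the next section). It assumes scikit-learn decision stumps as the weak learners and a synthetic binary dataset with labels recoded to -1 and +1; it is meant to illustrate the mechanics, not to serve as a reference implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)                     # recode labels to {-1, +1}

n_rounds = 50
sample_weights = np.full(len(X), 1.0 / len(X))  # start with uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a weak learner (a one-level decision stump) on the weighted data.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weights)
    pred = stump.predict(X)

    # Weighted error of this learner and its weight (alpha) in the final vote.
    err = np.sum(sample_weights[pred != y])
    err = np.clip(err, 1e-10, 1 - 1e-10)        # guard against division by zero
    alpha = 0.5 * np.log((1 - err) / err)

    # Increase the weights of misclassified examples, decrease the others.
    sample_weights *= np.exp(-alpha * y * pred)
    sample_weights /= sample_weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: the sign of the alpha-weighted sum of the stumps' votes.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print(f"training accuracy: {np.mean(np.sign(scores) == y):.3f}")
```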
While both bagging and boosting are powerful ensemble techniques, their underlying philosophies and primary objectives are quite different.
| Feature | Bagging | Boosting |
|---|---|---|
| Model Building | Parallel and independent. | Sequential and dependent. |
| Primary Goal | Reduce model variance. | Reduce model bias. |
| Base Learners | Complex models (low bias, high variance). | Simple models (high bias, low variance). |
| Data Focus | Each model trained on a random data sample. | Each model focuses on errors of the previous one. |
| Final Combination | Simple average or majority vote. | Weighted sum or vote. |
| Overfitting | Generally resistant to overfitting. | Can overfit without careful tuning of parameters. |
Understanding the distinction between these two strategies provides the necessary foundation for the rest of this course. Bagging builds a diverse group of independent decision-makers, while boosting builds a highly specialized team where each member improves upon the work of the last. Our focus will now shift entirely to boosting, starting with its earliest formalization in the AdaBoost algorithm.