One of the fundamental challenges in machine learning is managing the tradeoff between bias and variance. A model with high bias makes strong assumptions about the data and fails to capture its underlying patterns (underfitting). A model with high variance is overly sensitive to the training data and captures random noise, leading to poor performance on new data (overfitting). Ensemble methods offer powerful strategies for managing this tradeoff, but bagging and boosting do so in distinctly different ways.
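For squared-error loss this tradeoff has a precise form: the expected prediction error of a fitted model $\hat{f}$ at a point $x$ decomposes into squared bias, variance, and irreducible noise, where $f$ is the true underlying function and the expectation is taken over training sets drawn from the data distribution:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
+ \sigma^2
$$

Bagging and boosting each attack a different term of this decomposition, which is what the rest of this section walks through.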
Bagging, short for Bootstrap Aggregating, is primarily a variance-reduction technique. The strategy works best with models that are unstable and have high variance, such as fully grown decision trees. These models are "strong" learners in the sense of having low bias, but they are unstable and tend to overfit their training data significantly.
The process involves two main steps:
1. Bootstrap sampling: draw many random samples from the training set, with replacement, so that each sample has the same size as the original but contains a slightly different mix of points.
2. Aggregation: train one base model on each bootstrap sample, then combine their outputs, averaging predictions for regression or taking a majority vote for classification.
By training models on different subsets of the data, we create a diverse set of predictors. While each individual model might be overfitted and produce a noisy prediction, the errors are often uncorrelated. When we average these predictions, the noise tends to cancel out, resulting in a smoother, more stable prediction boundary. The bias of the final model remains roughly the same as the bias of the individual base models, but the variance is substantially reduced.
Individual high-variance models closely fit the specific training points they see. Averaging their predictions produces a bagged model that is much smoother and closer to the true underlying function.
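As a minimal sketch of this variance-reduction effect, the example below wraps fully grown decision trees in scikit-learn's `BaggingRegressor`. It assumes a recent scikit-learn (1.2+), where the base model is passed as `estimator`, and the noisy sine data is purely illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Illustrative regression data: a sine wave with additive noise
rng = np.random.RandomState(42)
X = rng.uniform(0, 6, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single fully grown tree: low bias, but high variance (it fits the noise)
single_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Bagging: bootstrap samples plus averaging over many such trees
bagged_trees = BaggingRegressor(
    estimator=DecisionTreeRegressor(),
    n_estimators=200,   # number of bootstrap replicates
    bootstrap=True,     # sample training rows with replacement
    random_state=0,
).fit(X_train, y_train)

print("Single tree test MSE :", mean_squared_error(y_test, single_tree.predict(X_test)))
print("Bagged trees test MSE:", mean_squared_error(y_test, bagged_trees.predict(X_test)))
```

On data like this, the bagged ensemble typically posts a noticeably lower test error than the single tree, even though each member of the ensemble overfits its own bootstrap sample.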
Boosting operates on a completely different principle. It is primarily a bias-reduction technique. The strategy begins with simple base models that have high bias, such as shallow decision trees (often just "stumps" with a single split). These models are considered "weak learners" because, on their own, they perform only slightly better than random guessing.
Boosting builds an ensemble sequentially. Each new model is trained to correct the errors made by the combination of the previous models. For example, in AdaBoost, data points that were misclassified by earlier models are given more weight in the training of subsequent models. This forces the algorithm to focus on the "hard" examples that it is struggling with.
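A minimal sketch of this reweighting scheme uses scikit-learn's `AdaBoostClassifier` with depth-1 trees (stumps) as the weak learners. As before, it assumes scikit-learn 1.2+ (the `estimator` keyword) and an illustrative synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Illustrative classification data
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single stump: a weak learner, only modestly better than chance here
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# AdaBoost: each new stump is fit to a reweighted dataset that emphasizes
# the examples the ensemble so far has misclassified
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=300,
    learning_rate=0.5,
    random_state=0,
).fit(X_train, y_train)

print("Single stump accuracy:", stump.score(X_test, y_test))
print("AdaBoost accuracy    :", ada.score(X_test, y_test))
```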
By adding model after model, each one chipping away at the remaining error, the ensemble gradually becomes a strong learner. The final model has significantly lower bias than any of its weak components. However, this aggressive focus on minimizing training error comes with a risk. If you add too many models in the sequence, the ensemble can begin to overfit the training data, which in turn increases its variance. This is why parameters that control the number of models and the learning rate are so important for regularization in boosting algorithms.
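One way to watch this risk develop is to evaluate the ensemble after every boosting round; scikit-learn's boosting estimators expose `staged_predict` for exactly this purpose. The sketch below repeats the setup from the previous example and reports the round at which test error bottoms out:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=500,
    learning_rate=0.5,
    random_state=0,
).fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting round,
# so we can see where test error stops improving (and may start to climb
# again as variance creeps in)
test_errors = [np.mean(y_pred != y_test) for y_pred in ada.staged_predict(X_test)]
best_round = int(np.argmin(test_errors)) + 1
print(f"Lowest test error {min(test_errors):.3f} at round {best_round} of {len(test_errors)}")
```

Plotting `test_errors` against the round number gives the familiar picture: error falls steeply at first, flattens, and can eventually drift upward if boosting continues unchecked.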
The choice between bagging and boosting often depends on the type of error you need to address. If your base model is too complex and overfits, bagging is a good choice. If your base model is too simple and underfits, boosting is the better approach.
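A rough way to act on this guideline is to compare cross-validated scores of a base model against its bagged and boosted versions. The dataset and hyperparameters below are illustrative assumptions rather than prescriptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)

models = {
    # Overfitting base model: bagging should give the larger improvement
    "deep tree": DecisionTreeClassifier(max_depth=None),
    "bagged deep trees": BaggingClassifier(DecisionTreeClassifier(max_depth=None),
                                           n_estimators=100, random_state=0),
    # Underfitting base model: boosting should give the larger improvement
    "stump": DecisionTreeClassifier(max_depth=1),
    "boosted stumps": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                         n_estimators=300, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:18s} mean CV accuracy = {scores.mean():.3f}")
```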
A comparison of how bagging and boosting manage the bias-variance tradeoff.
With this foundation, you are now prepared to see how the Gradient Boosting Machine extends the core idea of boosting. Instead of using a simple weighting scheme like AdaBoost, it uses a more generalized and powerful technique based on gradients to correct errors, giving us fine-grained control over model performance.