As introduced earlier in this chapter, the high flexibility of Gradient Boosting Machines can make them prone to overfitting the training data. One powerful and widely used technique to mitigate this is introducing randomness into the tree-building process through subsampling, leading to an approach often called Stochastic Gradient Boosting (SGB). This idea draws inspiration from Bagging (Bootstrap Aggregating) and Stochastic Gradient Descent (SGD), where introducing randomness via data sampling can improve model generalization and robustness.
Instead of using the entire training dataset to compute the pseudo-residuals and fit each new base learner (tree), Stochastic Gradient Boosting uses only a fraction of the training samples, drawn randomly without replacement, at each boosting iteration m. Similarly, it can also sample a fraction of the features.
The core idea behind subsampling is variance reduction. By training each tree on a slightly different subset of the data and/or features, we decorrelate the trees in the ensemble. Each tree gets a slightly different "view" of the data distribution and the error patterns (pseudo-residuals) left by the preceding trees. This prevents the ensemble from becoming overly specialized to noise or specific patterns present only in the full training set. While each individual tree might be slightly weaker (higher bias) because it's trained on less data, the ensemble as a whole becomes more robust and generalizes better to unseen data.
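To make this concrete, here is a minimal from-scratch sketch of stochastic gradient boosting for squared-error regression. It is for illustration only: the function names (stochastic_gb_fit, stochastic_gb_predict) and the subsample_fraction argument are invented for this example, and it assumes NumPy arrays and scikit-learn's DecisionTreeRegressor as the base learner.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stochastic_gb_fit(X, y, n_estimators=100, learning_rate=0.1,
                      subsample_fraction=0.5, max_depth=3, random_state=0):
    """Minimal stochastic gradient boosting for squared-error regression.

    X: (n_samples, n_features) NumPy array, y: (n_samples,) NumPy array.
    """
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    sample_size = max(1, int(subsample_fraction * n_samples))

    f0 = y.mean()                       # initial constant prediction F_0
    predictions = np.full(n_samples, f0)
    trees = []

    for m in range(n_estimators):
        # Pseudo-residuals for squared error are simply y - F_{m-1}(x).
        residuals = y - predictions

        # Row subsampling: draw a fraction of the rows WITHOUT replacement.
        idx = rng.choice(n_samples, size=sample_size, replace=False)

        # Fit the m-th tree only on the sampled rows and their residuals.
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=m)
        tree.fit(X[idx], residuals[idx])

        # Update predictions for ALL rows, scaled by the learning rate.
        predictions += learning_rate * tree.predict(X)
        trees.append(tree)

    return f0, trees

def stochastic_gb_predict(f0, trees, X, learning_rate=0.1):
    # learning_rate must match the value used during fitting.
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```

Setting subsample_fraction=1.0 recovers standard gradient boosting; values below 1.0 give each tree a slightly different random view of the pseudo-residuals, which is exactly the role the subsample parameter plays in the libraries discussed next.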
There are two primary ways subsampling is implemented in gradient boosting:
- Row Subsampling (Subsampling Fraction): Controlled by subsample (or bagging_fraction) in libraries like Scikit-learn, XGBoost, and LightGBM. Setting subsample=1.0 reverts to standard Gradient Boosting, using all data points for every tree.
- Column Subsampling (Feature Fraction): Controlled at several granularities:
  - colsample_bytree: The fraction of features sampled once for each tree construction.
  - colsample_bylevel: The fraction of features sampled once for each new level in a tree.
  - colsample_bynode (XGBoost) or feature_fraction_bynode (LightGBM): The fraction of features sampled for each node split.

The diagram below illustrates the integration of row and column subsampling into the boosting process.
A flowchart of the Stochastic Gradient Boosting algorithm incorporating both row and column subsampling within each boosting iteration.
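As a concrete reference, the snippet below shows how these parameters might be set through the scikit-learn interfaces of XGBoost and LightGBM (assuming both libraries are installed). The specific values are placeholders for illustration, not tuned recommendations.

```python
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# XGBoost: row subsampling plus column subsampling at three granularities.
xgb_model = XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,          # 80% of rows drawn (without replacement) per tree
    colsample_bytree=0.8,   # 80% of features sampled once per tree
    colsample_bylevel=1.0,  # set below 1.0 to also resample features per level
    colsample_bynode=1.0,   # set below 1.0 to also resample features per split
)

# LightGBM: the same ideas under different (aliased) parameter names.
lgbm_model = LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,          # alias: bagging_fraction
    subsample_freq=1,       # row subsampling only takes effect when this is > 0
    colsample_bytree=0.8,   # alias: feature_fraction
)
```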
The subsampling techniques used in SGB bear resemblance to those in Random Forests. Random Forests use bootstrap sampling (sampling with replacement) for rows and random feature selection at each split. SGB typically uses sampling without replacement for rows and offers more flexible options for feature sampling (per tree, per level, per node). The fundamental difference remains that boosting builds trees sequentially to correct prior errors, while Random Forests build trees independently in parallel.
Row and column subsampling are effective regularization tools, often used in conjunction with shrinkage (the learning rate) and tree complexity constraints (e.g., max_depth, min_child_weight). These settings interact: a lower learning rate usually requires more trees (n_estimators) and might work well with lower subsampling rates. Conversely, higher subsampling rates might allow for slightly higher learning rates or fewer trees.

The best values for subsample, colsample_bytree, and related parameters are highly data-dependent. They are typically tuned using cross-validation alongside other important hyperparameters like the learning rate (eta or learning_rate), tree depth (max_depth), and the number of boosting rounds (n_estimators), often determined via early stopping. Techniques like Grid Search, Randomized Search, or Bayesian Optimization (covered in Chapter 8) are essential for finding good combinations.
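The following sketch shows one way such a search might look, using scikit-learn's RandomizedSearchCV around an XGBoost regressor with SciPy distributions for the sampling ranges. The ranges are illustrative, and X_train and y_train stand in for your own training data.

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

# Search over subsampling rates together with other key hyperparameters.
param_distributions = {
    "subsample": uniform(0.5, 0.5),         # values in [0.5, 1.0]
    "colsample_bytree": uniform(0.5, 0.5),  # values in [0.5, 1.0]
    "learning_rate": uniform(0.01, 0.19),   # values in [0.01, 0.2]
    "max_depth": randint(3, 8),             # integers 3 through 7
}

search = RandomizedSearchCV(
    estimator=XGBRegressor(n_estimators=300),
    param_distributions=param_distributions,
    n_iter=40,
    scoring="neg_root_mean_squared_error",
    cv=5,
    random_state=42,
)

# search.fit(X_train, y_train)   # X_train, y_train: your training data
# print(search.best_params_)
```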
In summary, Stochastic Gradient Boosting leverages row and column subsampling to introduce randomness, which effectively reduces the variance of the final ensemble model. This makes the model less sensitive to the specific training data, improving its ability to generalize to new, unseen examples. Furthermore, it often provides the added benefit of speeding up the training process. Mastering the use and tuning of these subsampling parameters is a significant step towards building highly accurate and robust gradient boosting models.