XGBoost, a widely used gradient boosting algorithm, delivers impressive performance. However, its tree-building process can become a computational bottleneck on datasets with a very large number of instances. The core issue is that for each split, the algorithm must scan through every single data point to evaluate potential gains. Microsoft's LightGBM (Light Gradient Boosting Machine) was engineered specifically to address this challenge by introducing more efficient training methods.
One of its primary optimizations is a novel sampling technique called Gradient-based One-Side Sampling, or GOSS. This approach is built on a simple yet effective observation: not all data instances contribute equally to the training process.
In gradient boosting, the gradient of the loss function for each instance represents how "wrong" the current model's prediction is for that instance. A large gradient means the instance is poorly predicted and is, therefore, an "informative" example from which the model can learn a great deal. Conversely, an instance with a small gradient is already well-predicted by the ensemble; the model has less to learn from it.
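To make this concrete, consider squared-error loss, L = 0.5 * (y - y_hat)^2, whose gradient with respect to the prediction is simply the residual y_hat - y. The short NumPy sketch below (the toy values are purely illustrative) shows how gradient magnitudes separate well-predicted instances from poorly predicted ones:

```python
import numpy as np

# Toy illustration with squared-error loss L = 0.5 * (y - y_hat)^2.
# Its gradient with respect to the prediction is (y_hat - y), so instances the
# current ensemble predicts badly have large-magnitude gradients.
y_true = np.array([3.0, -1.0, 2.5, 0.0, 4.0])
y_pred = np.array([2.9, -1.1, 0.5, 0.1, 1.0])   # current ensemble's predictions

gradients = y_pred - y_true
print(np.abs(gradients))
# The instances at positions 2 and 4 (|gradient| of 2.0 and 3.0) are poorly
# predicted and carry most of the information for fitting the next tree; the
# others, with |gradient| around 0.1, are already well fit.
```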
Traditional stochastic gradient boosting methods sample data uniformly. GOSS proposes a more intelligent alternative. Instead of treating all instances equally, it focuses the learning process on the instances that are harder to fit. The main idea is to keep all of the instances with large gradients and perform random sampling only on the instances with small gradients.
The "One-Side" in the name refers to the fact that we are down-sampling from only one side of the data, the side with small, less informative gradients. The procedure can be broken down into a few steps:
1. Sort the instances by the absolute value of their gradients and keep the top a * 100% of instances with the largest gradients. These are the most informative samples.
2. From the remaining (1 - a) * 100% of instances, randomly sample a subset amounting to b * 100% of the full dataset. These are the less informative samples.
3. When computing the information gain for candidate splits, multiply the contribution of these sampled small-gradient instances by a constant factor of (1 - a) / b. This re-weighting ensures that the sampled data contributes to the gradient statistics in a way that is proportional to its original size, preventing the model from becoming biased towards the large-gradient data.

This process allows LightGBM to use a much smaller, more focused dataset to find the best splits for each new tree, dramatically reducing computation time without a substantial sacrifice in accuracy.
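The sketch below expresses these steps with NumPy. It is a simplified illustration of the sampling logic, not LightGBM's actual internal implementation; the function name goss_sample and the default values chosen for a and b are assumptions made for the example.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Illustrative GOSS sampling (not LightGBM's internal code).

    Keeps the top a*100% of instances by |gradient|, randomly samples a further
    b*100% of the full dataset from the remainder, and up-weights those sampled
    small-gradient instances by (1 - a) / b.
    Returns the selected row indices and their weights.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(gradients)

    order = np.argsort(-np.abs(gradients))   # indices sorted by descending |gradient|
    n_top = int(a * n)
    n_rest = int(b * n)

    top_idx = order[:n_top]                  # keep every large-gradient instance
    sampled_idx = rng.choice(order[n_top:], size=n_rest, replace=False)

    selected = np.concatenate([top_idx, sampled_idx])
    weights = np.ones(len(selected))
    weights[n_top:] = (1.0 - a) / b          # compensate for the down-sampling
    return selected, weights

# Example: from 10,000 instances, keep 2,000 large-gradient rows plus 1,000
# randomly sampled small-gradient rows, each of the latter weighted (1 - a) / b.
grads = np.random.default_rng(42).normal(size=10_000)
idx, w = goss_sample(grads, a=0.2, b=0.1)
print(len(idx), w.min(), w.max())
```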
Diagram of the Gradient-based One-Side Sampling (GOSS) process. The algorithm retains all data points with large gradients and samples a fraction of those with small gradients, re-weighting them to maintain the overall data distribution.
In the LightGBM library, you can enable GOSS by setting the boosting_type parameter to 'goss'. The proportions a and b are controlled by the top_rate and other_rate hyperparameters, respectively. This informed sampling strategy is a significant reason why LightGBM often trains much faster than other gradient boosting implementations, making it a powerful choice for working with large-scale datasets.
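A minimal usage sketch with LightGBM's scikit-learn interface follows; the synthetic dataset and the specific hyperparameter values are illustrative choices, and exact parameter handling may differ slightly between LightGBM versions.

```python
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data purely for illustration.
X, y = make_regression(n_samples=100_000, n_features=50, noise=0.1, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = lgb.LGBMRegressor(
    boosting_type="goss",   # enable Gradient-based One-Side Sampling
    top_rate=0.2,           # a: fraction of large-gradient instances that are kept
    other_rate=0.1,         # b: fraction of the data sampled from the small-gradient rest
    n_estimators=200,
    learning_rate=0.05,
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
print("Validation R^2:", model.score(X_valid, y_valid))
```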