For regression tasks, where the goal is to predict a continuous value, Scikit-Learn provides the GradientBoostingRegressor class. This class implements the Gradient Boosting Machine algorithm. It constructs an additive model by sequentially fitting decision trees, where each new tree is trained to correct the errors made by the combination of all prior trees.
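To make this sequential error-correction idea concrete, here is a minimal hand-rolled sketch of two boosting stages built from plain decision trees. It illustrates the principle only: the toy data and variable names are invented for illustration, and Scikit-Learn's actual implementation also scales each stage by a learning rate and fits many more trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
# Toy data: a noisy quadratic relationship (illustrative only)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)
# Stage 1: fit a shallow tree to the raw targets
tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
residuals = y - tree1.predict(X)
# Stage 2: fit a second tree to the errors left by the first
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
# The ensemble prediction is the sum of the stages
y_pred = tree1.predict(X) + tree2.predict(X)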
The GradientBoostingRegressor is a powerful and flexible tool for a wide range of regression problems, from predicting house prices to forecasting demand. Its effectiveness comes from its ability to model complex, non-linear relationships in the data.
To get started, you import the class from sklearn.ensemble. Its instantiation is straightforward and will feel familiar if you have used other Scikit-Learn models.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Create some synthetic data
X = np.random.rand(100, 1) * 10
y = np.sin(X).ravel() + np.random.normal(0, 0.3, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Instantiate the model with default parameters
gbr = GradientBoostingRegressor(random_state=42)
# Fit the model to the training data
gbr.fit(X_train, y_train)
# Make predictions on the test set
y_pred = gbr.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
This code snippet demonstrates the standard workflow: instantiate, fit, and predict. While the default parameters often provide a reasonable starting point, understanding the main parameters is essential for building high-performing models.
The behavior of the GradientBoostingRegressor is controlled by several important parameters. Let's examine the ones you will adjust most frequently.
The loss parameter defines the loss function to be optimized. The choice of loss function depends on the specifics of your regression problem, especially its sensitivity to outliers. The available options are:
- 'squared_error' (called 'ls' in older Scikit-Learn releases): the default option, least squares regression. It minimizes the L2 loss, equivalent to the mean squared error (MSE). This is a good general-purpose choice but can be sensitive to outliers.
- 'absolute_error' (formerly 'lad'): least absolute deviation, which minimizes the L1 loss, equivalent to the mean absolute error (MAE). This is more robust to outliers than least squares.
- 'huber': a combination of least squares and least absolute deviation. It behaves like least squares for small errors and like least absolute deviation for larger errors, providing a balance of sensitivity and robustness.
- 'quantile': enables quantile regression. Instead of predicting the mean, this loss function predicts a specific quantile selected by the alpha parameter (e.g., the 50th percentile, which is the median).
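As a sketch of the quantile option, the snippet below fits two models at the 10th and 90th percentiles to form a rough 80% prediction interval. The data generation mirrors the earlier synthetic example; the variable names and the particular alpha values are illustrative choices, not prescriptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
# Synthetic data, similar to the earlier example
rng = np.random.RandomState(42)
X = rng.rand(200, 1) * 10
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# One model per quantile; alpha selects the quantile to predict
gbr_lower = GradientBoostingRegressor(loss='quantile', alpha=0.1, random_state=42)
gbr_upper = GradientBoostingRegressor(loss='quantile', alpha=0.9, random_state=42)
gbr_lower.fit(X_train, y_train)
gbr_upper.fit(X_train, y_train)
# Roughly 80% of test targets should fall between the two predictions
lower = gbr_lower.predict(X_test)
upper = gbr_upper.predict(X_test)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Fraction of test targets inside the interval: {coverage:.2f}")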
The interaction between n_estimators, learning_rate, and max_depth governs the model's ability to fit the training data without overfitting.
- n_estimators: sets the number of boosting stages, which corresponds to the number of trees in the ensemble. More trees can capture more complex patterns, but too many can lead to overfitting.
- learning_rate: often called shrinkage, this scales the contribution of each tree. A smaller learning rate (e.g., 0.01) requires a larger n_estimators to achieve the same training error but often results in better generalization. It effectively slows down the learning process, preventing the model from making drastic corrections with each new tree.
- max_depth: controls the maximum depth of the individual decision trees. Shallow trees (e.g., max_depth=3) are constrained and act as weak learners, which is central to the boosting process. Deeper trees can model more complex feature interactions but increase the risk of overfitting the training data.
Let's build a model to fit a more complex, non-linear function and visualize its predictions. We will use a more deliberately configured model to see the effect of changing these parameters.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
# Generate a non-linear dataset with noise
np.random.seed(0)
X = np.linspace(0, 6, 150)[:, np.newaxis]
y = (X * np.sin(X)).ravel() + np.random.normal(0, 0.5, 150)
# Instantiate and configure the model
gbr_tuned = GradientBoostingRegressor(
    n_estimators=200,      # More trees
    learning_rate=0.05,    # A smaller learning rate
    max_depth=4,           # Slightly deeper trees
    loss='squared_error',  # Least squares loss ('ls' in older Scikit-Learn releases)
    random_state=42
)
# Fit the model
gbr_tuned.fit(X, y)
# Create a smooth line for prediction visualization
X_plot = np.linspace(0, 6, 500)[:, np.newaxis]
y_plot = gbr_tuned.predict(X_plot)
By setting a smaller learning_rate and a higher n_estimators, we encourage the model to learn the underlying pattern more gradually. The max_depth of 4 allows each tree to capture a moderate level of interaction. The visualization below shows how effectively the ensemble of simple trees has approximated the underlying x·sin(x) curve.
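To reproduce the figure yourself, a minimal matplotlib sketch along these lines, reusing X, y, X_plot, and y_plot from the snippet above, draws the training points and the prediction curve (matplotlib is assumed to be installed):
import matplotlib.pyplot as plt
# Noisy training points and the ensemble's smooth prediction curve
plt.scatter(X.ravel(), y, s=15, color='blue', alpha=0.6, label='Training data')
plt.plot(X_plot.ravel(), y_plot, color='red', linewidth=2, label='GBR prediction')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()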
The model's prediction (red line) closely follows the underlying pattern of the noisy training data (blue points), demonstrating its ability to learn complex, non-linear relationships.
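One way to observe this gradual learning directly is the staged_predict method, which yields the ensemble's predictions after each boosting stage. The short sketch below, reusing gbr_tuned, X, and y from the snippet above, tracks the training error as trees are added.
from sklearn.metrics import mean_squared_error
# Training MSE after each boosting stage; it should fall as more trees are added
errors = [mean_squared_error(y, y_stage) for y_stage in gbr_tuned.staged_predict(X)]
print(f"MSE after 10 trees:  {errors[9]:.4f}")
print(f"MSE after 200 trees: {errors[-1]:.4f}")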
Having built a regressor, the next logical steps involve understanding why it makes the predictions it does and how to handle classification problems. In the following sections, we will explore methods for interpreting these models and introduce its counterpart for classification, the GradientBoostingClassifier.