Okay, let's translate the theory of Gradient Boosting Machines into practice. In this section, we'll use Scikit-learn's implementations (GradientBoostingRegressor and GradientBoostingClassifier) to build and train basic GBM models. While libraries like XGBoost and LightGBM offer significant performance and feature enhancements (which we will cover later), understanding the Scikit-learn version provides a solid, accessible foundation directly linked to the concepts discussed in this chapter, such as the additive nature, loss functions, shrinkage, and subsampling.
We'll walk through setting up, training, and evaluating a GBM for both a regression and a classification task.
First, ensure you have the necessary libraries installed. We'll primarily use Scikit-learn, along with Pandas for data handling, NumPy for numerical operations, and Matplotlib/Seaborn for plotting.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, roc_auc_score
from sklearn.datasets import fetch_california_housing, load_breast_cancer
import matplotlib.pyplot as plt
import seaborn as sns
# Set a consistent style for plots
sns.set_style("whitegrid")
Let's tackle a regression problem using the California Housing dataset. Our goal is to predict the median house value based on various features.
Load and Prepare Data: We load the dataset and split it into training and testing sets.
# Load data
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training features shape: {X_train.shape}")
print(f"Testing features shape: {X_test.shape}")
Instantiate and Train the Model: We create an instance of GradientBoostingRegressor. Let's examine some important parameters derived from our theoretical discussion:
- n_estimators: The number of boosting stages (trees) to perform. This corresponds to M in our additive model formulation $F_M(x) = \sum_{m=1}^{M} \gamma_m h_m(x)$.
- learning_rate: This is the shrinkage parameter $\nu$. It scales the contribution of each tree. Smaller values require more trees (n_estimators) for comparable performance but often improve generalization.
- loss: Specifies the loss function to optimize. The default 'squared_error' (formerly 'ls') corresponds to minimizing the sum of squared differences between actual and predicted values, where the negative gradient is simply the residual $y_i - F_{m-1}(x_i)$. Other options such as 'absolute_error' (robust to outliers) and 'huber' (a combination of the two) are available.
- max_depth: Controls the maximum depth of the individual regression trees. This is a primary way to control model complexity and prevent overfitting.
- subsample: If less than 1.0, this enables Stochastic Gradient Boosting by fitting each tree on a random fraction of the training data. This introduces randomness and acts as a regularizer. Values around 0.8 are common.
# Instantiate the GBM Regressor
gbr = GradientBoostingRegressor(
n_estimators=100, # Number of trees
learning_rate=0.1, # Shrinkage factor
max_depth=3, # Max depth of each tree
subsample=0.8, # Fraction of samples for fitting each tree
loss='squared_error',
random_state=42
)
# Train the model
print("Training GradientBoostingRegressor...")
gbr.fit(X_train, y_train)
print("Training complete.")
Make Predictions and Evaluate: We use the trained model to predict on the test set and evaluate performance using Mean Squared Error (MSE) and R-squared ($R^2$).
# Predict on the test set
y_pred_reg = gbr.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred_reg)
r2 = r2_score(y_test, y_pred_reg)
print(f"Test Set Mean Squared Error: {mse:.4f}")
print(f"Test Set R-squared: {r2:.4f}")
You should observe reasonable performance metrics. Experimenting with n_estimators, learning_rate, and max_depth will significantly impact these results. For instance, increasing n_estimators while decreasing learning_rate often yields better models, though it increases training time.
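One way to see this trade-off without retraining many separate models is the staged_predict method, which yields the ensemble's predictions after each boosting stage. The sketch below reuses the gbr model trained above; the plotting details are just one possible presentation:
# Test MSE as a function of the number of trees
staged_mse = [mean_squared_error(y_test, y_pred)
              for y_pred in gbr.staged_predict(X_test)]

plt.figure(figsize=(8, 5))
plt.plot(np.arange(1, len(staged_mse) + 1), staged_mse)
plt.xlabel('Number of boosting stages (trees)')
plt.ylabel('Test MSE')
plt.title('Test error vs. boosting iterations')
plt.tight_layout()
plt.show()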
Now, let's apply GBM to a binary classification problem using the Breast Cancer Wisconsin dataset.
Load and Prepare Data:
# Load data
cancer = load_breast_cancer()
X_c = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y_c = cancer.target
# Split data
X_c_train, X_c_test, y_c_train, y_c_test = train_test_split(X_c, y_c, test_size=0.2, random_state=42, stratify=y_c)
print(f"Training classification features shape: {X_c_train.shape}")
print(f"Testing classification features shape: {X_c_test.shape}")
Instantiate and Train the Model: We use GradientBoostingClassifier. Key parameters are similar to the regressor's, but the loss function is different.
- loss: The default 'log_loss' (formerly 'deviance') is suitable for binary and multiclass classification, optimizing the logistic loss function. The negative gradient in this case is the difference between the observed label and the currently predicted probability, $y_i - p_i$ with $p_i = \sigma(F_{m-1}(x_i))$. 'exponential' uses the AdaBoost exponential loss function instead.
# Instantiate the GBM Classifier
gbc = GradientBoostingClassifier(
n_estimators=100,
learning_rate=0.1,
max_depth=3,
subsample=0.8,
loss='log_loss',
random_state=42
)
# Train the model
print("Training GradientBoostingClassifier...")
gbc.fit(X_c_train, y_c_train)
print("Training complete.")
Make Predictions and Evaluate: We evaluate using standard classification metrics like Accuracy and ROC AUC score. We can also get probability estimates using predict_proba.
# Predict on the test set
y_pred_class = gbc.predict(X_c_test)
y_pred_proba = gbc.predict_proba(X_c_test)[:, 1] # Probability of positive class
# Evaluate the model
accuracy = accuracy_score(y_c_test, y_pred_class)
roc_auc = roc_auc_score(y_c_test, y_pred_proba)
print(f"Test Set Accuracy: {accuracy:.4f}")
print(f"Test Set ROC AUC Score: {roc_auc:.4f}")
Again, tuning hyperparameters is essential for optimal performance.
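One systematic approach is a cross-validated grid search over the main parameters. The grid below is deliberately small and purely illustrative, not a recommended search space, and it will take a little while to run:
from sklearn.model_selection import GridSearchCV

# Illustrative search space; expand or refine as needed
param_grid = {
    'n_estimators': [100, 300],
    'learning_rate': [0.05, 0.1],
    'max_depth': [2, 3, 4],
}

grid = GridSearchCV(
    GradientBoostingClassifier(subsample=0.8, random_state=42),
    param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1
)
grid.fit(X_c_train, y_c_train)
print(f"Best parameters: {grid.best_params_}")
print(f"Best cross-validated ROC AUC: {grid.best_score_:.4f}")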
Gradient Boosting models provide an estimate of feature importance based on how much each feature contributes to reducing the loss function across all trees. Scikit-learn exposes this through the feature_importances_ attribute.
# Get feature importances for the regression model
importances_reg = gbr.feature_importances_
feature_names_reg = X.columns
importance_df_reg = pd.DataFrame({'Feature': feature_names_reg, 'Importance': importances_reg})
importance_df_reg = importance_df_reg.sort_values(by='Importance', ascending=False)
# Plot feature importances
plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=importance_df_reg.head(10), palette='viridis') # Plot top 10
plt.title('Top 10 Feature Importances (GBM Regressor)')
plt.xlabel('Importance Score')
plt.ylabel('Feature')
plt.tight_layout()
plt.show()
# Get feature importances for the classification model
importances_cls = gbc.feature_importances_
feature_names_cls = X_c.columns
importance_df_cls = pd.DataFrame({'Feature': feature_names_cls, 'Importance': importances_cls})
importance_df_cls = importance_df_cls.sort_values(by='Importance', ascending=False)
# Plot feature importances
plt.figure(figsize=(10, 8))
sns.barplot(x='Importance', y='Feature', data=importance_df_cls.head(10), palette='magma') # Plot top 10
plt.title('Top 10 Feature Importances (GBM Classifier)')
plt.xlabel('Importance Score')
plt.ylabel('Feature')
plt.tight_layout()
plt.show()
Feature importance plots for the regression (California Housing) and classification (Breast Cancer) tasks, derived from the trained Scikit-learn GBM models. These plots show the relative contribution of each feature in the model's decisions.
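One caveat: these impurity-based importances are computed from the training data and tend to favor features with many distinct values. As a cross-check, Scikit-learn's permutation_importance measures how much shuffling each feature degrades performance on held-out data; a short sketch for the classifier:
from sklearn.inspection import permutation_importance

# Permutation importance on the test set (n_repeats controls the number of shuffles per feature)
perm = permutation_importance(
    gbc, X_c_test, y_c_test,
    scoring='roc_auc', n_repeats=10, random_state=42
)
perm_df = pd.DataFrame({'Feature': X_c.columns, 'Importance': perm.importances_mean})
print(perm_df.sort_values(by='Importance', ascending=False).head(10))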
This hands-on exercise demonstrates the core workflow of applying GBM using Scikit-learn. You instantiated models, configured primary hyperparameters linked to GBM theory (number of estimators, learning rate, tree depth, subsampling), trained them, and evaluated their performance.
Keep in mind that Scikit-learn's GradientBoostingRegressor and GradientBoostingClassifier are highly valuable for understanding the algorithm's mechanics but may not be the most performant options for large datasets or complex scenarios. They lack some of the advanced regularization techniques, optimized split-finding algorithms, and efficient handling of sparse or categorical data found in libraries like XGBoost, LightGBM, and CatBoost.
Consider this exercise a stepping stone. You now have a practical understanding of how a standard GBM operates. In the following chapters, we will build upon this foundation, exploring the specialized algorithms that have become the workhorses of modern gradient boosting applications.