Theory provides the map, but implementation is where you truly learn the terrain. Building a Gradient Boosting Machine means translating the algorithm's logic into working Python code, and this exercise is designed to solidify your understanding of how GBMs learn iteratively. We won't build a production-ready library; instead, we'll construct a simplified GBM for a regression task to see the mechanics in action.
Our weak learners will be shallow decision trees, specifically the DecisionTreeRegressor from Scikit-Learn. By focusing on the boosting procedure itself, you will see exactly how these simple models combine to form a powerful and accurate predictor.
First, let's prepare our workspace. We need numpy for numerical computations and matplotlib to visualize our results. Most importantly, we'll import DecisionTreeRegressor to serve as our weak learner.
We will generate a simple, non-linear dataset based on a sine wave. This gives us a clear target function to see how well our model learns.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
# Generate a synthetic dataset
np.random.seed(42)
X = np.linspace(0, 6, 100)[:, np.newaxis]
y = np.sin(X).ravel() + np.random.normal(0, 0.2, 100)
# Plot the data to see what we're working with
plt.figure(figsize=(10, 6))
plt.scatter(X, y, c='#495057', s=20, label='Data Points')
plt.plot(X, np.sin(X), color='#f03e3e', linewidth=2, label='True Function (sin(x))')
plt.title('Synthetic Regression Dataset')
plt.xlabel('Feature (x)')
plt.ylabel('Target (y)')
plt.legend()
plt.show()
We will implement the core logic of a GBM for a regression task using Mean Squared Error (MSE) as the loss function. As we learned, for the loss L(y, F) = ½(y − F)², the gradient with respect to the prediction F is ∂L/∂F = −(y − F), so the negative gradient is simply the residual, y − F.
The first step is to create an initial prediction. For MSE, the optimal constant prediction that minimizes the loss is the mean of the target variable. This will be our starting point, F₀(x).
# Initial prediction is the mean of the target variable
initial_prediction = np.mean(y)
Now we enter the main loop of the algorithm. For each iteration, we perform three actions:
1. Compute the residuals of the current ensemble prediction.
2. Fit a weak learner (a shallow decision tree) to those residuals.
3. Update the ensemble's prediction by adding the tree's output, scaled by the learning rate.
Let's define our model's hyperparameters.
# Hyperparameters
n_estimators = 100
learning_rate = 0.1
max_depth = 1 # Shallow trees are weak learners
# Store the trees and the current predictions
trees = []
F = np.full(y.shape, initial_prediction) # F represents our ensemble's prediction
for _ in range(n_estimators):
    # 1. Compute residuals of the current ensemble prediction
    residuals = y - F
    # 2. Fit a weak learner to the residuals
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
    tree.fit(X, residuals)
    # 3. Update the ensemble's prediction
    prediction_from_tree = tree.predict(X)
    F += learning_rate * prediction_from_tree
    # Store the trained tree
    trees.append(tree)
In this loop, F represents the cumulative prediction of the ensemble at each stage. Notice how each new tree is not trained on y, but on the residuals. It learns to predict the error of the current ensemble, and we add a small fraction of its prediction back to our main prediction F.
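To watch this error-correction happen, here is a small optional check (not part of the original walkthrough): it replays the stored trees from the initial constant prediction and prints the training MSE at a few stages. The error should shrink steadily as trees are added; the stage numbers chosen below are just illustrative.
# Optional check: replay the stored trees and watch the training error shrink
staged = np.full(y.shape, initial_prediction)
for i, tree in enumerate(trees, start=1):
    staged += learning_rate * tree.predict(X)
    if i in (1, 10, 50, 100):
        mse = np.mean((y - staged) ** 2)
        print(f"After {i:3d} trees: training MSE = {mse:.4f}")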
To make a prediction on new data, we follow the same process. We start with the initial prediction (the mean) and then sequentially add the scaled predictions from each tree in our ensemble.
def predict(X_new):
    # Start with the initial constant prediction
    prediction = np.full(X_new.shape[0], initial_prediction)
    # Add the scaled predictions from each tree
    for tree in trees:
        prediction += learning_rate * tree.predict(X_new)
    return prediction
# Generate predictions on our original data to see how we did
y_pred = predict(X)
The most effective way to understand what we've built is to visualize its output. The following chart shows the original data points, the true function we were trying to model, our simple initial prediction, and the final, much more sophisticated prediction from our custom GBM.
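The exact plotting code isn't reproduced here, but a minimal sketch along these lines recreates the chart from the variables defined above. The colors, labels, and title are illustrative choices, not the original figure's styling.
# Sketch of the comparison plot described above
plt.figure(figsize=(10, 6))
plt.scatter(X, y, c='#495057', s=20, label='Data Points')
plt.plot(X, np.sin(X), color='#f03e3e', linewidth=2, label='True Function (sin(x))')
plt.axhline(initial_prediction, color='#868e96', linestyle='--', label='Initial Prediction (mean)')
plt.plot(X, y_pred, color='#1c7ed6', linewidth=2, label='Custom GBM Prediction')
plt.title('Custom GBM Fit After 100 Trees')
plt.xlabel('Feature (x)')
plt.ylabel('Target (y)')
plt.legend()
plt.show()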
The model starts with a simple average and iteratively refines its prediction. Each step corrects the errors of the previous one, gradually learning the underlying sinusoidal pattern from the noisy data points.
As you can see, our model went from a naive horizontal line to a sophisticated curve that closely follows the true function. It achieved this by stringing together 100 very simple decision trees (stumps, in this case), each one correcting the lingering errors from the ones that came before it.
You have now built a Gradient Boosting Machine. While libraries like Scikit-Learn and XGBoost provide highly optimized, feature-rich implementations, the core principle is precisely what you have just coded. This hands-on experience provides an invaluable foundation as we move on to using and tuning these powerful, pre-built libraries in the next chapter.
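As an optional sanity check beyond the exercise itself, you can compare your custom model against Scikit-Learn's built-in GradientBoostingRegressor using the same hyperparameters. The two won't match exactly (Scikit-Learn uses a slightly different tree-splitting criterion internally), but the training errors should be close.
# Optional: compare against Scikit-Learn's built-in implementation
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

sk_gbm = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                   max_depth=1, random_state=42)
sk_gbm.fit(X, y)

print("Custom GBM training MSE:      ", mean_squared_error(y, y_pred))
print("Scikit-Learn GBM training MSE:", mean_squared_error(y, sk_gbm.predict(X)))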