Feature importance scores offer a high-level view of which features a Gradient Boosting Machine (GBM) uses. However, these scores only rank features by their predictive power and do not explain the nature of the relationship between a feature and the model's predictions. For instance, does an increase in a feature's value lead to a higher or lower prediction? Is the relationship linear, or is it more complex? Partial Dependence Plots (PDPs) are used to answer these specific questions.
A Partial Dependence Plot illustrates the marginal effect of one or two features on the predicted outcome of a model. In simple terms, it shows how the model's prediction changes, on average, as you vary the value of a feature while holding all other features constant. This allows you to isolate and visualize the relationship the model has learned between a specific input and its output.
The calculation works by averaging out the effects of all other features in the dataset. For a chosen feature, the process is as follows:

1. Select a grid of values spanning the range of the chosen feature.
2. For each grid value, set the feature to that value in every row of the dataset, leaving all other features unchanged.
3. Run the modified dataset through the model and record the predictions.
4. Average those predictions. This average is the partial dependence at that grid value.
The result is a line or surface that visualizes the expected prediction as a function of the feature(s) of interest.
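The steps above can be sketched by hand. The snippet below is a minimal illustration on synthetic data (make_regression stands in for a real dataset); it reproduces the grid-replace-predict-average loop directly:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative sketch: partial dependence of feature 0, computed by hand.
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# 1. A grid of values spanning the feature of interest.
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=20)

pd_values = []
for value in grid:
    X_mod = X.copy()
    X_mod[:, 0] = value  # 2. fix the feature at this grid value for every row
    pd_values.append(model.predict(X_mod).mean())  # 3.-4. predict and average
pd_values = np.array(pd_values)

print(grid.shape, pd_values.shape)  # one averaged prediction per grid point
```

Plotting pd_values against grid gives exactly the one-way PDP curve described above; library implementations differ mainly in how they choose the grid (scikit-learn uses percentiles of the training data by default).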
Scikit-Learn provides a convenient and powerful tool for creating these plots within the sklearn.inspection module. The PartialDependenceDisplay class and its from_estimator method handle the entire calculation and plotting process.
Let's start by training a GradientBoostingRegressor on the California housing dataset.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt
# Load and prepare data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
housing.data, housing.target, test_size=0.2, random_state=42
)
feature_names = housing.feature_names
# Train a GBM model
gbm = GradientBoostingRegressor(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42)
gbm.fit(X_train, y_train)
With a trained model, creating a PDP for a single feature like the median income (MedInc) is straightforward.
# Create and display the PDP for the 'MedInc' feature
fig, ax = plt.subplots(figsize=(8, 6))
PartialDependenceDisplay.from_estimator(
gbm,
X_train,
features=['MedInc'], # or using index: features=[0]
feature_names=feature_names,
ax=ax
)
plt.show()
The code above generates a one-way PDP, which shows the relationship between a single feature and the model's prediction. The plot for MedInc would look something like this.
The average predicted house value increases steadily as the median income rises, but the effect begins to level off for very high incomes.
From this chart we can conclude that our GBM model has learned a positive, non-linear relationship between median income and house value. The predicted value rises sharply for MedInc values up to around 9 (the feature is expressed in tens of thousands of dollars, so roughly $90,000), after which the marginal benefit of additional income diminishes. This is a far more descriptive explanation than simply stating that MedInc is the most important feature.
Partial dependence is not limited to a single feature. By plotting two features simultaneously, we can visualize their interaction effect on the prediction. This helps answer questions like, "Does the effect of feature A depend on the value of feature B?"
Let's examine the interaction between median income (MedInc) and the average number of rooms (AveRooms).
# Create and display the two-way PDP
fig, ax = plt.subplots(figsize=(9, 7))
PartialDependenceDisplay.from_estimator(
gbm,
X_train,
features=[('MedInc', 'AveRooms')], # pass the pair as a single tuple to request one two-way plot
feature_names=feature_names,
ax=ax
)
plt.show()
This generates a two-dimensional contour plot in which color represents the average prediction.
The highest predicted house values (darkest red) occur when both median income and the average number of rooms are high.
The plot shows that the highest predicted house values occur in the upper-right region, where both MedInc and AveRooms are large. It also reveals that an increase in the number of rooms has a stronger positive effect on price when median income is already high. This interaction provides a more complete picture of the model's logic than analyzing each feature in isolation.
While powerful, PDPs operate on an important assumption: that the features you are plotting are not correlated with other features in the model. The method works by changing the value of one feature while holding others constant. If two features are strongly correlated (e.g., age and years of experience), this process can create data points that are highly unrealistic or even impossible, leading to potentially misleading plots.
Furthermore, a PDP shows the average effect across the entire dataset. It can sometimes mask more complex relationships where a feature affects different subsets of the data in different ways. For scenarios requiring instance-level explanations, more advanced techniques like Individual Conditional Expectation (ICE) plots can be used.
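ICE curves are available through the same API: PartialDependenceDisplay.from_estimator accepts kind="both" to overlay individual curves on the average, and partial_dependence returns the per-instance values with kind="individual". A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# One ICE curve per instance instead of a single averaged curve.
result = partial_dependence(model, X, features=[0], kind="individual")
ice = result["individual"]  # shape (1, n_samples, n_grid_points)

print(ice.shape)
```

If the individual curves fan out in different directions while their average looks flat, the PDP is hiding heterogeneous effects; that divergence is precisely what ICE plots are designed to expose.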
Despite these limitations, Partial Dependence Plots are an indispensable tool for interpreting GBMs. They bridge the gap between knowing what features are important and understanding how they drive model predictions, turning a complex model into a more transparent and understandable one.