Feature importance scores offer a high-level view of which features a Gradient Boosting Machine (GBM) uses. However, these scores only rank features by their predictive power and do not explain the nature of the relationship between a feature and the model's predictions. For instance, does an increase in a feature's value lead to a higher or lower prediction? Is the relationship linear, or is it more complex? Partial Dependence Plots (PDPs) are used to answer these specific questions.

## The "How" Behind the "What"

A Partial Dependence Plot illustrates the marginal effect of one or two features on the predicted outcome of a model. In simple terms, it shows how the model's prediction changes, on average, as you vary the value of a feature while holding all other features constant. This allows you to isolate and visualize the relationship the model has learned between a specific input and its output.

The calculation works by averaging out the effects of all other features in the dataset. For a chosen feature, the process is as follows (we will make it concrete in code shortly):

1. A grid of values is created for the feature of interest.
2. For each value in the grid, that value is substituted for the feature in every single instance of the dataset.
3. The model makes a prediction for each of these modified instances.
4. The predictions are averaged across all instances.
5. This average prediction is plotted against the feature value from the grid.

The result is a line or surface that visualizes the expected prediction as a function of the feature(s) of interest.

## Generating PDPs with Scikit-Learn

Scikit-Learn provides a convenient and powerful tool for creating these plots within the `sklearn.inspection` module. The `PartialDependenceDisplay` class and its `from_estimator` method handle the entire calculation and plotting process.

Let's start by training a `GradientBoostingRegressor` on the California housing dataset.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt

# Load and prepare data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42
)
feature_names = housing.feature_names

# Train a GBM model
gbm = GradientBoostingRegressor(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42
)
gbm.fit(X_train, y_train)
```

With a trained model, creating a PDP for a single feature like the median income (`MedInc`) is straightforward.

```python
# Create and display the PDP for the 'MedInc' feature
fig, ax = plt.subplots(figsize=(8, 6))
PartialDependenceDisplay.from_estimator(
    gbm, X_train,
    features=['MedInc'],  # or using index: features=[0]
    feature_names=feature_names,
    ax=ax
)
plt.show()
```

## Interpreting One-Way Partial Dependence Plots

The code above generates a one-way PDP, which shows the relationship between a single feature and the model's prediction.
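Before interpreting it, it may help to see what `from_estimator` computes under the hood. Below is a minimal, brute-force sketch of the averaging procedure described earlier. The helper name `manual_partial_dependence` is ours, and scikit-learn's actual implementation differs in detail (by default it builds the grid between the 5th and 95th percentiles rather than the min and max), so the numbers will diverge slightly at the extremes.

```python
import numpy as np

def manual_partial_dependence(model, X, feature_idx, n_grid=25):
    """Brute-force one-way partial dependence (illustration only)."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), n_grid)
    averaged_predictions = []
    for value in grid:
        X_modified = X.copy()
        X_modified[:, feature_idx] = value         # step 2: substitute the grid value everywhere
        preds = model.predict(X_modified)          # step 3: predict on the modified instances
        averaged_predictions.append(preds.mean())  # step 4: average across all instances
    return grid, np.array(averaged_predictions)

# Partial dependence of the prediction on MedInc (feature index 0)
grid, avg_pred = manual_partial_dependence(gbm, X_train, feature_idx=0)
```

If you only want these numbers without a figure, the companion function `sklearn.inspection.partial_dependence` returns the grid and the averaged predictions directly.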
The plot for `MedInc` would look something like this.

*Figure: Partial dependence of house value on median income (x-axis: Median Income (MedInc); y-axis: partial dependence). The average predicted house value increases steadily as the median income rises, but the effect begins to level off for very high incomes.*

From this chart we can conclude that our GBM model has learned a positive, non-linear relationship between median income and house value. The predicted value rises sharply for incomes up to around 9 (incomes are expressed in tens of thousands of dollars in this dataset), after which the marginal benefit of additional income diminishes. This is a far more descriptive explanation than simply stating that `MedInc` is the most important feature.

## Visualizing Feature Interactions with Two-Way PDPs

Partial dependence is not limited to a single feature. By plotting two features simultaneously, we can visualize their interaction effect on the prediction. This helps answer questions like, "Does the effect of feature A depend on the value of feature B?"

Let's examine the interaction between median income (`MedInc`) and the average number of rooms (`AveRooms`). Note that the feature pair must be passed as a tuple; a flat list of two names would instead produce two separate one-way plots.

```python
# Create and display the two-way PDP
fig, ax = plt.subplots(figsize=(9, 7))
PartialDependenceDisplay.from_estimator(
    gbm, X_train,
    features=[('MedInc', 'AveRooms')],  # a pair yields a single two-way plot
    feature_names=feature_names,
    ax=ax
)
plt.show()
```

This generates a 2D heatmap where color represents the average prediction.

*Figure: Interaction between median income and average rooms (x-axis: Median Income (MedInc); y-axis: Average Rooms (AveRooms); color: predicted value). The highest predicted house values (darkest red) occur when both median income and the average number of rooms are high.*

The heatmap shows that the highest predicted house values occur in the top-right corner, where both `MedInc` and `AveRooms` are large. It also reveals that an increase in the number of rooms has a stronger positive effect on price when median income is already high.
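To put a number on that reading, we can pull the raw two-way partial dependence surface with `sklearn.inspection.partial_dependence` and compare the effect of adding rooms at the low and high ends of the income grid. This is a sketch: the `'average'` result key is that of recent scikit-learn versions, and feature indices 0 and 2 correspond to `MedInc` and `AveRooms` in this dataset.

```python
from sklearn.inspection import partial_dependence

# Raw two-way partial dependence surface for the (MedInc, AveRooms) pair
result = partial_dependence(gbm, X_train, features=[(0, 2)])
surface = result["average"][0]  # shape: (len(MedInc grid), len(AveRooms grid))

# Effect of moving from the fewest to the most rooms on the grid,
# evaluated at the lowest vs. the highest median income
rooms_effect_low_income = surface[0, -1] - surface[0, 0]
rooms_effect_high_income = surface[-1, -1] - surface[-1, 0]
print(f"Rooms effect at low income:  {rooms_effect_low_income:.2f}")
print(f"Rooms effect at high income: {rooms_effect_high_income:.2f}")
```

If the second number is clearly larger, the model has learned a genuine interaction rather than two independent effects.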
This interaction provides a more complete picture of the model's logic than analyzing each feature in isolation.

## Important Notes

While powerful, PDPs rest on an important assumption: that the features you are plotting are not correlated with other features in the model. The method works by changing the value of one feature while holding others constant. If two features are strongly correlated (e.g., age and years of experience), this process can create data points that are highly unrealistic or even impossible, leading to potentially misleading plots.

Furthermore, a PDP shows the average effect across the entire dataset. It can sometimes mask more complex relationships where a feature affects different subsets of the data in different ways. For scenarios requiring instance-level explanations, more advanced techniques like Individual Conditional Expectation (ICE) plots can be used (sketched at the end of this article).

Despite these limitations, Partial Dependence Plots are an indispensable tool for interpreting GBMs. They bridge the gap between knowing what features are important and understanding how they drive model predictions, turning a complex model into a more transparent and understandable one.
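As a closing example, the ICE plots mentioned above are available through the very same `from_estimator` API. The following is a sketch rather than canonical usage: `kind="both"` overlays the individual per-instance curves on their average, and `subsample` (here 100 curves, an arbitrary choice) keeps the figure readable.

```python
# ICE curves for MedInc, overlaid on the average partial dependence
fig, ax = plt.subplots(figsize=(8, 6))
PartialDependenceDisplay.from_estimator(
    gbm, X_train,
    features=['MedInc'],
    feature_names=feature_names,
    kind='both',      # 'individual' for ICE only; 'both' adds the average line
    subsample=100,    # draw a random subset of ICE curves for readability
    random_state=42,
    ax=ax
)
plt.show()
```

If the individual curves fan out or cross, the average line is hiding heterogeneity across instances; if they run roughly parallel, the one-way PDP is a faithful summary.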