Okay, let's put theory into practice. You've set up monitoring and identified a performance dip in a specific data segment: perhaps a drop in precision for users in a particular demographic group, or recall slipping for a certain product category. Your alerts are firing, but why is it happening? This is where explainability techniques become powerful diagnostic tools, helping you move from knowing what is wrong to understanding why.
In this practice section, we'll simulate diagnosing a performance degradation issue using SHAP (SHapley Additive exPlanations), a popular technique for explaining individual predictions and overall model behavior. We assume you have a trained model artifact and access to logged production data, including features and predictions.
Imagine a churn prediction model where monitoring has detected a significant drop in recall for customers who recently interacted with a newly launched premium support channel. Overall recall might be stable, but this specific segment is performing poorly, meaning we are failing to identify customers likely to churn within this group.
Our goal is to use SHAP to understand which features are driving the predictions (correct or incorrect) for this specific segment and why they might be failing to capture the churn signal effectively compared to the past or other segments.
First, we need to gather the relevant data and load our model.
Load the Model: Load the production model artifact.
import joblib
# Assuming 'model.pkl' is your serialized model file
model = joblib.load('model.pkl')
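Before going further, it can help to confirm that the loaded artifact exposes the features you expect to see in the logged data. A minimal, optional sanity check, assuming a scikit-learn style estimator (the feature_names_in_ attribute is only present when the model was fitted on a DataFrame; adapt or skip this for other frameworks):
# Optional sanity check: does the artifact record the features it was trained on?
# feature_names_in_ is a scikit-learn convention; other frameworks differ.
expected_features = getattr(model, "feature_names_in_", None)
if expected_features is not None:
    print("Model was trained on features:", list(expected_features))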
Prepare Data: We need data specific to the segment exhibiting performance degradation.
data_segment_issue: A Pandas DataFrame containing recent feature data and ground truth labels for customers who used the premium support channel during the period of degraded recall.
data_segment_baseline: A similar DataFrame from a period before the recall drop for the same segment, serving as a baseline for comparison.
features: A list of feature names used by the model.
import pandas as pd
# Placeholder functions to represent loading your data
# Replace these with your actual data loading logic
def load_data_segment(period="issue"):
    # Load features and ground truth ('churn') for the premium support segment
    # from the specified period (e.g., 'issue' or 'baseline')
    print(f"Loading data for premium support segment: {period} period...")
    # Example structure:
    data = pd.DataFrame({
        'feature_A': [0.5, 0.1, 0.9] + ([0.6, 0.2] if period == "issue" else [0.4, 0.3]),
        'feature_B': [10, 50, 20] + ([15, 45] if period == "issue" else [25, 35]),
        'used_premium_support': [1, 1, 1, 1, 1],  # Filtered segment
        'new_feature_X': [0, 1, 0] + ([1, 1] if period == "issue" else [0, 0]),
        'churn': [0, 1, 0] + ([1, 0] if period == "issue" else [1, 1])  # Example labels
    })
    # Ensure 'used_premium_support' is 1 for all rows in this segment data
    data = data[data['used_premium_support'] == 1].drop(columns=['used_premium_support'])
    # Make sure column order matches model training
    all_features = ['feature_A', 'feature_B', 'new_feature_X']  # Example feature list
    return data[all_features], data['churn']
features = ['feature_A', 'feature_B', 'new_feature_X'] # Define feature list
X_issue, y_issue = load_data_segment(period="issue")
X_baseline, y_baseline = load_data_segment(period="baseline") # Optional baseline
# Select a subset for background data (can be from training or baseline period)
# A smaller, representative sample is often sufficient
X_background = X_baseline.sample(n=min(100, len(X_baseline)), random_state=42)
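The placeholder loader above stands in for whatever your logging setup provides. As a rough sketch of what that loading logic often looks like in practice, here is one possible version assuming a hypothetical Parquet export of prediction logs with timestamp, used_premium_support, and churn columns (file path, column names, and date format are placeholders; adapt them to your environment):
def load_data_segment_from_logs(path, start, end, feature_cols):
    # Hypothetical example: read logged features/labels and filter to the
    # premium support segment within a time window.
    logs = pd.read_parquet(path)
    mask = (
        (logs['timestamp'] >= start)
        & (logs['timestamp'] < end)
        & (logs['used_premium_support'] == 1)
    )
    segment = logs.loc[mask]
    return segment[feature_cols], segment['churn']

# Usage (hypothetical path and dates):
# X_issue, y_issue = load_data_segment_from_logs('prediction_logs.parquet',
#                                                '2024-05-01', '2024-05-15', features)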
Now, let's use the shap library to compute and analyze explanations.
Initialize Explainer: We create a SHAP explainer suitable for our model type. For tree-based models (like XGBoost, LightGBM, or RandomForest), shap.TreeExplainer is efficient. For other model types, shap.KernelExplainer is more general but slower; KernelExplainer requires a background dataset to represent the expected feature distributions.
import shap
# Using KernelExplainer as a general example
# For tree models, shap.TreeExplainer(model) might be faster
explainer = shap.KernelExplainer(model.predict_proba, X_background)
# For TreeExplainer (if applicable):
# explainer = shap.TreeExplainer(model)
Note: We pass model.predict_proba to KernelExplainer to get explanations for the probability output, which is typically more informative for diagnostics than the final class prediction. If you use TreeExplainer and it supports probability output directly, use that; otherwise, the explanations may be for the margin (log-odds) output.
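If your model is tree-based, you may be able to get probability-space attributions from TreeExplainer directly by supplying a background dataset. A hedged sketch as an alternative to the KernelExplainer cell above (the model_output="probability" option requires interventional feature perturbation, and its availability can vary across SHAP versions and model types):
# Sketch: prefer TreeExplainer for tree ensembles, fall back to KernelExplainer.
# model_output="probability" asks for attributions in probability space; it needs
# a background dataset, and support depends on your SHAP version and model type.
try:
    explainer = shap.TreeExplainer(
        model,
        data=X_background,
        feature_perturbation="interventional",
        model_output="probability",
    )
except Exception:
    explainer = shap.KernelExplainer(model.predict_proba, X_background)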
Calculate SHAP Values: Compute SHAP values for the data segment experiencing issues. This tells us how much each feature contributed to pushing the prediction away from the average prediction for each instance.
# Calculate SHAP values for the issue period segment
# This can take time for KernelExplainer and large datasets
shap_values_issue = explainer.shap_values(X_issue)
# shap_values output structure depends on the explainer and model output.
# For binary classification with predict_proba, shap_values might be a list
# [shap_values_for_class_0, shap_values_for_class_1].
# We are usually interested in the SHAP values for the positive class (churn=1).
# Let's assume index 1 corresponds to the positive class (churn).
shap_values_pos_class = shap_values_issue[1] if isinstance(shap_values_issue, list) else shap_values_issue
# For TreeExplainer, the output might directly be for the positive class or margin.
# Check the SHAP documentation for your specific model/explainer.
Now, visualize and interpret the results to pinpoint the problem.
Global Importance (Summary Plot): Look at the overall feature importance within this segment. Has the importance hierarchy changed compared to the baseline or your expectations?
# Generate a summary plot (beeswarm style)
shap.summary_plot(shap_values_pos_class, X_issue, feature_names=features, show=False)
# In a real scenario, you would display this plot using matplotlib or integrate with a dashboard
Let's represent what a summary plot might look like using a simplified Plotly chart structure.
{"layout": {"title": "SHAP Summary Plot (Issue Segment - Churn=1)", "xaxis": {"title": "SHAP value (impact on model output for Churn=1)"}, "yaxis": {"title": "Feature", "automargin": true}, "margin": {"l": 100, "r": 20, "t": 50, "b": 40}}, "data": [{"type": "scatter", "x": [-0.8, -0.5, 0.1, 0.9, 1.5], "y": ["new_feature_X", "new_feature_X", "new_feature_X", "new_feature_X", "new_feature_X"], "mode": "markers", "marker": {"color": [0, 1, 0, 1, 1], "colorscale": [[0, "#339af0"], [1, "#f06595"]], "showscale": true, "colorbar": {"title": "Feature Value<br>(Low to High)"}, "symbol": "circle"}, "name": "new_feature_X"}, {"type": "scatter", "x": [-0.6, -0.3, 0.2, 0.4, 0.7], "y": ["feature_A", "feature_A", "feature_A", "feature_A", "feature_A"], "mode": "markers", "marker": {"color": [0.1, 0.2, 0.5, 0.6, 0.9], "symbol": "circle"}, "name": "feature_A", "showlegend": false}, {"type": "scatter", "x": [-0.2, -0.1, 0.1, 0.3, 0.5], "y": ["feature_B", "feature_B", "feature_B", "feature_B", "feature_B"], "mode": "markers", "marker": {"color": [5, 10, 20, 45, 50], "symbol": "circle"}, "name": "feature_B", "showlegend": false}]}
Example SHAP summary plot for the affected segment. Each point is a Shapley value for a feature and an instance. Position on x-axis shows impact on predicting churn (higher values push towards churn). Color shows feature value (blue=low, red=high). Feature order indicates overall importance.
Interpretation: In this example, new_feature_X has become highly important. High values (red) strongly push predictions towards churn (positive SHAP), while low values (blue) push against it. Compare this to a baseline summary plot: did new_feature_X previously have less impact? Are specific value ranges of feature_A or feature_B now behaving differently for this segment?
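To make the baseline comparison concrete rather than purely visual, you can compute SHAP values for the baseline segment as well and compare mean absolute SHAP values per feature across the two periods. A minimal sketch, reusing the explainer defined above and the same handling of list-style output:
import numpy as np

# Compute SHAP values for the baseline segment the same way as for the issue segment
shap_values_baseline = explainer.shap_values(X_baseline)
shap_values_baseline_pos = (
    shap_values_baseline[1] if isinstance(shap_values_baseline, list) else shap_values_baseline
)

# Mean absolute SHAP value per feature is a simple global importance measure
importance = pd.DataFrame({
    'issue': np.abs(shap_values_pos_class).mean(axis=0),
    'baseline': np.abs(shap_values_baseline_pos).mean(axis=0),
}, index=features)
importance['shift'] = importance['issue'] - importance['baseline']
print(importance.sort_values('shift', ascending=False))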
Local Explanation (Force Plot / Waterfall Plot): Examine individual instances, particularly the False Negatives (customers who churned but were predicted not to), since our problem was low recall.
# Find indices of False Negatives in the issue segment
predictions = model.predict(X_issue)
fn_indices = X_issue[(y_issue == 1) & (predictions == 0)].index
if not fn_indices.empty:
    # Select one False Negative instance to investigate
    idx_to_explain = fn_indices[0]
    instance_loc = X_issue.index.get_loc(idx_to_explain)

    # Generate a force plot for this instance (requires JS in notebooks/web)
    # shap.force_plot(explainer.expected_value[1], shap_values_pos_class[instance_loc, :], X_issue.iloc[instance_loc, :], feature_names=features, show=False)

    # Generate a waterfall plot (good alternative)
    # shap.waterfall_plot(shap.Explanation(values=shap_values_pos_class[instance_loc, :],
    #                                      base_values=explainer.expected_value[1],
    #                                      data=X_issue.iloc[instance_loc, :].values,
    #                                      feature_names=features), show=False)

    print(f"\nAnalyzing False Negative instance index: {idx_to_explain}")
    print("Feature Contributions (SHAP values for predicting Churn=1):")
    # Displaying values directly for clarity here:
    contributions = pd.Series(shap_values_pos_class[instance_loc, :], index=features)
    print(contributions.sort_values(ascending=False))

    # Assuming expected_value is available and index [1] corresponds to the positive class
    print(f"Base value (average prediction probability): {explainer.expected_value[1]:.4f}")
    print(f"Final prediction probability: {explainer.expected_value[1] + contributions.sum():.4f}")
else:
    print("\nNo False Negatives found in the provided sample to analyze.")
Interpretation: The force/waterfall plot (or the printed contributions) shows which feature values pushed the prediction towards or away from churn for that specific customer. For a False Negative, we expect the sum of the SHAP values plus the base value to fall below the classification threshold (e.g., 0.5). Identify the features that contributed most strongly against predicting churn (negative SHAP values). Is new_feature_X having an unexpectedly negative impact for this customer, despite the fact that they actually churned? Does feature_A's value, which might normally indicate churn risk, have a suppressed effect here? Analyzing several False Negatives can reveal patterns.
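Rather than inspecting one instance at a time, you can aggregate SHAP values across all False Negatives to surface systematic patterns. A short sketch building on the fn_indices computed above:
if not fn_indices.empty:
    # Positional locations of all False Negatives within X_issue
    fn_locs = [X_issue.index.get_loc(i) for i in fn_indices]

    # Average contribution of each feature across the missed churners
    fn_mean_contrib = pd.Series(
        shap_values_pos_class[fn_locs, :].mean(axis=0), index=features
    )
    print("Mean SHAP values across False Negatives (for Churn=1):")
    print(fn_mean_contrib.sort_values())
    # Features with strongly negative means are systematically suppressing
    # the churn prediction for this group.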
Dependence Plots: Investigate how the model's output depends on a specific feature's value, potentially colored by an interacting feature. This helps spot non-linear relationships or interaction effects specific to the segment.
# Example: Investigate 'new_feature_X' and its interaction with 'feature_A'
shap.dependence_plot("new_feature_X", shap_values_pos_class, X_issue, interaction_index="feature_A", show=False)
Interpretation: Does the dependence plot for the issue segment show a different pattern than expected, or than seen in the baseline data? For example, perhaps the positive impact of new_feature_X = 1 is significantly dampened when feature_A is low, specifically within this segment, leading to missed churn predictions.
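One way to quantify such a shift, instead of comparing two dependence plots by eye, is to average the SHAP value of new_feature_X per feature value in each period. A small sketch, reusing shap_values_baseline_pos from the importance comparison above:
# Average SHAP value of 'new_feature_X', grouped by its value, per period
feat_idx = features.index('new_feature_X')

issue_dep = (
    pd.Series(shap_values_pos_class[:, feat_idx], index=X_issue.index)
    .groupby(X_issue['new_feature_X'])
    .mean()
)
baseline_dep = (
    pd.Series(shap_values_baseline_pos[:, feat_idx], index=X_baseline.index)
    .groupby(X_baseline['new_feature_X'])
    .mean()
)

print(pd.DataFrame({'issue': issue_dep, 'baseline': baseline_dep}))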
Based on the SHAP analysis:
If the feature importance hierarchy has shifted within the segment (for example, new_feature_X dominating), it might point to concept drift or issues related to that new feature (data quality, encoding).
If an interaction looks wrong (for example, feature_A having an unusually strong negative impact only when new_feature_X is present), it suggests the model hasn't learned the interaction correctly for this segment.
These diagnostic insights are far more actionable than just knowing recall dropped. They might suggest, for example, investigating data quality and encoding for new_feature_X and related features.
Integrating explainability tools like SHAP into your monitoring and incident response workflow provides essential diagnostic capabilities, enabling faster, more targeted interventions when model performance deviates in production.