Applying interpretability techniques like LIME and SHAP to classification models provides valuable insights into why a model predicts a specific category for a given input. Since classification deals with discrete outcomes (e.g., 'spam' vs. 'not spam', 'cat' vs. 'dog' vs. 'bird'), explanations focus on identifying the features that push the prediction towards one class over the others. This chapter has already contrasted LIME and SHAP conceptually; now let's see how their explanations manifest specifically in classification tasks.
LIME generates local explanations by fitting a simpler, interpretable model (such as a sparse linear model) to perturbed copies of a single data point. For classification, this local surrogate is trained to approximate the probability the original, complex model assigns to a chosen class.
The output typically consists of a list of features and their corresponding weights. These weights represent the feature's contribution to the prediction for a specific class in the local vicinity of the instance being explained.
Consider a binary classification model predicting customer churn ('Yes' or 'No'). If LIME explains a prediction of 'Yes' for a specific customer, the output might look like this:
Feature                 Weight
------------------------------
NumSupportCalls > 3       0.45
TenureMonths < 6          0.30
ContractType = M2M        0.15
InternetService = No     -0.10
TotalCharges < 50        -0.25
This suggests that, locally around this customer's data point, having more than 3 support calls, less than 6 months of tenure, and a Month-to-Month contract are the primary drivers pushing the prediction towards 'Yes' (Churn). Conversely, not having internet service and low total charges slightly push the prediction away from 'Yes'.
Most LIME implementations allow you to specify which class prediction you want to explain. You can generate separate explanations for why the model predicted class 'A', or why it didn't predict class 'B', providing a more complete picture.
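As a concrete sketch, the churn explanation above could be produced roughly as follows. This is a minimal example, assuming a fitted classifier named model that exposes predict_proba and pandas DataFrames X_train and X_test; all names are illustrative, not a fixed API.

from lime.lime_tabular import LimeTabularExplainer

# Assumed setup: X_train / X_test are pandas DataFrames of customer features and
# 'model' is any fitted classifier exposing predict_proba (names are illustrative).
lime_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["No", "Yes"],      # churn classes
    mode="classification",
)

# Explain the 'Yes' (churn) prediction for a single customer
exp = lime_explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict_proba,
    num_features=5,
    labels=(1,),                    # 1 = index of the 'Yes' class
)
print(exp.as_list(label=1))         # [(feature condition, weight), ...]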
Here's a visual representation of feature contributions for a specific class prediction using LIME:
Feature contributions towards the 'Churn = Yes' prediction for a single customer, as estimated by LIME. Positive values support the prediction, negative values oppose it.
SHAP values, grounded in Shapley values from game theory, explain how much each feature contributes to pushing the model's output away from a baseline (average) prediction towards the actual prediction for a specific instance. In classification, the "output" often refers to the model's raw score or log-odds for a particular class before the final probability conversion (e.g., via sigmoid or softmax).
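To make this additivity concrete, the minimal sketch below (assuming an XGBoost binary classifier trained on synthetic data; names and parameters are illustrative) checks that the baseline plus the per-feature SHAP values reproduces the model's raw log-odds output for every row:

import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = xgb.XGBClassifier(n_estimators=50, max_depth=3, random_state=0).fit(X, y)

# For XGBoost classifiers, TreeSHAP explains the raw margin (log-odds) output
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)              # shape: (n_samples, n_features)

log_odds = model.predict(X, output_margin=True)     # raw scores before the sigmoid
reconstructed = explainer.expected_value + shap_values.sum(axis=1)

print(np.allclose(log_odds, reconstructed, atol=1e-3))  # expected: True

Passing the reconstructed log-odds through a sigmoid recovers the model's predicted probability for the positive class.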
SHAP provides several visualization tools tailored for classification:
Force plots visualize the explanation for a single prediction. They show the baseline value (the average model output across the dataset) and how each feature's SHAP value pushes the output higher or lower toward the final prediction for the chosen class. Features pushing the output higher (toward the class of interest) are typically shown in red, while those pushing it lower are shown in blue.
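A force plot for the churn example might be generated as in the sketch below, assuming the churn model above is a tree ensemble such as a random forest. Return shapes differ across SHAP versions, so the code normalizes both common formats; names are illustrative.

import numpy as np
import shap

# Assumed setup: 'model' is a fitted tree-ensemble churn classifier and X_test is a
# pandas DataFrame, as in the LIME sketch above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

churn_class = 1   # index of the 'Yes' class
i = 0             # customer to explain

# Older SHAP versions return a list with one array per class; newer versions return
# a single (n_samples, n_features, n_classes) array. Normalize to a 2-D matrix of
# per-feature values for the chosen class.
if isinstance(shap_values, list):
    class_sv = shap_values[churn_class]
    base_value = explainer.expected_value[churn_class]
else:
    class_sv = shap_values[:, :, churn_class]
    base_value = np.atleast_1d(explainer.expected_value)[churn_class]

shap.initjs()  # enable JavaScript rendering of force plots in notebooks
shap.force_plot(base_value, class_sv[i], X_test.iloc[i])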
Summary plots provide a global view of feature importance. For classification, they display the SHAP values of each feature across many instances for a chosen class, either as a beeswarm of individual values or as a bar chart of mean absolute values.
Global feature importance based on mean absolute SHAP values for predicting 'Churn = Yes'. Higher values indicate features with generally larger impact on the prediction.
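A plot like the one described above can be produced with shap.summary_plot, reusing class_sv from the force-plot sketch:

# Bar chart: global importance as mean |SHAP value| per feature for 'Churn = Yes'
shap.summary_plot(class_sv, X_test, plot_type="bar")

# Beeswarm variant: one point per customer per feature, colored by feature value
shap.summary_plot(class_sv, X_test)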
Dependence plots show how the value of a single feature affects its SHAP value (and thus its impact on the prediction for a specific class) across all instances. They can also automatically color points by another feature to highlight interaction effects.
Impact of 'TenureMonths' on the SHAP value for the 'Churn = Yes' prediction. Lower tenure generally corresponds to positive SHAP values (increasing churn likelihood), while higher tenure corresponds to negative SHAP values (decreasing churn likelihood).
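A dependence plot for the churn example can be sketched as follows, again reusing class_sv; 'TenureMonths' is the illustrative feature name from the earlier table.

# SHAP value of TenureMonths versus its raw value for the 'Churn = Yes' class.
# With the default interaction_index="auto", points are colored by the feature that
# appears to interact most strongly with TenureMonths.
shap.dependence_plot("TenureMonths", class_sv, X_test)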
Both LIME and SHAP naturally extend to multi-class classification problems. The key is that explanations are generated per class. When you request an explanation for an instance where the model predicts, say, 'Class C', you need to specify which class's prediction you want to understand.
It's often informative to generate explanations not just for the predicted class, but also for the next most likely class(es). Comparing these explanations helps understand why the model preferred one category over another. For instance, seeing which features strongly supported 'Class A' while simultaneously opposing 'Class B' can be very revealing.
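With LIME, one convenient way to do this is the top_labels argument, which returns explanations for the most probable classes in a single call. A sketch reusing the lime_explainer from earlier (names are illustrative):

# Explain the two most probable classes for one instance and compare which
# features favor each class.
exp = lime_explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict_proba,
    num_features=5,
    top_labels=2,                   # explain the top-2 predicted classes
)
for label in exp.available_labels():
    print(f"Class {label}:", exp.as_list(label=label))

With SHAP, the analogous comparison is to select the SHAP matrix for each class of interest (as class_sv was selected for 'Yes' above) and compare the resulting force or summary plots side by side.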
As discussed earlier in the chapter, SHAP offers theoretical advantages such as consistency (if a model changes so that a feature's contribution increases or stays the same, its attribution will not decrease) and often provides more reliable global importance measures via summary plots. Its foundation in Shapley values gives it a solid theoretical backing, and TreeSHAP computes exact values efficiently for tree-based models.
LIME, while lacking SHAP's guarantees, can be faster to compute for individual predictions, especially compared with KernelSHAP (the model-agnostic SHAP variant), which can be computationally intensive. LIME's focus on local fidelity via simple surrogate models makes it intuitive for understanding a single prediction in isolation.
For classification, interpreting the outputs of both methods requires understanding that the explanations are typically class-specific. Whether using LIME weights or SHAP values (visualized through force, summary, or dependence plots), you are examining feature contributions towards the prediction of a particular category. Use these tools to dissect your classifier's reasoning, build confidence in its predictions, and identify potential biases or areas for model improvement.