The previous section introduced Shapley values from cooperative game theory, a method for fairly distributing a "payout" among cooperating players. The SHAP (SHapley Additive exPlanations) framework cleverly adapts this concept to the context of machine learning model explanations.
Imagine trying to explain a single prediction made by your trained model for a specific input instance. How much did each feature of that instance contribute to the final prediction value, compared to some baseline? SHAP answers this by framing the problem as a cooperative game:

- The game: producing the prediction $f(x)$ for the specific instance $x$.
- The players: the feature values of that instance.
- The payout: the difference between the actual prediction and the baseline, $f(x) - E[f(X)]$, where $E[f(X)]$ is the model's average prediction.

The goal is to fairly distribute this total payout, $f(x) - E[f(X)]$, among the players (the features). A feature's contribution, its SHAP value, represents how much that specific feature's value, in the context of the other feature values for that instance, shifted the prediction. For example, if a house-price model predicts a value 60,000 above its average prediction, SHAP distributes that 60,000 difference among features such as size, location, and age.
Formally, the SHAP value $\phi_i(x)$ for feature $i$ and instance $x$ is calculated as the weighted average of its marginal contribution across all possible subsets (coalitions) of features not including feature $i$. The "marginal contribution" of feature $i$ to a specific coalition $S$ (where $S$ is a subset of features excluding $i$) is the difference in the model's expected output when feature $i$ is added to the coalition versus the expected output with the coalition $S$ alone:
$$E[f(X) \mid X_S \cup \{x_i\}] \;-\; E[f(X) \mid X_S]$$
Here, $E[f(X) \mid X_S]$ represents the expected prediction given only the feature values for the features in subset $S$. Calculating this conditional expectation is often the most complex part and requires making assumptions about feature independence or using specific calculation techniques, which we'll cover when discussing KernelSHAP and TreeSHAP.
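Putting these pieces together, the SHAP value is the classical Shapley value computed with this model-based payout: each coalition's marginal contribution is weighted by the fraction of feature orderings in which exactly the coalition $S$ precedes feature $i$. Writing $F$ for the set of all $M$ features (a symbol introduced here only for compactness):

$$\phi_i(x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \Big( E[f(X) \mid X_S \cup \{x_i\}] - E[f(X) \mid X_S] \Big)$$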
SHAP values inherit a highly desirable property from Shapley values called additivity. This means that the sum of the SHAP values for all features of a given instance $x$ equals the difference between the prediction for that instance, $f(x)$, and the base value (the average prediction $E[f(X)]$). We can express this relationship as:
$$f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x)$$

where:

- $\phi_0 = E[f(X)]$ is the base value, the model's average prediction over the training (or background) data,
- $M$ is the number of features, and
- $\phi_i(x)$ is the SHAP value of feature $i$ for instance $x$.
This equation provides a clear decomposition of the prediction. $\phi_0$ gives us the starting point (the average prediction), and each $\phi_i(x)$ tells us how the value of feature $i$ pushes the prediction higher (if $\phi_i(x)$ is positive) or lower (if $\phi_i(x)$ is negative) relative to this base value. The magnitude $|\phi_i(x)|$ indicates the strength of the feature's influence on this specific prediction.
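To make the definition and the additivity property concrete, here is a minimal brute-force sketch. The three-feature toy model, the random background data, and the approximation of $E[f(X) \mid X_S]$ by replacing out-of-coalition features with background samples (a feature-independence assumption) are illustrative choices, not how KernelSHAP or TreeSHAP work in practice:

```python
# Brute-force SHAP values for a toy model (illustrative assumptions:
# hypothetical 3-feature model, random background data, and
# E[f(X) | X_S] approximated by masking with background samples,
# which assumes feature independence).
import itertools
from math import factorial

import numpy as np

rng = np.random.default_rng(0)

def model(X):
    """Toy model standing in for any trained predictor f."""
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2] - 0.5 * X[:, 2]

background = rng.normal(size=(500, 3))  # data used to estimate expectations
x = np.array([1.0, -2.0, 0.5])          # instance to explain
M = x.shape[0]

def v(coalition):
    """v(S) ~ E[f(X) | X_S]: fix the features in S to x's values and
    average the model over background values for the remaining features."""
    X_mix = background.copy()
    for j in coalition:
        X_mix[:, j] = x[j]
    return model(X_mix).mean()

phi = np.zeros(M)
for i in range(M):
    others = [j for j in range(M) if j != i]
    for size in range(M):                       # |S| = 0 .. M-1
        for S in itertools.combinations(others, size):
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            phi[i] += weight * (v(S + (i,)) - v(S))

phi_0 = v(())                       # base value: average prediction E[f(X)]
f_x = model(x[np.newaxis, :])[0]    # the model's actual prediction f(x)

# Additivity: the base value plus all SHAP values recovers the prediction.
assert np.isclose(phi_0 + phi.sum(), f_x)
print(dict(phi_0=phi_0, phi=phi, f_x=f_x))
```

The final assertion holds (up to floating-point error) because the Shapley weights satisfy the efficiency property: the contributions always sum to $f(x) - E[f(X)]$ for the value function used.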
Unlike LIME, which builds a local, interpretable surrogate model, SHAP directly calculates feature contributions based on game-theoretic principles, aiming for a more theoretically grounded attribution. However, computing these exact contributions is computationally infeasible for all but the simplest models and smallest feature sets, since the number of possible coalitions grows exponentially ($2^M$ subsets for $M$ features). This computational challenge motivates the development of efficient approximation algorithms like KernelSHAP and TreeSHAP, which we will explore next.