Now that we understand the intuition and mechanics behind LIME, let's put it into practice using Python. The primary library for LIME implementation is aptly named lime. It provides tools to explain individual predictions of classifiers and regressors trained with libraries like scikit-learn, TensorFlow, Keras, and PyTorch.
First, ensure you have the lime library installed. You can install it using pip:
pip install lime
You will also need standard data science libraries such as numpy and scikit-learn for data handling and modeling.
The most common use case involves explaining predictions for models trained on structured, tabular data. The lime library provides the LimeTabularExplainer class specifically for this purpose.
Let's walk through an example using the familiar Iris dataset and a scikit-learn RandomForestClassifier.
Assume you have your data loaded into NumPy arrays or Pandas DataFrames and have trained a classifier.
import numpy as np
import sklearn
import sklearn.ensemble
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import lime
import lime.lime_tabular
# Load and prepare data
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
class_names = iris.target_names
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a model (e.g., RandomForest)
model = sklearn.ensemble.RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"Training accuracy: {model.score(X_train, y_train):.2f}")
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
# Choose an instance to explain from the test set
instance_idx = 0
instance_to_explain = X_test[instance_idx]
actual_class = class_names[y_test[instance_idx]]
predicted_class = class_names[model.predict(instance_to_explain.reshape(1, -1))[0]]
print(f"\nInstance to explain (Index: {instance_idx}): {instance_to_explain}")
print(f"Actual Class: {actual_class}")
print(f"Model Predicted Class: {predicted_class}")
The next step is to create an instance of LimeTabularExplainer. This object needs information about your training data so it can model the distribution of feature values when generating perturbations.
# Create the LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
training_data=X_train, # Data LIME uses to generate perturbations
feature_names=feature_names, # List of feature names
class_names=class_names, # List of class names
mode='classification' # Specify 'classification' or 'regression'
# Optional: discretize_continuous=True, verbose=True, etc.
)
Key parameters for LimeTabularExplainer:

- training_data: A NumPy array representing the training dataset. LIME uses it to compute feature statistics (mean, standard deviation) for perturbation and discretization. Providing the actual training data is common, but a representative sample also works.
- feature_names: A list of strings corresponding to the column names of your data.
- class_names: A list of strings naming the target classes.
- mode: Set to 'classification' or 'regression' depending on your model type.

LIME needs access to your model's prediction function. Crucially, this function must take a NumPy array of perturbed samples (one sample per row) and return the model's prediction probabilities (for classification) or predicted values (for regression) as a NumPy array.
For scikit-learn classifiers, the predict_proba method usually provides this.
# Define the prediction function LIME will use
# It takes perturbed data (numpy array) and returns probabilities (numpy array)
predict_fn = lambda x: model.predict_proba(x)
Make sure the output shape is correct: (num_samples, num_classes) for classification or (num_samples,) for regression.
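As a quick sanity check (a small sketch reusing the objects defined above), you can verify the output shape before handing predict_fn to LIME:

# Verify the prediction function's output shape on a small batch
sample_batch = X_test[:5]
probs = predict_fn(sample_batch)
print(probs.shape)        # Expected: (5, 3) for the three Iris classes
print(probs.sum(axis=1))  # Each row of probabilities should sum to 1.0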
Now, call the explain_instance method on your explainer object.
# Explain the chosen instance
explanation = explainer.explain_instance(
data_row=instance_to_explain, # The instance you want to explain
predict_fn=predict_fn, # The prediction function defined above
num_features=len(feature_names), # Max number of features in the explanation
num_samples=5000 # Number of samples to generate for perturbation
# Optional: top_labels=1 (to explain only the top predicted class)
)
Key parameters for explain_instance:

- data_row: The specific instance (as a 1D NumPy array) you want to explain.
- predict_fn: The prediction function wrapper created in the previous step.
- num_features: The maximum number of features to include in the explanation. LIME ranks features by importance and returns the top ones.
- num_samples: The number of perturbed samples LIME generates around data_row to train the local surrogate model. Higher values can produce more stable explanations but increase computation time.

The explanation object returned by explain_instance contains the local explanation. You can access it in several ways:
As a list: explanation.as_list() returns a list of tuples, where each tuple contains (feature_description, weight). The weight indicates the feature's contribution to the prediction for the class being explained. Note that as_list() defaults to the class with label index 1; pass top_labels=1 to explain_instance and read explanation.top_labels if you want to target the model's top predicted class. Positive weights support that class, negative weights oppose it.
# Get explanation as a list for the predicted class
explanation_list = explanation.as_list()
print(f"\nExplanation for prediction '{predicted_class}':")
for feature, weight in explanation_list:
print(f"- {feature}: {weight:.4f}")
Visualizations: LIME offers convenient plotting functions.

- explanation.show_in_notebook(): Renders an HTML visualization directly in Jupyter environments.
- explanation.as_pyplot_figure(): Returns a matplotlib figure object for customization or saving.

Let's create a simple bar chart visualization of the feature contributions using Plotly, mimicking the kind of plot you might get from LIME's built-in functions.
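One way to build such a chart (a minimal sketch, assuming Plotly is installed and reusing explanation_list and predicted_class from above):

import plotly.graph_objects as go

# Sort contributions so the largest-magnitude features appear at the top of the chart
sorted_exp = sorted(explanation_list, key=lambda item: abs(item[1]))
features = [feature for feature, _ in sorted_exp]
weights = [weight for _, weight in sorted_exp]
colors = ['green' if w > 0 else 'red' for w in weights]

fig = go.Figure(go.Bar(x=weights, y=features, orientation='h', marker_color=colors))
fig.update_layout(
    title=f"LIME explanation for predicted class '{predicted_class}'",
    xaxis_title="Feature weight (contribution)",
    yaxis_title="Feature rule",
)
fig.show()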
Example visualization of LIME feature contributions for a single Iris prediction. Features with positive weights (green) increase the probability of the predicted class, while features with negative weights (red) decrease it. The rules (e.g., "petal width (cm) <= 1.70") are derived from LIME's internal discretization or based on the perturbed samples around the instance.
The visualization clearly shows which feature values (often represented as rules like feature <= value or value1 < feature <= value2 when continuous features are discretized) pushed the prediction towards the predicted class (positive weights) and which pushed against it (negative weights) for this specific instance.
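If you would rather see weights on the feature values themselves than threshold rules, you can turn off discretization when building the explainer. A minimal sketch, reusing the objects defined earlier:

# Explainer without discretization: explanations weight the (internally scaled) feature
# values rather than threshold rules such as "petal width (cm) <= 1.70"
explainer_raw = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=feature_names,
    class_names=class_names,
    mode='classification',
    discretize_continuous=False
)
explanation_raw = explainer_raw.explain_instance(instance_to_explain, predict_fn, num_features=4)
print(explanation_raw.as_list())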
The process for explaining regression models is very similar. The main differences are:

- Set mode='regression' when creating the LimeTabularExplainer.
- predict_fn should return the actual predicted values (a 1D NumPy array) instead of probabilities. For scikit-learn regressors, this typically means using the model.predict method.

# Example for regression (conceptual)
# Assuming 'reg_model' is a trained scikit-learn regressor
# explainer_reg = lime.lime_tabular.LimeTabularExplainer(..., mode='regression', class_names=['TargetValue'])
# predict_fn_reg = lambda x: reg_model.predict(x)
# explanation_reg = explainer_reg.explain_instance(instance_to_explain, predict_fn_reg, num_features=5)
# explanation_reg.show_in_notebook() # Or use other methods
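To make that concrete, here is a runnable sketch along the same lines; the diabetes dataset and RandomForestRegressor are illustrative choices, not part of the example above:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Train a simple regressor (illustrative dataset and model)
diabetes = load_diabetes()
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=42
)
reg_model = RandomForestRegressor(n_estimators=100, random_state=42)
reg_model.fit(Xr_train, yr_train)

# Explainer in regression mode; the prediction function returns values, not probabilities
explainer_reg = lime.lime_tabular.LimeTabularExplainer(
    training_data=Xr_train,
    feature_names=diabetes.feature_names,
    mode='regression'
)
explanation_reg = explainer_reg.explain_instance(Xr_test[0], reg_model.predict, num_features=5)
print(explanation_reg.as_list())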
Finally, a few practical considerations:

- Making sure predict_fn accepts a NumPy array and returns probabilities/values in the correct shape is often the trickiest part. Double-check its behavior.
- If your data contains categorical features, pass the categorical_features and categorical_names arguments to the LimeTabularExplainer constructor for proper handling during perturbation.
- Increasing num_samples can improve explanation stability but takes longer.
- Generating explanations, especially with a high num_samples, can be computationally intensive.

This section provided a practical guide to implementing LIME for tabular data using its Python library. By creating an explainer, defining the prediction function, and calling explain_instance, you can generate local, interpretable explanations for your black-box model predictions, gaining valuable insight into why a specific decision was made. The next chapter will introduce SHAP, another powerful technique with different theoretical underpinnings.