Now that we've explored the concepts behind LIME, including its reliance on perturbation and local surrogate models, let's put this knowledge into practice. In this hands-on exercise, we will use the LIME Python library to generate and interpret explanations for individual predictions made by a machine learning model. We'll use a familiar dataset and a standard classifier to see how LIME helps us understand why the model makes a particular decision for a specific input.
First, ensure you have the necessary libraries installed. You'll primarily need scikit-learn for modeling and lime for the explanations. You can typically install them using pip:
pip install scikit-learn lime numpy pandas matplotlib
Let's import the required modules and load a dataset. We'll use the Iris dataset, a common benchmark for classification tasks. We'll also train a simple Random Forest classifier, which will serve as our "black-box" model to explain.
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.ensemble
import lime
import lime.lime_tabular
import pandas as pd
# Load the Iris dataset
iris = sklearn.datasets.load_iris()
feature_names = iris.feature_names
class_names = iris.target_names
# Create a pandas DataFrame for easier viewing (optional)
iris_df = pd.DataFrame(iris.data, columns=feature_names)
iris_df['target'] = iris.target
iris_df['target_names'] = iris_df['target'].map({i: name for i, name in enumerate(class_names)})
# print("Iris Dataset Features:", feature_names)
# print("Iris Dataset Classes:", class_names)
# print(iris_df.head())
# Train a RandomForestClassifier
# Set random_state to a fixed integer for reproducibility
model = sklearn.ensemble.RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(iris.data, iris.target)
print("Model trained successfully.")
# Example: Check model accuracy (optional)
# accuracy = model.score(iris.data, iris.target)
# print(f"Model Training Accuracy: {accuracy:.2f}")
With our model trained, the next step is to initialize a LIME explainer object specifically designed for tabular data. The LimeTabularExplainer requires information about our training data to generate meaningful perturbations.
# Create the LIME explainer object
explainer = lime.lime_tabular.LimeTabularExplainer(
training_data=iris.data, # The data used for training the model
feature_names=feature_names, # List of feature names
class_names=class_names, # List of class names
mode='classification' # Specify 'classification' or 'regression'
)
print("LIME Tabular Explainer created.")
Here's a breakdown of the parameters:
- training_data: This numpy array is used by LIME to understand the distribution of feature values. Perturbations are generated based on statistics (like mean and standard deviation) derived from this data.
- feature_names: Providing the actual names makes the explanations much easier to read.
- class_names: For classification problems, this allows LIME to label the outputs clearly.
- mode: Tells LIME whether this is a 'classification' or 'regression' problem, which affects how it expects the model's prediction function to behave and how it presents results.
Now, let's select an instance from the dataset we want to explain. We'll pick the 55th instance (index 54) in the Iris dataset, which corresponds to a 'versicolor' iris. We then need to provide LIME's explain_instance method with:
- data_row: the instance we want to explain, as a 1D numpy array.
- predict_fn: a function that takes an array of perturbed samples and returns prediction probabilities; for a scikit-learn classifier, this is typically the predict_proba method of your scikit-learn classifier.
- num_features: how many features to include in the explanation.
# Choose an instance to explain (e.g., the 55th instance, index 54)
instance_index = 54
instance_to_explain = iris.data[instance_index]
actual_class = class_names[iris.target[instance_index]]
predicted_class_index = model.predict(instance_to_explain.reshape(1, -1))[0]
predicted_class = class_names[predicted_class_index]
print(f"Explaining instance index: {instance_index}")
print(f"Instance Features: {instance_to_explain}")
print(f"Actual Class: {actual_class}")
print(f"Model Predicted Class: {predicted_class}")
# Define the prediction function LIME needs
# It takes a numpy array (n_samples, n_features) and returns (n_samples, n_classes) probabilities
predict_fn = model.predict_proba
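# Optional sanity check (illustrative): predict_fn must accept an array of shape
# (n_samples, n_features) and return probabilities of shape (n_samples, n_classes)
# probs = predict_fn(instance_to_explain.reshape(1, -1))
# print(probs.shape)  # expected: (1, 3) for the three Iris classes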
# Generate the explanation
explanation = explainer.explain_instance(
data_row=instance_to_explain,
predict_fn=predict_fn,
num_features=len(feature_names) # Explain using all features
)
print("\nExplanation generated.")
The explain_instance method works behind the scenes by:
- generating perturbed samples in the neighborhood of instance_to_explain,
- obtaining the black-box model's class probabilities for those samples via predict_fn,
- weighting each perturbed sample by its proximity to the original instance, and
- fitting a weighted, interpretable (linear) surrogate model whose coefficients become the feature weights.
The returned explanation object contains the results. A common way to view them is as_list(), which provides the feature importance weights for the predicted class. (Strictly, explain_instance explains class index 1 by default, which for this instance happens to be the predicted 'versicolor' class; you can pass labels=[predicted_class_index] and call as_list(label=predicted_class_index) to target a specific class.) Before reading those weights, the short sketch below illustrates the perturb-weight-fit procedure in miniature.
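The sketch is an illustration only, not the library's internal implementation: it assumes Gaussian perturbations scaled by each feature's standard deviation, an exponential proximity kernel, and a Ridge regression surrogate, all of which are simplifying choices made here for clarity.
# Conceptual sketch of the perturb-weight-fit loop (illustration only, not lime's internal code)
from sklearn.linear_model import Ridge
rng = np.random.RandomState(0)
num_samples = 5000
# 1. Perturb: sample points around the instance, scaled by each feature's standard deviation
feature_std = iris.data.std(axis=0)
perturbed = instance_to_explain + rng.normal(size=(num_samples, len(feature_names))) * feature_std
# 2. Predict: ask the black-box model for the probability of the predicted class
target_probs = predict_fn(perturbed)[:, predicted_class_index]
# 3. Weight: give samples closer to the original instance more influence (exponential kernel)
distances = np.linalg.norm((perturbed - instance_to_explain) / feature_std, axis=1)
kernel_width = 0.75 * np.sqrt(len(feature_names))  # assumed width for this sketch
weights = np.exp(-(distances ** 2) / (kernel_width ** 2))
# 4. Fit: a weighted linear surrogate; its coefficients act as local feature importances
surrogate = Ridge(alpha=1.0)
surrogate.fit(perturbed, target_probs, sample_weight=weights)
for name, coef in zip(feature_names, surrogate.coef_):
    print(f"{name}: {coef:.4f}")
With that intuition in place, let's read the weights from the explanation LIME actually produced for us: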
# Get the explanation as a list of (feature, weight) tuples
explanation_list = explanation.as_list()
print("\nLIME Explanation (Feature Contributions):")
for feature, weight in explanation_list:
print(f"- {feature}: {weight:.4f}")
# You can also visualize the explanation directly in notebooks
# explanation.show_in_notebook(show_table=True)
# Or generate a plot (requires matplotlib)
# fig = explanation.as_pyplot_figure()
# fig.tight_layout() # Adjust layout
# fig.show() # Display the plot
The output list shows features and their corresponding weights. For classification, positive weights indicate features that push the prediction towards the predicted class ('versicolor' in our example), while negative weights push away from it (towards other classes). The magnitude of the weight suggests the strength of the contribution for this specific instance.
For example, you might see output like:
LIME Explanation (Feature Contributions):
- petal width (cm) <= 1.30: 0.2134
- 4.90 < petal length (cm) <= 5.10: 0.1987
- sepal width (cm) <= 2.80: -0.0712
- sepal length (cm) > 6.70: -0.0123
An output like this would indicate that a petal width of at most 1.30 cm and a petal length between 4.90 cm and 5.10 cm strongly support the 'versicolor' classification, while the sepal width and length values push slightly against it. LIME often discretizes continuous features for tabular data (as seen in conditions like <= 1.30), which makes the local linear model easier to fit and interpret.
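If you prefer a single signed weight per raw feature instead of threshold conditions, LimeTabularExplainer accepts discretize_continuous=False (True is the default). The snippet below shows the change; it simply rebuilds the explainer with that option and re-explains the same instance.
# Optional: an explainer without discretization reports one signed weight per raw feature
explainer_raw = lime.lime_tabular.LimeTabularExplainer(
    training_data=iris.data,
    feature_names=feature_names,
    class_names=class_names,
    mode='classification',
    discretize_continuous=False
)
explanation_raw = explainer_raw.explain_instance(
    data_row=instance_to_explain,
    predict_fn=predict_fn,
    num_features=len(feature_names)
)
print(explanation_raw.as_list())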
We can also create a simple bar chart to visualize these contributions:
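One way to draw such a chart with matplotlib is shown below, using the explanation_list we printed above. The plotting choices (colors, figure size, sorting) are ours for illustration; green marks supporting features and red marks opposing ones, matching the figure caption.
# Horizontal bar chart of the LIME feature weights
import matplotlib.pyplot as plt
features = [f for f, w in explanation_list]
weights = [w for f, w in explanation_list]
colors = ['green' if w > 0 else 'red' for w in weights]
plt.figure(figsize=(8, 4))
plt.barh(features, weights, color=colors)
plt.axvline(0, color='black', linewidth=0.8)
plt.xlabel('Contribution to predicted class (versicolor)')
plt.title(f'LIME explanation for instance {instance_index}')
plt.gca().invert_yaxis()  # as_list() is sorted by importance, so show the top feature first
plt.tight_layout()
plt.show()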
Feature contributions towards the predicted class ('versicolor') for instance 54. Positive bars (green) support the prediction, negative bars (red) oppose it. The length indicates the magnitude of the contribution.
This visualization clearly shows the positive influence of the petal measurements and the smaller negative influence of the sepal measurements for this specific prediction.
In this hands-on section, you successfully applied LIME to explain an individual prediction from a Random Forest classifier trained on the Iris dataset. You learned how to:
- Initialize a LimeTabularExplainer with the training data, feature names, class names, and mode.
- Generate an explanation for a single prediction with explain_instance.
- Read and visualize the resulting feature weights.
Remember that LIME provides local explanations. Explaining a different instance might yield different feature importance rankings and weights, reflecting how the model uses features differently across the input space. This local fidelity is the core strength of LIME when dealing with complex models.
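To see this locality in action, you can explain another row and compare the weights. The index below is an arbitrary choice (a virginica sample); the labels argument and as_list(label=...) are used so the weights refer to the class the model predicts for that row.
# Explain a different instance and compare its local feature weights
other_index = 120  # an arbitrary virginica sample
other_pred = model.predict(iris.data[other_index].reshape(1, -1))[0]
other_explanation = explainer.explain_instance(
    data_row=iris.data[other_index],
    predict_fn=predict_fn,
    num_features=len(feature_names),
    labels=[other_pred]  # explain the class the model actually predicts for this row
)
print(f"Explanation for instance {other_index} (predicted: {class_names[other_pred]}):")
for feature, weight in other_explanation.as_list(label=other_pred):
    print(f"- {feature}: {weight:.4f}")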