Beyond simply comparing aggregate performance metrics like accuracy or AUC, understanding why a model performs the way it does is often just as important. A critical aspect of machine learning utility is whether a model trained on synthetic data learns similar underlying patterns and feature relationships as a model trained on the corresponding real data. If the synthetic data leads the model to rely on entirely different features, or to assign them very different weights, its practical utility may be limited even when overall performance metrics look acceptable. Assessing feature importance consistency helps us gauge this alignment.
Feature importance quantifies the contribution of each input feature to a model's predictions. Common methods include:

Impurity-based importance: Built into tree ensembles (for example, the Gini-based feature_importances_ attribute in scikit-learn), reflecting how much each feature reduces impurity across the splits that use it.

Permutation importance: Measures the drop in a performance metric when a feature's values are randomly shuffled on held-out data.

SHAP values: Attribute each individual prediction to the input features, and can be aggregated into global importance scores.
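As a quick illustration, the following minimal sketch (using a toy dataset purely for demonstration, not the data used elsewhere in this section) shows how the first two can be obtained with scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data purely for illustration
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

impurity_imp = model.feature_importances_  # impurity-based (Gini) importance, learned during training
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
permutation_imp = perm.importances_mean    # permutation importance, computed on held-out data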
Our goal isn't to re-explain these methods but to use their outputs to compare models trained on real versus synthetic data. The core idea follows the Train-Synthetic-Test-Real (TSTR) principle, but instead of just evaluating predictions on the real test set, we also analyze the learned feature importances.
The most straightforward approach involves these steps:

1. Train a model (Model R) on the real training data.
2. Compute feature importances for Model R, evaluated on a real hold-out test set.
3. Train a model of the same type (Model S) on the synthetic data.
4. Compute feature importances for Model S on the same real test set.
5. Compare the two sets of importances.
Several comparison techniques can be employed:
Rank Correlation: Calculate the correlation between the rankings of features based on their importance scores. Spearman's Rho or Kendall's Tau are suitable metrics. A high rank correlation (close to 1) indicates that both models prioritize features similarly.
Spearman's rank correlation coefficient is

$\rho = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$

where $d_i$ is the difference between the ranks of feature $i$ in the two models and $n$ is the number of features. A short sketch after the figure below checks this formula against SciPy's implementation.
Top-K Overlap: Compare the sets of the K most important features identified by each model. A large overlap suggests both models rely on the same leading features (the code snippet later in this section computes this for K=5).

Value Comparison (Scatter Plot): Create a scatter plot where each point represents a feature. The x-coordinate is its importance in Model R, and the y-coordinate is its importance in Model S. Features close to the y=x line indicate good agreement in importance magnitude.
Scatter plot comparing the importance scores of features from a model trained on real data versus one trained on synthetic data. Points near the dashed diagonal line indicate consistent importance.
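To make the rank correlation concrete, here is a minimal sketch (with made-up importance scores, not results from this section) that computes ρ directly from the formula above and checks it against scipy.stats.spearmanr; the two agree whenever there are no tied ranks.

import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical importance scores for five features from Model R and Model S
imp_r = np.array([0.30, 0.22, 0.15, 0.08, 0.02])
imp_s = np.array([0.25, 0.28, 0.10, 0.01, 0.09])

d = rankdata(imp_r) - rankdata(imp_s)                 # rank differences d_i
n = len(d)
rho_manual = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))  # formula above
rho_scipy, _ = spearmanr(imp_r, imp_s)

print(f"Manual: {rho_manual:.3f}, SciPy: {rho_scipy:.3f}")  # both 0.800 for these scores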
Consistency is key when performing this comparison:

Same model: Use the same model type and hyperparameters (e.g., sklearn.ensemble.RandomForestClassifier) for both the real and the synthetic data.

Same importance method: Apply the same importance calculation (e.g., sklearn.inspection.permutation_importance with the same settings) to both models.

Same evaluation data: Compute both sets of importances on the same real hold-out test set, so that any differences reflect the training data rather than the evaluation data.

Here's a simplified Python snippet using scikit-learn's permutation importance and SciPy for rank correlation:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
# Assume X_real, y_real (real data) and X_synth, y_synth (synthetic data) are pre-loaded
# Split real data for training Model R
X_real_train, X_real_test, y_real_train, y_real_test = train_test_split(
X_real, y_real, test_size=0.3, random_state=42
)
# 1 & 2: Train on Real, Get Importance R
model_r = RandomForestClassifier(n_estimators=100, random_state=42)
model_r.fit(X_real_train, y_real_train)
perm_importance_r = permutation_importance(
model_r, X_real_test, y_real_test, n_repeats=10, random_state=42, n_jobs=-1
)
importance_r = perm_importance_r.importances_mean
# 3 & 4: Train on Synthetic, Get Importance S
# Assume X_synth has the same features as X_real
# Note: We use the *same* real test set for evaluation consistency
model_s = RandomForestClassifier(n_estimators=100, random_state=42)  # Same model type, same hyperparameters
model_s.fit(X_synth, y_synth)  # Fit on the synthetic features and labels
perm_importance_s = permutation_importance(
model_s, X_real_test, y_real_test, n_repeats=10, random_state=42, n_jobs=-1
)
importance_s = perm_importance_s.importances_mean
# 5: Compare
# Rank Correlation
spearman_corr, p_value = spearmanr(importance_r, importance_s)
print(f"Spearman Rank Correlation: {spearman_corr:.3f}")
# Top-K Overlap (e.g., K=5)
k = 5
top_k_indices_r = np.argsort(importance_r)[-k:]
top_k_indices_s = np.argsort(importance_s)[-k:]
overlap = len(set(top_k_indices_r) & set(top_k_indices_s))
print(f"Overlap in Top-{k} Features: {overlap}/{k}")
# (Visualization code for scatter plot would go here)
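# One possible sketch of that visualization (assumes matplotlib is installed;
# this is an illustrative addition, not part of the original snippet):
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(importance_r, importance_s)
lims = [min(importance_r.min(), importance_s.min()),
        max(importance_r.max(), importance_s.max())]
ax.plot(lims, lims, linestyle="--", color="gray")  # y = x reference line
ax.set_xlabel("Permutation importance (Model R, trained on real data)")
ax.set_ylabel("Permutation importance (Model S, trained on synthetic data)")
ax.set_title("Feature importance consistency")
plt.tight_layout()
plt.show()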
Assessing feature importance consistency provides a deeper layer of utility evaluation than looking at performance metrics alone. It helps build confidence that the synthetic data not only allows for accurate predictions but also reflects the salient characteristics and relationships present in the original data. However, remember that feature importance methods have their own assumptions and limitations, so interpret these consistency results as valuable relative comparisons rather than absolute truths about feature relevance.