Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Christoph Molnar, 2023 - A comprehensive online book offering a structured overview of machine learning interpretability methods, including various classification dimensions, local/global explanations, and model-specific/agnostic techniques.
"Why Should I Trust You?": Explaining the Predictions of Any Classifier, Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin, 2016, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery), DOI: 10.1145/2939672.2939778 - The original paper introducing LIME, a widely adopted post-hoc, model-agnostic method that generates local, interpretable explanations for individual predictions of any classifier.
A Unified Approach to Interpreting Model Predictions, Scott M. Lundberg, Su-In Lee, 2017, Advances in Neural Information Processing Systems 30 (NIPS 2017) (NeurIPS) - The seminal paper presenting SHAP, a game-theoretic explanation framework that unifies several earlier attribution methods and provides consistent feature attributions, with model-agnostic (Kernel SHAP) and model-specific (e.g., Deep SHAP) approximations. TreeSHAP, the tree-ensemble variant, was introduced in later work by the same group.