Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Christoph Molnar, 2024 (Self-published (available online)) - Provides a comprehensive overview of model interpretability, covering various methods and explicitly discussing challenges such as the accuracy-interpretability trade-off, faithfulness, and evaluation.
Why Should I Trust You? Explaining the Predictions of Any Classifier, Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, 2016, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), DOI: 10.1145/2939672.2939778 - Introduces LIME, a local model-agnostic explanation method, which highlights the challenge of local faithfulness for post-hoc explanations.
A Unified Approach to Interpreting Model Predictions, Scott Lundberg and Su-In Lee, 2017, Advances in Neural Information Processing Systems, Vol. 30, DOI: 10.48550/arXiv.1705.07874 - Presents SHAP, a method for explaining model predictions based on Shapley values, addressing issues like computational expense for exact solutions and feature dependencies.