A Unified Approach to Interpreting Model Predictions, Scott M. Lundberg, Su-In Lee, 2017, Advances in Neural Information Processing Systems (NeurIPS), Vol. 30, DOI: 10.48550/arXiv.1705.07874 - Introduces SHAP values as a unified framework for interpreting predictions, which helps address issues like fairness, debugging, and user confidence.
Fairness and Machine Learning: Limitations and Opportunities, Solon Barocas, Moritz Hardt, and Arvind Narayanan, 2023 (MIT Press) - Provides a comprehensive overview of fairness in machine learning, covering how models can learn biases and the importance of interpretability for auditing and mitigating them.