Site Reliability Engineering: How Google Runs Production Systems, Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy (eds.), 2016 (O'Reilly Media) - Presents principles and practices for operating highly reliable, scalable distributed systems, including chapters on monitoring, incident response, and ongoing maintenance.