OpenTelemetry Documentation, The OpenTelemetry Authors, 2025 - The official documentation for OpenTelemetry, a vendor-neutral open standard for collecting traces, metrics, and logs, and a primary reference for implementing distributed tracing and observability in modern applications.
Site Reliability Engineering: How Google Runs Production Systems, Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff, 2016 (O'Reilly Media) - A seminal book on operating large-scale distributed systems, with extensive sections on monitoring, logging, and tracing principles for operational excellence.
Best practices for logging, Google Cloud Documentation Authors, 2024 (Google Cloud) - Provides practical recommendations and guidelines for effective structured logging in cloud-native and distributed environments.