Site Reliability Engineering: How Google Runs Production Systems, Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy, 2016 (O'Reilly Media) - Offers core SRE principles and practices, foundational for managing production systems, including incident response and postmortems.
The Site Reliability Workbook: Practical Ways to Implement SRE, Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne, 2018 (O'Reilly Media) - Provides practical advice and examples for implementing SRE, covering incident management, on-call, and operational documentation.