Overlapping Experiment Infrastructure: More, Better, Faster Experimentation, Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike Meyer, 2010. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM). DOI: 10.1145/1835804.1835808 - Describes infrastructure and methodological considerations for efficiently running multiple concurrent A/B tests in a large-scale production setting.