Python toolkit for building production-ready LLM applications. Modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
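
The toolkit's own API isn't shown here, but as a rough sketch of the structured-outputs idea it names (all names below, such as `Ticket` and `parse_ticket`, are illustrative, not the toolkit's actual API): prompt the model for JSON matching a known shape, then parse and validate the reply at the boundary.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    """Target shape the model's JSON reply should conform to."""
    title: str
    priority: str  # expected: "low" | "medium" | "high"


def parse_ticket(raw: str) -> Ticket:
    """Parse and validate a model's JSON reply into a Ticket."""
    data = json.loads(raw)
    ticket = Ticket(title=str(data["title"]), priority=str(data["priority"]))
    if ticket.priority not in {"low", "medium", "high"}:
        raise ValueError(f"unexpected priority: {ticket.priority!r}")
    return ticket


# A stubbed provider reply stands in for a real LLM call.
reply = '{"title": "Login page returns 500", "priority": "high"}'
print(parse_ticket(reply))
```

Validating at the boundary like this is what makes structured outputs safe to consume downstream, regardless of which provider produced the text.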