BLEU: a Method for Automatic Evaluation of Machine Translation, Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu, 2002. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics). DOI: 10.3115/1073083.1073135 - The foundational paper introducing BLEU, a widely used metric for evaluating machine translation and other text generation tasks. It scores a candidate by its clipped (modified) n-gram precision against one or more references, combined across n-gram orders by a geometric mean and multiplied by a brevity penalty.
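To make the n-gram overlap idea concrete, here is a minimal sentence-level sketch of BLEU as the paper defines it (clipped n-gram precision for n = 1..4, geometric mean, brevity penalty). The function names and example sentences are illustrative, not from the paper, and no smoothing is applied, so any missing higher-order match sends the score to zero, as in the original formulation.

```python
from collections import Counter
import math

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions for n = 1..max_n, scaled by a brevity penalty."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        # Clip each candidate n-gram count at its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # unsmoothed, per the original definition
        log_precision_sum += math.log(clipped / sum(cand.values())) / max_n
    # Brevity penalty against the reference whose length is closest to the candidate's.
    ref_len = min((len(r) for r in references), key=lambda rl: abs(rl - len(candidate)))
    bp = 1.0 if len(candidate) >= ref_len else math.exp(1.0 - ref_len / len(candidate))
    return bp * math.exp(log_precision_sum)

candidate = "the quick brown fox jumps over the lazy dog".split()
references = ["the quick brown fox jumped over the lazy dog".split()]
print(round(bleu(candidate, references), 3))  # ~0.597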
ROUGE: A Package for Automatic Evaluation of Summaries, Chin-Yew Lin, 2004. Text Summarization Branches Out (Association for Computational Linguistics). DOI: 10.3115/1621896.1621901 - The original paper presenting ROUGE, a set of metrics commonly used for evaluating summarization quality, focusing on recall based on n-gram and longest-common-subsequence overlap.
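A similarly minimal sketch of two metrics from the ROUGE family, ROUGE-N recall (n-gram overlap) and ROUGE-L recall (longest common subsequence); the function names and example texts below are illustrative assumptions, not taken from the paper or the package.

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall sketch: fraction of the reference's n-grams found in the candidate."""
    ref_grams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand_grams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    overlap = sum(min(count, cand_grams[gram]) for gram, count in ref_grams.items())
    return overlap / max(sum(ref_grams.values()), 1)

def lcs_length(a, b):
    """Classic dynamic-programming longest-common-subsequence length, the basis of ROUGE-L."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(candidate, reference):
    """ROUGE-L recall sketch: LCS length divided by reference length."""
    return lcs_length(candidate, reference) / max(len(reference), 1)

summary = "the cat was under the bed".split()
reference = "the cat was found under the bed this morning".split()
print(round(rouge_n_recall(summary, reference, n=1), 3))  # 0.667 (unigram recall)
print(round(rouge_l_recall(summary, reference), 3))       # 0.667 (LCS recall)
```

Both sketches are recall-oriented, matching the paper's emphasis: they measure how much of the reference is covered by the candidate, whereas BLEU above is precision-oriented.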