BLEU: a Method for Automatic Evaluation of Machine Translation, Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu, 2002. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics). DOI: 10.3115/1073083.1073135 - Introduces the BLEU score, a widely used metric for evaluating machine translation and text generation quality.
BERTScore: Evaluating Text Generation with BERT, Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi, 2020. International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1904.09675 - Introduces BERTScore, an embedding-based metric that uses pre-trained contextual embeddings to assess text generation quality by semantic similarity rather than exact token overlap.