BERTScore: Evaluating Text Generation with BERT, Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi, 2020, International Conference on Learning Representations (ICLR), DOI: 10.48550/arXiv.1904.09675 - Introduces a widely used metric for evaluating text generation based on semantic similarity using contextual embeddings, relevant for automated evaluation.
LangChain Evaluation, LangChain Team, 2024 (LangChain) - Official documentation for a popular LLM framework's evaluation capabilities, demonstrating practical implementation of automated testing workflows.