LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021. International Conference on Learning Representations (ICLR). - Introduces LoRA, a parameter-efficient fine-tuning method frequently evaluated against in the context of this section.
SuperGLUE: A Stronger General Language Understanding Evaluation Benchmark, Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel Bowman, 2019. Advances in Neural Information Processing Systems 32 (NeurIPS). DOI: 10.48550/arXiv.1905.00537. - Presents SuperGLUE, a widely used benchmark for evaluating natural language understanding models, encompassing various NLU tasks and metrics.
BLEU: a Method for Automatic Evaluation of Machine Translation, Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu, 2002. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). DOI: 10.3115/1073083.1073135. - Foundational paper introducing BLEU, a standard automatic metric for evaluating the quality of machine-translated text.
ROUGE: A Package for Automatic Evaluation of Summaries, Chin-Yew Lin, 2004. Text Summarization Branches Out: Proceedings of the ACL-04 Workshop (Association for Computational Linguistics). DOI: 10.3115/1614008.1614022. - Introduces ROUGE, a widely adopted set of metrics for automatically evaluating summarization and other text generation tasks.