LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021, arXiv. DOI: 10.48550/arXiv.2106.09685 - Introduces LoRA, a method that adapts pre-trained models by injecting trainable low-rank matrices into the transformer layers, significantly reducing the number of trainable parameters and the memory footprint.
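As a rough illustration of the idea (not the authors' reference implementation), the PyTorch sketch below augments a frozen linear layer with a trainable low-rank update B·A scaled by alpha/r; the class name, dimensions, and the values r=8, alpha=16 are illustrative choices, not taken from the paper.

```python
# Minimal LoRA-style linear layer: the base weight is frozen, only the
# low-rank factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen "pre-trained" weight (randomly initialized here for illustration).
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Trainable low-rank factors: A is small-Gaussian, B starts at zero so the
        # adapted layer equals the base layer before training.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return base + self.scaling * update

layer = LoRALinear(768, 768, r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable values vs. 589824 in the frozen weight
```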
Parameter-Efficient Transfer Learning for NLP, Neil Houlsby, Andrei Giurgiu, Stanisław Jastrzębski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly, 2019, Proceedings of the 36th International Conference on Machine Learning (ICML), Vol. 97 - Proposes adapter modules, small bottleneck layers inserted within each transformer layer, enabling efficient fine-tuning with only a small number of additional trainable parameters.
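A minimal sketch of a bottleneck adapter in the spirit of this paper (an illustration, not the paper's exact architecture): down-project, apply a nonlinearity, up-project, and add a residual connection, with only the adapter weights trained. The bottleneck size of 64 is an assumed example value.

```python
# Bottleneck adapter module: hidden -> bottleneck -> hidden with a residual path.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()
        # Near-identity initialization so inserting the adapter barely
        # perturbs the frozen base model at the start of training.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
x = torch.randn(2, 16, 768)   # (batch, sequence, hidden)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```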
Prefix-Tuning: Optimizing Continuous Prompts for Generation, Xiang Lisa Li, Percy Liang, 2021, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics. DOI: 10.18653/v1/2021.acl-long.353 - Introduces prefix-tuning, which prepends a small sequence of trainable vectors (the prefix) at each transformer layer, freezing the base model and reducing the number of trainable parameters.
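A simplified sketch of the prefix idea (the paper additionally reparameterizes the prefix through an MLP during training): trainable prefix key/value vectors are prepended to each attention layer's keys and values while the base projections stay frozen. The class name, dimensions, and prefix length of 10 are assumptions for illustration.

```python
# Attention with a trainable prefix prepended to the keys and values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixAttention(nn.Module):
    def __init__(self, hidden_size=768, prefix_len=10):
        super().__init__()
        # Frozen projections stand in for the pre-trained attention weights.
        self.q_proj = nn.Linear(hidden_size, hidden_size).requires_grad_(False)
        self.k_proj = nn.Linear(hidden_size, hidden_size).requires_grad_(False)
        self.v_proj = nn.Linear(hidden_size, hidden_size).requires_grad_(False)
        # Trainable prefix keys/values, shared across the batch.
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)

    def forward(self, x):                          # x: (batch, seq, hidden)
        b = x.size(0)
        q = self.q_proj(x)
        k = torch.cat([self.prefix_k.expand(b, -1, -1), self.k_proj(x)], dim=1)
        v = torch.cat([self.prefix_v.expand(b, -1, -1), self.v_proj(x)], dim=1)
        return F.scaled_dot_product_attention(q, k, v)

attn = PrefixAttention()
print(attn(torch.randn(2, 16, 768)).shape)         # torch.Size([2, 16, 768])
```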
The Power of Scale for Parameter-Efficient Prompt Tuning, Brian Lester, Rami Al-Rfou, Noah Constant, 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics. DOI: 10.18653/v1/2021.emnlp-main.243 - Explores prompt tuning, where only a small set of trainable token embeddings (soft prompts) is optimized, achieving performance comparable to full fine-tuning for large models with minimal parameter overhead.
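A minimal sketch of soft prompts as an illustration (not the paper's codebase): a small bank of trainable embeddings is prepended to the frozen token embeddings before the sequence enters the frozen model, so only the soft prompt receives gradients. The vocabulary size and prompt length of 20 are example values.

```python
# Soft-prompt wrapper around a frozen embedding layer.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, embed: nn.Embedding, prompt_len=20):
        super().__init__()
        self.embed = embed.requires_grad_(False)   # frozen token embeddings
        dim = embed.embedding_dim
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, input_ids):                  # input_ids: (batch, seq)
        tok = self.embed(input_ids)
        prompt = self.soft_prompt.expand(input_ids.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)     # (batch, prompt_len + seq, dim)

embedding = nn.Embedding(32000, 768)               # stand-in for a frozen vocab embedding
sp = SoftPrompt(embedding, prompt_len=20)
ids = torch.randint(0, 32000, (2, 16))
print(sp(ids).shape)                               # torch.Size([2, 36, 768])
print([n for n, p in sp.named_parameters() if p.requires_grad])  # ['soft_prompt']
```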