LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen, 2021. arXiv preprint arXiv:2106.09685. DOI: 10.48550/arXiv.2106.09685 - Introduces Low-Rank Adaptation (LoRA), a prominent parameter-efficient fine-tuning technique that freezes the pretrained weights and trains small low-rank update matrices instead, significantly reducing the memory and computational cost of adapting large language models.
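To make the mechanism concrete, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer, not the authors' reference implementation: the class name `LoRALinear` is ours, while `r` and `alpha` mirror the paper's rank and scaling hyperparameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Computes h = W0 x + (alpha / r) * B A x: the pretrained weight W0 is
    # frozen; only the low-rank factors A (Gaussian init) and B (zero init)
    # receive gradients, so the update BA starts at zero and training begins
    # from the pretrained behavior.
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained layer
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Only r * (in_features + out_features) extra parameters are trained per layer, and the low-rank update can be merged into W0 after training, so inference adds no extra latency.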
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He, 2020. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE). DOI: 10.1109/SC41405.2020.00024 - Presents ZeRO (Zero Redundancy Optimizer), a family of memory optimization technologies essential for distributed training of massive deep learning models; instead of replicating model states on every device, it shards optimizer states, gradients, and parameters across them.
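The paper's headline savings follow from simple accounting, sketched below under its stated assumptions (mixed-precision Adam: 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer states per parameter); the function name `zero_memory_gb` is ours.

```python
def zero_memory_gb(params_billion: float, n_gpus: int) -> dict:
    # Per-GPU model-state memory in GB for mixed-precision Adam, following
    # the paper's accounting: 2 (fp16 weights) + 2 (fp16 grads) + 12
    # (fp32 weights, momentum, variance) bytes per parameter.
    psi = params_billion * 1e9
    gb = 1e9
    return {
        "replicated": 16 * psi / gb,                   # standard data parallelism
        "ZeRO-1": (4 * psi + 12 * psi / n_gpus) / gb,  # shard optimizer states
        "ZeRO-2": (2 * psi + 14 * psi / n_gpus) / gb,  # + shard gradients
        "ZeRO-3": 16 * psi / n_gpus / gb,              # + shard parameters
    }

# The paper's running example: a 7.5B-parameter model on 64 GPUs needs
# ~120 GB of model states per GPU fully replicated, but under 2 GB with ZeRO-3.
print(zero_memory_gb(7.5, 64))
```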
Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn, 2023. arXiv preprint arXiv:2305.18290. DOI: 10.48550/arXiv.2305.18290 - Introduces Direct Preference Optimization (DPO), an alternative to PPO-based RLHF for fine-tuning language models on human preferences, which simplifies training by optimizing the policy directly on preference pairs, with no explicit reward model and no reinforcement-learning loop.
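The loss itself is compact enough to write out. The PyTorch sketch below is illustrative rather than the authors' code (the function name and the `beta=0.1` default are our choices), and it assumes the per-token log-probabilities of each chosen and rejected response have already been summed under both the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # DPO objective from the paper:
    # -log sigmoid(beta * [(log pi - log pi_ref) on the preferred response
    #                      minus the same log-ratio on the rejected one])
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```

Because the implicit reward is beta times the policy-to-reference log-ratio, preference fine-tuning reduces to a single classification-style loss over preference pairs.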