Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017, Advances in Neural Information Processing Systems. DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture, explaining attention mechanisms, feed-forward networks, and other core components fundamental to modern LLMs and their fine-tuning.
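A minimal sketch of the scaled dot-product attention described in this paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the tensor shapes and toy dimensions below are illustrative assumptions, not the paper's reference implementation:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k). Returns the attended values."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)             # attention distribution over keys
    return weights @ v                                   # weighted sum of value vectors

# Toy usage with assumed dimensions: batch=1, heads=8, seq=16, d_k=64.
q = k = v = torch.randn(1, 8, 16, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```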
Parameter-Efficient Transfer Learning for NLP, Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly, 2019, International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1902.00751 - Introduces adapter modules as a parameter-efficient method for fine-tuning, inserting small, task-specific neural network layers into pre-trained models.
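A minimal sketch of an adapter module in the spirit of this paper: a small bottleneck MLP with a residual connection, inserted into an otherwise frozen pre-trained network; the bottleneck size, activation, and initialization here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # project down to a small bottleneck
        self.up = nn.Linear(bottleneck, d_model)    # project back to the model dimension
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)              # near-identity behaviour at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen pre-trained representation intact.
        return hidden + self.up(self.act(self.down(hidden)))

# During fine-tuning, only the adapter parameters are trained; the backbone stays frozen.
adapter = Adapter(d_model=768)
x = torch.randn(2, 16, 768)       # (batch, seq, d_model), toy input
print(adapter(x).shape)           # torch.Size([2, 16, 768])
```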
LoRA: Low-Rank Adaptation of Large Language Models, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2021, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.2106.09685 - Proposes Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique that significantly reduces trainable parameters by updating a low-rank decomposition of weight matrices.
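A minimal sketch of a LoRA-style linear layer: the frozen base weight is augmented with a trainable low-rank update scaled by alpha / r, with one factor initialized to zero so fine-tuning starts from the pre-trained behaviour; the class name, rank, and scaling values are illustrative assumptions rather than the paper's reference code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # freeze the pre-trained weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(out_features, r))        # low-rank factor B, zero-init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scale * x A^T B^T; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 8 * 768 = 12288 trainable parameters vs. ~590k frozen
```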