Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017). Advances in Neural Information Processing Systems. DOI: 10.48550/arXiv.1706.03762. Introduces the Transformer architecture, explaining the attention mechanism, feed-forward networks, and the other core components on which modern LLMs and their fine-tuning are built.
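To make the cited attention mechanism concrete, here is a minimal PyTorch sketch of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V. The tensor shapes and toy inputs are illustrative assumptions, not code from the paper, and a full Transformer would wrap this in multi-head projections.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise similarity, scaled
    weights = F.softmax(scores, dim=-1)            # attention distribution per query
    return weights @ v                             # weighted sum of values

# Toy usage (assumed shapes): batch of 1, sequence of 4 tokens, width 8.
q = k = v = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```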
Parameter-Efficient Transfer Learning for NLP. Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly (2019). International Conference on Machine Learning (ICML). DOI: 10.48550/arXiv.1902.00751. Introduces adapter modules as a parameter-efficient fine-tuning method that inserts small, task-specific neural network layers into a pretrained model.
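As a sketch of the adapter idea from this paper, the module below follows the Houlsby-style bottleneck design (down-project, nonlinearity, up-project, residual connection). The hidden width of 768 and bottleneck of 16 are illustrative assumptions; the near-zero initialization of the up-projection reflects the paper's choice of starting adapters close to an identity function so the pretrained model is undisturbed at the beginning of fine-tuning.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter in the style of Houlsby et al. (2019)."""
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # project to small bottleneck
        self.up = nn.Linear(bottleneck, d_model)    # project back to model width
        # Near-identity initialization: the adapter starts as (almost) a no-op.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pretrained representation intact.
        return x + self.up(torch.relu(self.down(x)))

# Toy usage (assumed shapes): during fine-tuning, only the adapter's few
# parameters are trained while the surrounding pretrained layers stay frozen.
x = torch.randn(2, 4, 768)
print(Adapter(768)(x).shape)  # torch.Size([2, 4, 768])
```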