Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture, which underlies modern large language models (LLMs). It clarifies how these models process input in a stateless, per-request manner.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela. 2020. Advances in Neural Information Processing Systems, Vol. 33 (Neural Information Processing Systems Foundation, Inc.). DOI: 10.48550/arXiv.2005.11401 - This paper introduces Retrieval-Augmented Generation (RAG), a method that extends language models by retrieving information from an external knowledge base. It offers a strategy for managing and scaling context beyond a model's inherent context-window limits, which is relevant to the problem of conversation history.
A Survey of Large Language Models. Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen. 2023. DOI: 10.48550/arXiv.2303.18223 - This comprehensive survey reviews recent advances in and open challenges for large language models (LLMs). It covers many aspects, including the architectural designs that make LLMs stateless and the techniques used to build conversational applications.