Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017. Advances in Neural Information Processing Systems, Vol. 30 (Curran Associates, Inc.). DOI: 10.48550/arXiv.1706.03762 - This foundational paper introduces the Transformer architecture, which forms the basis of modern LLMs. The architecture computes its output only from the input sequence it is given, which is why these models process input in a stateless, per-request manner.
Speech and Language Processing (3rd Edition Draft), Daniel Jurafsky and James H. Martin, 2025 - A comprehensive textbook on natural language processing and computational linguistics, with chapters on dialogue systems that cover how conversational state and context are managed.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 2020. Advances in Neural Information Processing Systems, Vol. 33 (NeurIPS 2020) (Neural Information Processing Systems Foundation, Inc.). DOI: 10.48550/arXiv.2005.11401 - This paper presents Retrieval-Augmented Generation (RAG), a method for extending language models by retrieving information from an external knowledge base. This offers a strategy to manage and extend context beyond the limitations of the model's inherent context window, relevant to the conversation history problem.
A Survey of Large Language Models, Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen, 2023. DOI: 10.48550/arXiv.2303.18223 - This comprehensive survey reviews the recent advancements and existing challenges in large language models. It covers various aspects, including the architectural design that leads to their stateless nature and the techniques used to build conversational applications.