Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017 (Advances in Neural Information Processing Systems (NIPS) 30). DOI: 10.48550/arXiv.1706.03762 - Introduces the Transformer architecture and the self-attention mechanism, core to how modern LLMs process context.
Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016 (MIT Press) - A comprehensive textbook covering deep learning concepts, including sequence modeling and neural network architectures relevant to context processing.
Models overview, OpenAI, 2024 (OpenAI) - Provides technical specifications and practical details on context window limits and token handling for OpenAI's widely used LLMs.