A Python toolkit for building production-ready LLM applications: modular utilities for prompts, RAG, agents, and structured outputs, with multi-provider support.
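To give a flavor of the multi-provider idea, here is a minimal sketch of a provider abstraction of the kind such a toolkit exposes. All names here (`LLMProvider`, `EchoProvider`, `complete`) are hypothetical illustrations, not the toolkit's actual API.

```python
# Hypothetical sketch of a provider-agnostic completion interface.
# Names are illustrative; they do not reflect the toolkit's real API.
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """Interface that every provider backend implements."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in backend for local testing; a real backend would call a provider API."""

    prefix: str = "echo"

    def complete(self, prompt: str) -> str:
        return f"{self.prefix}: {prompt}"


def run(provider: LLMProvider, prompt: str) -> str:
    # Application code depends only on the protocol, so backends are swappable.
    return provider.complete(prompt)


if __name__ == "__main__":
    print(run(EchoProvider(), "Hello"))
```

Coding against a small protocol like this is what lets the same prompt, RAG, or agent code run unchanged across providers.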