Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich, Barry Haddow, and Alexandra Birch, 2016. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Association for Computational Linguistics). DOI: 10.18653/v1/P16-1162 - Introduces Byte-Pair Encoding (BPE) for subword tokenization, a foundational method used in many modern language models like GPT-2, directly relevant to the subword examples discussed.
Tokenizers in the transformers library, Hugging Face team, 2024 - Official documentation for tokenizers within the Hugging Face transformers library, explaining how different tokenization algorithms are loaded and used, directly relevant to the Python code example.