Python toolkit for building production-ready LLM applications. Modular utilities for prompts, RAG, agents, structured outputs, and multi-provider support.
Was this section helpful?
Neural Machine Translation of Rare Words with Subword Units, Rico Sennrich, Barry Haddow, and Alexandra Birch, 2016Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1 (Association for Computational Linguistics)DOI: 10.18653/v1/P16-1162 - Introduces Byte Pair Encoding (BPE) for subword tokenization, addressing rare words and out-of-vocabulary issues in neural machine translation.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2019Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Vol. 1 (Association for Computational Linguistics)DOI: 10.18653/v1/N19-1423 - Presents the BERT model, which utilizes WordPiece tokenization, illustrating the application of subword methods in large language model pre-training.