ChromaDB Documentation, Chroma, 2024 (Chroma) - Official documentation for the open-source embedding database. It provides comprehensive guides for installation, configuration, and practical usage of ChromaDB's API for managing collections and performing similarity searches.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, 2018, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.48550/arXiv.1810.04805 - Presents the BERT model, a landmark in pre-trained contextual text embeddings. This work is fundamental for understanding how text is transformed into high-dimensional vectors that capture semantic meaning, enabling semantic search in vector stores.
Natural Language Processing with Transformers: Building Innovative Applications with 🤗Transformers, Lewis Tunstall, Leandro von Werra, Thomas Wolf, 2022 (O'Reilly Media) - This book offers an in-depth explanation of modern NLP techniques, including how transformer-based models generate vector embeddings. It provides context for using these embeddings in applications like RAG and understanding the role of vector stores.