Let's bring together the concepts we've covered by building a small, functional semantic search application. This exercise integrates an embedding model, a vector database client, and a simple web framework, demonstrating a complete, if basic, search pipeline from user query to relevant results. We'll use components that are easy to set up locally, so you can focus on how they interact.

For this practical, we will use:

- Embedding Model: the `sentence-transformers` library with a pre-trained model such as `all-MiniLM-L6-v2`. This model is efficient and produces good-quality embeddings for sentences and short paragraphs.
- Vector Database: ChromaDB. We'll use its Python client with local, persistent storage, which simplifies setup for this example.
- Web Framework: FastAPI. A modern Python web framework that is easy to use and automatically generates interactive API documentation.

1. Setup and Dependencies

First, make sure the necessary libraries are installed. You can install them using pip:

```bash
pip install sentence-transformers chromadb fastapi "uvicorn[standard]" python-multipart Jinja2
```

- `sentence-transformers`: For loading the embedding model and generating vectors.
- `chromadb`: The client library for interacting with the Chroma vector database.
- `fastapi`: The web framework for creating our API endpoint.
- `uvicorn`: An ASGI server to run our FastAPI application.
- `python-multipart`: Required by FastAPI for handling form data. The endpoint built here reads a query parameter, so it is not strictly needed.
- `Jinja2`: Used for optional HTML templating with FastAPI. It is only needed if you later add an HTML frontend.

2. Data Preparation and Indexing Script

Let's create a script (`index_data.py`) to prepare some sample data, generate embeddings, and index them into ChromaDB.

```python
# index_data.py
import chromadb
from chromadb.utils import embedding_functions
from sentence_transformers import SentenceTransformer

# --- Configuration ---
MODEL_NAME = 'all-MiniLM-L6-v2'
COLLECTION_NAME = "docs_collection"
PERSIST_DIRECTORY = "./chroma_db_persist"  # Directory to store DB data

# --- Sample Data ---
# Simple list of documents (sentences in this case)
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "Artificial intelligence is transforming many industries.",
    "Vector databases are optimized for similarity search.",
    "Natural language processing enables computers to understand text.",
    "The capital of France is Paris.",
    "Apples are a type of fruit, often red or green.",
    "Machine learning algorithms learn from data.",
    "Semantic search provides results based on meaning, not just keywords.",
]

# --- Initialization ---
print("Initializing embedding model...")
# Load the pre-trained sentence transformer model.
# This model maps sentences & paragraphs to a 384 dimensional dense vector space.
# It will download the model automatically if not present.
# (The embedding function below does the actual encoding; loading the model
# here mainly confirms that it is available.)
model = SentenceTransformer(MODEL_NAME)

print("Initializing ChromaDB client...")
# Initialize ChromaDB client with persistence.
# This will save the database state to the specified directory.
client = chromadb.PersistentClient(path=PERSIST_DIRECTORY)

print(f"Getting or creating collection: {COLLECTION_NAME}")
# Get or create the collection. If it exists, it will be loaded.
# Specify the embedding function based on our SentenceTransformer model.
collection = client.get_or_create_collection(
    name=COLLECTION_NAME,
    embedding_function=embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=MODEL_NAME
    ),
    # Chroma's default distance metric is L2; you can pass
    # metadata={'hnsw:space': 'cosine'} to use cosine distance instead.
)

# --- Indexing ---
print("Generating IDs and preparing data for indexing...")
# Generate simple sequential IDs for this example
doc_ids = [f"doc_{i}" for i in range(len(documents))]

# Check if data needs indexing (simple check based on expected count).
# In a real app, you might have a more robust way to track indexed data.
if collection.count() < len(documents):
    print(f"Indexing {len(documents)} documents...")
    try:
        # Add documents to the collection.
        # ChromaDB's SentenceTransformerEmbeddingFunction handles
        # embedding generation automatically here.
        collection.add(
            documents=documents,
            ids=doc_ids
        )
        print("Documents indexed successfully.")
    except Exception as e:
        print(f"Error indexing documents: {e}")
else:
    print("Documents seem to be already indexed.")

print(f"Collection '{COLLECTION_NAME}' now contains {collection.count()} documents.")
print("Indexing script finished.")
```

Explanation:

- We define our sample documents and configuration parameters.
- We load the `SentenceTransformer` model. The first time you run this, it will download the model weights.
- We initialize a `PersistentClient` for ChromaDB, specifying a directory (`./chroma_db_persist`) where the database files will be stored. This makes our index persistent across runs.
- We use `client.get_or_create_collection`, which creates the collection if it doesn't exist and loads it if it does. We associate our SentenceTransformer model with the collection via `embedding_function`. ChromaDB will use this function automatically when we add documents or query with raw text.
- We generate simple unique IDs (`doc_ids`) for each document.
- We add the documents and their corresponding IDs to the collection using `collection.add`. Because we configured an `embedding_function`, ChromaDB calls the model internally to get the vector for each document before storing it.
- We include a basic count check to avoid re-indexing every time.

Run this script once to populate your local ChromaDB:

```bash
python index_data.py
```

You should see output indicating initialization and successful indexing, and a `chroma_db_persist` directory will be created.

3. Building the Search API with FastAPI

Now, let's create the web application (`main.py`) that will serve our search requests.

```python
# main.py
import chromadb
from fastapi import FastAPI, Query, HTTPException
from sentence_transformers import SentenceTransformer
import uvicorn  # For running the app

# --- Configuration ---
MODEL_NAME = 'all-MiniLM-L6-v2'
COLLECTION_NAME = "docs_collection"
PERSIST_DIRECTORY = "./chroma_db_persist"
N_RESULTS = 3  # Number of search results to return

# --- Application Initialization ---
app = FastAPI(
    title="Simple Semantic Search API",
    description="An API that uses a vector database for semantic search.",
    version="0.1.0"
)

# --- Global Variables / Resources ---
# Initialize resources once when the application starts
try:
    print("Loading embedding model...")
    embedding_model = SentenceTransformer(MODEL_NAME)
    print("Model loaded successfully.")

    print("Connecting to ChromaDB...")
    db_client = chromadb.PersistentClient(path=PERSIST_DIRECTORY)
    collection = db_client.get_collection(name=COLLECTION_NAME)
    # Verify collection has items (optional but good practice)
    if collection.count() == 0:
        print(f"Warning: Collection '{COLLECTION_NAME}' is empty. Did you run index_data.py?")
    print("ChromaDB connection successful.")
except Exception as e:
    print(f"Error during initialization: {e}")
    # Handle initialization failure appropriately, maybe exit or raise specific error
    embedding_model = None
    collection = None


# --- API Endpoints ---
@app.get("/search/")
async def perform_search(
    query: str = Query(..., min_length=3, description="The search query text.")
):
    """
    Performs semantic search on the indexed documents.

    Takes a query string, generates its embedding, and searches the
    vector database for the most similar documents.
    """
    # Compare against None explicitly rather than relying on object truthiness
    if embedding_model is None or collection is None:
        raise HTTPException(
            status_code=503,
            detail="Search service is not available due to initialization error."
        )

    print(f"Received query: '{query}'")
    try:
        # 1. Generate embedding for the query
        print("Generating query embedding...")
        query_embedding = embedding_model.encode(query).tolist()
        print("Query embedding generated.")

        # 2. Query the vector database
        print(f"Querying collection '{COLLECTION_NAME}'...")
        results = collection.query(
            query_embeddings=[query_embedding],  # Note: query_embeddings expects a list of embeddings
            n_results=N_RESULTS,
            include=['documents', 'distances']  # Ask ChromaDB to return documents and distances
        )
        print("Query executed successfully.")

        # 3. Format and return results
        # The results structure can be a bit nested, let's simplify it
        if results and results.get('ids') and results['ids'][0]:
            formatted_results = []
            ids = results['ids'][0]
            distances = results['distances'][0]
            documents = results['documents'][0]
            for i in range(len(ids)):
                formatted_results.append({
                    "id": ids[i],
                    "document": documents[i],
                    "distance": distances[i]  # Lower distance means more similar for cosine/euclidean
                })
            return {"results": formatted_results}
        else:
            return {"results": []}  # Return empty list if no results found
    except Exception as e:
        print(f"Error during search for query '{query}': {e}")
        raise HTTPException(status_code=500, detail=f"Search failed: {str(e)}")


@app.get("/")
async def read_root():
    """
    A simple root endpoint to check if the API is running.
    """
    return {"message": "Semantic Search API is running. Use the /search/ endpoint."}


# --- Main Execution ---
# This block allows running the app directly using `python main.py`
if __name__ == "__main__":
    print("Starting FastAPI server...")
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Explanation:

- We initialize FastAPI.
- Global Resources: We load the `SentenceTransformer` model and connect to the persistent ChromaDB collection once, when the application starts. This avoids reloading the model or reconnecting to the database on every request, which would be very inefficient. If initialization fails, both resources are set to `None` and the search endpoint responds with a 503 error.
- `/search/` Endpoint:
  - It accepts a `query` parameter (a string of at least three characters).
  - It generates the embedding for the input query using the same model we used for indexing. Vectors from different models live in different spaces, so comparing them would not be meaningful.
  - It uses `collection.query` to find the `N_RESULTS` document embeddings most similar to `query_embedding`. We request that ChromaDB include the original documents and distances in the response.
  - It formats the results from ChromaDB into a cleaner list of dictionaries and returns them as JSON.
  - Errors during embedding generation or database querying are caught and returned as a 500 response.
- Root Endpoint: A simple `/` endpoint confirms the API is running.
- Running the App: The `if __name__ == "__main__":` block allows you to run the server directly using `python main.py`. Alternatively, you can use `uvicorn main:app --reload --host 0.0.0.0 --port 8000`. The `--reload` flag is useful during development because it automatically restarts the server when you save changes.

4. Running and Testing

- Index the Data: If you haven't already, run `python index_data.py`.
- Start the API Server: Run `uvicorn main:app --reload --port 8000`.
- Test the API: Open your web browser or use a tool like curl to send requests to the search endpoint.

Browser: Navigate to http://localhost:8000/search/?query=what+is+AI

curl:

```bash
curl "http://localhost:8000/search/?query=information%20about%20databases"
curl "http://localhost:8000/search/?query=tell%20me%20about%20animals"
```

You should receive JSON responses containing the most relevant documents from your small dataset based on semantic similarity, along with their distances. For example, querying about "databases" should return results related to vector databases and possibly machine learning. Querying about "animals" should retrieve the sentence about the fox.

Search Application Flow Diagram

```dot
digraph SemanticSearchFlow {
    rankdir=LR;
    node [shape=box, style=rounded, fontname="sans-serif", color="#495057", fontcolor="#495057"];
    edge [fontname="sans-serif", color="#adb5bd", fontcolor="#495057"];

    subgraph cluster_api {
        label = "FastAPI Application";
        bgcolor="#e9ecef";
        style=filled;
        color="#ced4da";
        api_endpoint [label="/search Endpoint", shape=ellipse, style=filled, fillcolor="#a5d8ff"];
        query_embed [label="Generate Query\nEmbedding"];
        db_query [label="Query Vector DB"];
        format_results [label="Format Results"];
    }

    user [label="User / Client", shape=circle, style=filled, fillcolor="#b2f2bb"];
    model [label="Sentence Transformer\n(all-MiniLM-L6-v2)", style=filled, fillcolor="#ffec99"];
    vector_db [label="ChromaDB Collection\n(docs_collection)", shape=cylinder, style=filled, fillcolor="#fcc2d7"];

    user -> api_endpoint [label="1. Sends query string"];
    api_endpoint -> query_embed [label="2. Passes query"];
    query_embed -> model [label="3. Gets embedding"];
    model -> query_embed [label="4. Returns vector"];
    query_embed -> db_query [label="5. Passes vector"];
    db_query -> vector_db [label="6. Executes ANN Search"];
    vector_db -> db_query [label="7. Returns results (IDs, Distances, Docs)"];
    db_query -> format_results [label="8. Passes results"];
    format_results -> api_endpoint [label="9. Returns formatted JSON"];
    api_endpoint -> user [label="10. Sends JSON response"];
}
```

The diagram illustrates the request flow for the semantic search application. A user sends a query to the API endpoint, which uses the embedding model to convert the query into a vector. This vector is then used to search the ChromaDB collection for similar document vectors. The results are formatted and returned to the user.

Further Steps

This example provides a basic structure. You could extend it in many ways:

- Larger Dataset: Index a more substantial dataset (e.g., articles, product descriptions). Remember the indexing strategies discussed earlier for efficiency.
- Metadata Filtering: Add metadata (e.g., categories, timestamps) during indexing and use ChromaDB's filtering capabilities (`where` clauses in the `query` method) to refine search results.
- Different Databases/Models: Swap ChromaDB for Pinecone, Weaviate, or Milvus by modifying the client initialization and query logic according to their respective Python clients. Experiment with different embedding models.
- User Interface: Build a simple HTML frontend using Jinja2 templates with FastAPI, or a separate frontend framework (like React or Vue) that interacts with this API.
- Hybrid Search: Integrate keyword search (e.g., using Whoosh or Elasticsearch) alongside vector search for potentially improved relevance.
- Evaluation: Implement evaluation metrics (like Recall@K) using a labeled dataset to measure the quality of your search results.

This hands-on exercise demonstrates how the components discussed throughout this course (embedding models, vector databases, and search logic) come together to create applications that understand the meaning behind user queries.
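As a starting point for the evaluation idea listed under Further Steps, here is a minimal sketch of Recall@K in plain Python. The helper names `recall_at_k` and `mean_recall_at_k`, the labeled queries, and the retrieved ID lists are all hypothetical and chosen for illustration. In practice the retrieved IDs would come from the `id` fields returned by the `/search/` endpoint, and the labels from a dataset you annotate yourself.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not relevant_ids:
        return 0.0  # Convention chosen for this sketch: no relevant docs gives recall 0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    results: Dict[str, List[str]],
    labels: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average Recall@K over all labeled queries (missing results count as empty)."""
    scores = [recall_at_k(results.get(q, []), rel, k) for q, rel in labels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels: which document IDs a human judged relevant for each query
labels = {
    "information about databases": {"doc_2"},
    "tell me about animals": {"doc_0"},
}

# Hypothetical ranked IDs, standing in for what the search API might return
results = {
    "information about databases": ["doc_2", "doc_7", "doc_6"],
    "tell me about animals": ["doc_5", "doc_0", "doc_4"],
}

print(mean_recall_at_k(results, labels, k=1))  # 0.5
print(mean_recall_at_k(results, labels, k=3))  # 1.0
```

With these made-up rankings, only the first query has its relevant document in position one, so Recall@1 averages to 0.5, while both relevant documents appear within the top three. Tracking this number as you change the embedding model or distance metric gives a rough, repeatable way to compare configurations.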