Okay, you've successfully loaded your data into LlamaIndex Document objects. But just having the raw data isn't enough for an LLM to use it efficiently. Imagine trying to find a specific sentence in a massive, unorganized pile of books versus looking it up in a well-cataloged library. Indexing is the process of building that library catalog for your data.
The goal of indexing in LlamaIndex is to structure your loaded data in a way that makes it fast and easy to find the most relevant pieces of information when you later pose a query. This relevant information is what you'll eventually feed to the LLM as context, enabling the Retrieval-Augmented Generation (RAG) process we'll build later.
LlamaIndex doesn't usually work with entire documents directly during the retrieval phase. Instead, it breaks down the Document objects into smaller, more manageable chunks called Nodes. Each Node typically represents a piece of text (like a paragraph or a few sentences) derived from the original Document, along with metadata linking it back to its source.
Why break documents into smaller pieces? Smaller chunks fit comfortably within an LLM's context window, and they let retrieval return only the passages most relevant to a query instead of entire documents. LlamaIndex handles this chunking process automatically during indexing, although you can customize the chunking strategy if needed, as shown in the sketch below.
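As a minimal sketch of customizing chunking, assuming documents is the list of Document objects from the previous section and using the SentenceSplitter node parser (the chunk_size and chunk_overlap values here are illustrative, not recommendations):
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# A custom splitter: ~512-token chunks with 50 tokens of overlap
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Option 1: make this the global default used during indexing
Settings.node_parser = splitter

# Option 2: split documents into Nodes yourself to inspect the result
nodes = splitter.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes from {len(documents)} documents")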
Once the data is broken into Nodes, LlamaIndex typically creates numerical representations of these nodes called embeddings. Embeddings are vectors (lists of numbers) generated by a machine learning model (an embedding model) that capture the semantic meaning of the text. Nodes with similar meanings will have embeddings that are mathematically close to each other in the vector space.
Think of it like assigning coordinates to each Node on a map, where Nodes discussing similar topics are located near each other. When you later query the index, LlamaIndex will embed your query into the same vector space and look for the Nodes whose embeddings are closest to the query embedding. This process is known as similarity search.
Common embedding models include those from OpenAI, Cohere, or open-source models available through libraries like Hugging Face's transformers or Sentence Transformers. LlamaIndex integrates with many of these, often using a default model if you don't specify one.
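To make "close in vector space" concrete, here is a small sketch that embeds two short texts with whichever embedding model is configured on Settings (as done below with OpenAIEmbedding) and compares them with cosine similarity; the example texts and the helper function are just illustrations:
from llama_index.core import Settings

# Embed two short texts with the configured embedding model
vec_a = Settings.embed_model.get_text_embedding("Cats are small domesticated felines.")
vec_b = Settings.embed_model.get_text_embedding("A kitten is a young cat.")

# Cosine similarity: values closer to 1.0 mean more semantically similar
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

print(cosine_similarity(vec_a, vec_b))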
With Nodes and their corresponding embeddings, LlamaIndex constructs an Index. The index is the data structure that organizes the Nodes and their embeddings to enable efficient querying.
Several types of indexes exist in LlamaIndex, but the most common and versatile one for similarity search is the VectorStoreIndex. This index stores the Node embeddings (often in a specialized database called a vector store) and allows for rapid searching to find the Nodes most semantically similar to a given query.
Let's see how to build a basic VectorStoreIndex. Assuming you have a list of Document objects loaded (as covered in the previous section, let's call it documents):
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding # Or another embedding model
# Example using OpenAI embeddings (requires OPENAI_API_KEY to be set)
# You can replace this with other embedding models LlamaIndex supports.
Settings.embed_model = OpenAIEmbedding()
# Create the index from your loaded documents
index = VectorStoreIndex.from_documents(documents)
print("Index created successfully!")
This simple command performs several steps behind the scenes:
1. Parses your loaded documents.
2. Splits them into Nodes based on default settings.
3. Generates an embedding for each Node using the configured embedding model (here, OpenAIEmbedding via Settings).
4. Stores the Nodes and their embeddings in an in-memory vector store structure, creating the VectorStoreIndex.
The diagram below illustrates this flow:
Data flow from raw documents to a queryable LlamaIndex index, involving loading, splitting into nodes, embedding, and storing in an index structure.
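If you want to verify what the index actually stored, a small sketch (assuming the default in-memory document store) is to look at the Nodes kept in the index's docstore:
# Inspect the Nodes stored by the index (default in-memory docstore)
nodes_in_index = list(index.docstore.docs.values())
print(f"The index contains {len(nodes_in_index)} nodes")

# Look at one node's text and the metadata linking it back to its source
first_node = nodes_in_index[0]
print(first_node.get_content()[:200])
print(first_node.metadata)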
Now that the data is indexed, it's structured for efficient retrieval. The next step is to learn how to ask questions (query) against this index to find the relevant information needed by your LLM.
Building an index, especially generating embeddings, can take time and computational resources, particularly for large datasets. You typically don't want to rebuild the index every time your application starts. LlamaIndex allows you to persist (save) your index to disk and load it back later.
from llama_index.core import StorageContext, load_index_from_storage
# Define a path where the index will be stored
PERSIST_DIR = "./storage"
# Save the index to disk
index.storage_context.persist(persist_dir=PERSIST_DIR)
print(f"Index saved to {PERSIST_DIR}")
# Load the index from disk later
# Create a default storage context pointing to the persist directory
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
# Load the index (requires the same embedding model configuration)
loaded_index = load_index_from_storage(storage_context)
print("Index loaded successfully!")
By persisting the index, you separate the potentially time-consuming indexing process from the querying phase of your application, making startup much faster. We'll cover more advanced storage options and vector stores in the RAG chapter.