Okay, you've successfully loaded your data into LlamaIndex Document objects. But just having the raw data isn't enough for an LLM to use it efficiently. Imagine trying to find a specific sentence in a massive, unorganized pile of books versus looking it up in a well-cataloged library. Indexing is the process of building that library catalog for your data.
The goal of indexing in LlamaIndex is to structure your loaded data in a way that makes it fast and easy to find the most relevant pieces of information when you later pose a query. This relevant information is what you'll eventually feed to the LLM as context, enabling the Retrieval-Augmented Generation (RAG) process we'll build later.
LlamaIndex doesn't usually work with entire documents directly during the retrieval phase. Instead, it breaks down the Document objects into smaller, more manageable chunks called Nodes. Each Node typically represents a piece of text (like a paragraph or a few sentences) derived from the original Document, along with metadata linking it back to its source.
Why break documents into smaller pieces? Smaller chunks fit comfortably within an LLM's context window, and they let retrieval return only the passages most relevant to a query instead of entire documents. LlamaIndex handles this chunking process automatically during indexing, although you can customize the chunking strategy if needed, as shown in the sketch below.
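As a minimal sketch of customizing chunking, assuming documents is the list of Document objects from the previous section and using the SentenceSplitter node parser (the chunk_size and chunk_overlap values here are illustrative, not recommendations):
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# A custom splitter: ~512-token chunks with 50 tokens of overlap
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Option 1: make this the global default used during indexing
Settings.node_parser = splitter

# Option 2: split documents into Nodes yourself to inspect the result
nodes = splitter.get_nodes_from_documents(documents)
print(f"Created {len(nodes)} nodes from {len(documents)} documents")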
Once the data is broken into Nodes, LlamaIndex typically creates numerical representations of these nodes called embeddings. Embeddings are vectors (lists of numbers) generated by a machine learning model (an embedding model) that capture the semantic meaning of the text. Nodes with similar meanings will have embeddings that are mathematically close to each other in the vector space.
Think of it like assigning coordinates to each Node on a map, where Nodes discussing similar topics are located near each other. When you later query the index, LlamaIndex will embed your query into the same vector space and look for the Nodes whose embeddings are closest to the query embedding. This process is known as similarity search.
Common embedding models include those from OpenAI, Cohere, or open-source models available through libraries like Hugging Face's transformers or Sentence Transformers. LlamaIndex integrates with many of these, often using a default model if you don't specify one.
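To make "close in vector space" concrete, here is a small sketch that embeds two short texts with whichever embedding model is configured on Settings (as done below with OpenAIEmbedding) and compares them with cosine similarity; the example texts and the helper function are just illustrations:
from llama_index.core import Settings

# Embed two short texts with the configured embedding model
vec_a = Settings.embed_model.get_text_embedding("Cats are small domesticated felines.")
vec_b = Settings.embed_model.get_text_embedding("A kitten is a young cat.")

# Cosine similarity: values closer to 1.0 mean more semantically similar
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

print(cosine_similarity(vec_a, vec_b))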
With Nodes and their corresponding embeddings, LlamaIndex constructs an Index. The index is the data structure that organizes the Nodes and their embeddings to enable efficient querying.
Several types of indexes exist in LlamaIndex, but the most common and versatile one for similarity search is the VectorStoreIndex. This index stores the Node embeddings (often in a specialized database called a vector store) and allows for rapid searching to find the Nodes most semantically similar to a given query.
Let's see how to build a basic VectorStoreIndex. Assuming you have a list of Document objects loaded (as covered in the previous section, let's call it documents):
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding # Or another embedding model
# Example using OpenAI embeddings (requires OPENAI_API_KEY to be set)
# You can replace this with other embedding models LlamaIndex supports.
Settings.embed_model = OpenAIEmbedding()
# Create the index from your loaded documents
index = VectorStoreIndex.from_documents(documents)
print("Index created successfully!")
This simple command performs several steps behind the scenes:
1. Parses your loaded documents.
2. Splits them into Nodes based on default settings.
3. Generates an embedding for each Node using the configured embedding model (here, OpenAIEmbedding via Settings).
4. Stores the Nodes and their embeddings in an in-memory vector store structure, creating the VectorStoreIndex.
The diagram below illustrates this flow:
Data flow from raw documents to a queryable LlamaIndex index, involving loading, splitting into nodes, embedding, and storing in an index structure.
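If you want to verify what the index actually stored, a small sketch (assuming the default in-memory document store) is to look at the Nodes kept in the index's docstore:
# Inspect the Nodes stored by the index (default in-memory docstore)
nodes_in_index = list(index.docstore.docs.values())
print(f"The index contains {len(nodes_in_index)} nodes")

# Look at one node's text and the metadata linking it back to its source
first_node = nodes_in_index[0]
print(first_node.get_content()[:200])
print(first_node.metadata)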
Now that the data is indexed, it's structured for efficient retrieval. The next step is to learn how to ask questions (query) against this index to find the relevant information needed by your LLM.
Building an index, especially generating embeddings, can take time and computational resources, particularly for large datasets. You typically don't want to rebuild the index every time your application starts. LlamaIndex allows you to persist (save) your index to disk and load it back later.
from llama_index.core import StorageContext, load_index_from_storage
# Define a path where the index will be stored
PERSIST_DIR = "./storage"
# Save the index to disk
index.storage_context.persist(persist_dir=PERSIST_DIR)
print(f"Index saved to {PERSIST_DIR}")
# Load the index from disk later
# Create a default storage context pointing to the persist directory
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
# Load the index (requires the same embedding model configuration)
loaded_index = load_index_from_storage(storage_context)
print("Index loaded successfully!")
By persisting the index, you separate the potentially time-consuming indexing process from the querying phase of your application, making startup much faster. We'll cover more advanced storage options and vector stores in the RAG chapter.