Having established the theoretical groundwork for vector embeddings and their role in information retrieval, let's transition to practical application. In this section, you'll gain hands-on experience generating these numerical representations of text using a popular Python library. This is a fundamental step in preparing your data for the retrieval component of any RAG system.
Numerous libraries and pre-trained models are available for generating text embeddings. For this exercise, we'll use the sentence-transformers library, which provides an easy-to-use interface for many state-of-the-art embedding models. It's built on top of the widely adopted Hugging Face transformers library.
We will use the all-MiniLM-L6-v2 model. This is a well-regarded sentence-embedding model known for its balance between computational efficiency and performance on semantic similarity tasks. It's a great starting point for many RAG applications.
First, ensure you have the sentence-transformers library installed. If you haven't already, you can do so using pip:
pip install sentence-transformers
This command downloads and installs the library along with its dependencies, including PyTorch and the transformers library.
Let's start with the basics: generating an embedding for a single piece of text.
from sentence_transformers import SentenceTransformer
# Load the pre-trained model
# The model will be downloaded automatically if it's not cached
model = SentenceTransformer('all-MiniLM-L6-v2')
# Define a sample sentence
sentence = "This is an example sentence demonstrating embedding generation."
# Generate the embedding
embedding = model.encode(sentence)
# Print the embedding's shape and the first few dimensions
print(f"Sentence: {sentence}")
print(f"Embedding shape: {embedding.shape}")
print(f"Embedding (first 5 dimensions): {embedding[:5]}")
When you run this code, the SentenceTransformer constructor will first download the all-MiniLM-L6-v2 model files if they aren't already cached on your system. The model.encode() method then takes the input text and processes it through the model's layers to produce a numerical vector, the embedding.
The output will look something like this (the exact floating-point values may vary slightly):
Sentence: This is an example sentence demonstrating embedding generation.
Embedding shape: (384,)
Embedding (first 5 dimensions): [ 0.05483127 0.05900988 -0.00499174 0.07899283 -0.0135861 ]
Notice the shape (384,). This indicates that the all-MiniLM-L6-v2 model produces a 384-dimensional vector for the input sentence. Each of these 384 numbers captures some aspect of the sentence's semantic meaning, learned by the model during its training.
Generating embeddings one by one is inefficient if you have many documents or text chunks. The encode method is optimized to handle lists of sentences (or text snippets) in batches, leveraging parallel computation capabilities.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
# Define a list of sentences
sentences = [
    "The retriever finds relevant documents.",
    "Vector databases store text embeddings efficiently.",
    "Large language models generate human-like text.",
    "RAG combines retrieval with generation."
]
# Generate embeddings for the list of sentences
embeddings = model.encode(sentences)
# Print the shape of the resulting embeddings array
print(f"Number of sentences: {len(sentences)}")
print(f"Shape of embeddings array: {embeddings.shape}")
# You can access individual embeddings like this:
# print(f"Embedding for the first sentence (first 5 dims): {embeddings[0][:5]}")
The output will show the shape of the resulting NumPy array:
Number of sentences: 4
Shape of embeddings array: (4, 384)
This output, (4, 384), confirms that we have generated embeddings for all 4 input sentences, and that each embedding is a 384-dimensional vector. This batch processing approach is significantly faster for larger datasets than encoding sentences individually in a loop.
You now have a practical understanding of how to convert text into dense vector representations. Each vector, like the ones generated above, resides in a high-dimensional space (384 dimensions in our case). The key property, learned by the model during training, is that texts with similar meanings will have vectors that are "close" to each other in this space, typically measured using metrics like cosine similarity (cos(θ)).
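To make that notion of "closeness" concrete, here is a minimal sketch of cosine similarity using NumPy. The four-dimensional vectors below are toy stand-ins for real 384-dimensional embeddings; in practice you would pass the arrays returned by model.encode.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for 384-dimensional embeddings
cat_mat = np.array([0.9, 0.1, 0.0, 0.2])    # "cat sat on mat"
feline_rug = np.array([0.8, 0.2, 0.1, 0.3]) # "feline on rug"
weather = np.array([0.0, 0.9, 0.4, 0.0])    # "nice weather"

print(cosine_similarity(cat_mat, feline_rug))  # close to 1.0: similar meaning
print(cosine_similarity(cat_mat, weather))     # much smaller: unrelated meaning
```

Cosine similarity ranges from -1 to 1, and the model is trained so that semantically similar texts score near 1. The sentence-transformers library also provides this computation directly, but expressing it in NumPy makes the underlying geometry explicit.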
It's hard to visualize 384 dimensions directly, but imagine a simpler 2D space. Sentences with similar meanings would cluster together.
Illustration of how sentences with similar meanings ('cat sat on mat', 'feline on rug') might cluster closer together in a simplified 2D embedding space compared to unrelated sentences ('nice weather', 'stock market').
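Pictures like this are typically produced by projecting the high-dimensional embeddings down to two dimensions. The sketch below shows one common way to do that, a PCA projection via SVD in plain NumPy; the toy row vectors are hypothetical stand-ins for real embeddings, and libraries such as scikit-learn package the same operation as sklearn.decomposition.PCA.

```python
import numpy as np

# Hypothetical toy "embeddings", one row per sentence (real rows would be 384-dim)
X = np.array([
    [0.9, 0.1, 0.0, 0.2],  # "cat sat on mat"
    [0.8, 0.2, 0.1, 0.3],  # "feline on rug"
    [0.0, 0.9, 0.4, 0.0],  # "nice weather"
    [0.1, 0.0, 0.9, 0.8],  # "stock market"
])

# PCA via SVD: center the data, then project onto the top-2 right singular vectors
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
coords_2d = X_centered @ Vt[:2].T

print(coords_2d.shape)  # (4, 2): one 2D point per sentence, ready to plot
```

Plotting the rows of coords_2d would place the two related sentences near each other, approximating the clustering that exists in the full embedding space.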
These generated embeddings are the foundational elements you'll store in a vector database. In the upcoming sections and chapters, you'll learn how to populate a vector database with these embeddings and perform similarity searches to find the most relevant text chunks for a given query, forming the core of the RAG system's retrieval mechanism.
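As a preview of that retrieval step, a vector database is conceptually doing something like the brute-force nearest-neighbor search below, here sketched with cosine similarity over toy vectors in NumPy. The function name is illustrative, not part of any library; real vector databases use approximate indexes to make this fast at scale.

```python
import numpy as np

def top_k_cosine(query_vec, doc_matrix, k=2):
    """Return indices and scores of the k rows of doc_matrix most similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                 # cosine similarity of each row with the query
    idx = np.argsort(-scores)[:k]  # indices of the highest scores, best first
    return idx, scores[idx]

# Toy document "embeddings" (one row per chunk) and a query vector
docs = np.array([
    [0.9, 0.1, 0.0],  # doc 0
    [0.0, 1.0, 0.1],  # doc 1
    [0.8, 0.2, 0.1],  # doc 2
])
query = np.array([1.0, 0.1, 0.0])

indices, scores = top_k_cosine(query, docs)
print(indices, scores)  # docs 0 and 2 rank highest for this query
```

In a real RAG pipeline, docs would hold the encode() output for all your text chunks, query would be the embedding of the user's question, and the returned indices would identify the chunks to pass to the language model.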
© 2025 ApX Machine Learning