Plan RAG pipeline infrastructure: estimate memory, storage, and throughput for the embedding model and vector index.
Embedding Model
Precision
Corpus / Data
Index / Search
Graph Connections (M): connections per node
Search Quality (ef_search): directly affects query latency and QPS
Infrastructure
Replicas
OS/Buffer Overhead (%)
RAM usage: 768.1 MB / 16 GB (4.7% of available RAM)
Fits in-memory: lowest-latency search viable

| Component | Size | Share of total |
|---|---|---|
| Raw Vectors | 585.9 MB | 76.3% |
| Index Overhead | 62.9 MB | 8.2% |
| Metadata | 19.1 MB | 2.5% |
| Buffer (15%) | 100.2 MB | 13.0% |
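The arithmetic behind the RAM breakdown can be checked with a short sketch. The component sizes are the ones reported above. The assumptions are that the 15% buffer is applied to the sum of the other three components and that 1 GB = 1024 MB; both are consistent with the displayed totals, but neither is stated on the page. The helper name `share` is hypothetical.

```python
# Component sizes in MB, copied from the RAM usage breakdown.
raw_vectors_mb = 585.9
index_overhead_mb = 62.9
metadata_mb = 19.1

buffer_fraction = 0.15          # the 15% OS/buffer overhead setting
available_ram_mb = 16 * 1024    # 16 GB, assuming 1 GB = 1024 MB

# Assumption: the buffer is a percentage of the three data components.
subtotal_mb = raw_vectors_mb + index_overhead_mb + metadata_mb
buffer_mb = subtotal_mb * buffer_fraction
total_mb = subtotal_mb + buffer_mb


def share(part_mb, whole_mb):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part_mb / whole_mb, 1)


print(f"Buffer:      {buffer_mb:.1f} MB ({share(buffer_mb, total_mb)}% of total)")
print(f"Total:       {total_mb:.1f} MB")
print(f"RAM used:    {share(total_mb, available_ram_mb)}% of {available_ram_mb} MB")
print(f"Raw vectors: {share(raw_vectors_mb, total_mb)}% of total")
```

Under these assumptions the sketch gives the same buffer, total, and percentage figures as the breakdown.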
Reducing precision cuts memory requirements dramatically. Many modern vector databases can retrieve candidates using binary or int8 vectors and then rescore those candidates against the original float32 vectors, which recovers most of the lost recall.
| Precision | Bytes per dimension | Memory vs float32 | Recall vs float32 | Notes |
|---|---|---|---|---|
| float32 | 4 | 100% | Baseline (100%) | Maximum accuracy |
| float16 | 2 | 50% | ~99.9% | Negligible loss, recommended default |
| int8 / scalar | 1 | 25% | ~99% | Great for large corpora |
| binary | 0.125 | ~3.1% | 70-90% | Extreme compression, use with re-ranking |
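The candidate-then-rescore pattern that pairs with binary precision can be illustrated with a minimal pure-Python sketch. It is not any particular vector database's implementation. The function names, the sign-based binarization, the `oversample` factor, and the random test data are all assumptions made for illustration.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def binarize(vec):
    """Pack the sign of each dimension into one integer (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(query, vectors, codes, k=5, oversample=4):
    """Stage 1: rank by Hamming distance on binary codes and keep k * oversample.
    Stage 2: rescore only those candidates with the full-precision vectors."""
    q_code = binarize(query)
    by_hamming = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))
    candidates = by_hamming[: k * oversample]
    candidates.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return candidates[:k]


# Hypothetical data: 1,000 random unit vectors with 64 dimensions.
random.seed(0)
dims = 64
vectors = [normalize([random.gauss(0, 1) for _ in range(dims)]) for _ in range(1000)]
codes = [binarize(v) for v in vectors]

query = vectors[42]  # query with a stored vector, so it should rank first
top = search_with_rescoring(query, vectors, codes)
print(top)
```

Only the binary codes need to be scanned for every query. The full-precision vectors are read for just `k * oversample` candidates, which is why they can live on slower storage in this kind of setup.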
Estimates follow standard vector database formulas: HNSW link overhead, IVF centroid memory, PQ codebook sizes, and metadata budgets. A configurable buffer percentage (default 15%) is applied for OS and framework overhead. QPS ranges are indicative and highly hardware-dependent.
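The exact formulas behind these estimates are not shown, so the following is only a sketch of commonly used approximations for the terms named above. The function names, the 4-byte neighbor IDs, the base-layer-only HNSW estimate, and the example corpus (1,000,000 vectors, 768 dimensions, M = 16) are assumptions, and the results need not match the figures this calculator reports.

```python
MIB = 1024 ** 2

# Bytes per dimension for each precision in the table.
BYTES_PER_DIM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "binary": 0.125}


def raw_vector_bytes(num_vectors, dims, precision="float32"):
    """Storage for the vectors themselves."""
    return num_vectors * dims * BYTES_PER_DIM[precision]


def hnsw_link_bytes(num_vectors, m, id_bytes=4):
    """Approximate HNSW link overhead: up to 2*M neighbor IDs per node on the
    base layer. Upper layers are ignored here; they add a small fraction."""
    return num_vectors * 2 * m * id_bytes


def ivf_centroid_bytes(nlist, dims, bytes_per_dim=4):
    """IVF keeps one full-precision centroid per inverted list."""
    return nlist * dims * bytes_per_dim


def pq_codebook_bytes(dims, nbits=8, bytes_per_dim=4):
    """PQ codebooks: each sub-quantizer holds 2**nbits centroids, and the
    sub-vector lengths sum to dims, so the total is 2**nbits * dims values."""
    return (2 ** nbits) * dims * bytes_per_dim


def with_buffer(total_bytes, buffer_pct=15):
    """Add the configurable OS/framework buffer on top of the estimate."""
    return total_bytes + total_bytes * buffer_pct / 100


# Hypothetical corpus, not taken from the page.
n, d, m = 1_000_000, 768, 16
estimate = raw_vector_bytes(n, d, "float16") + hnsw_link_bytes(n, m)
print(f"float16 vectors + HNSW links: {estimate / MIB:.1f} MiB")
print(f"with 15% buffer:              {with_buffer(estimate) / MIB:.1f} MiB")
```

Real indexes add per-node bookkeeping, upper graph layers, and allocator overhead, so measured usage is typically somewhat higher than these closed-form terms.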