Vector Databases: Infrastructure for Semantic Search
Vector databases are purpose-built for one operation that general-purpose databases handle poorly: finding the k most similar vectors to a query vector. Understanding why this requires specialized infrastructure helps you make better decisions about which database to use and when.
Why SQL Fails for Vector Search
A B-tree index, which underlies most SQL indexes, is designed for equality lookups and range queries. It answers "find all rows where age is between 25 and 35" efficiently because ages exist on a one-dimensional ordered line.
Vectors have no natural ordering in high-dimensional space. You cannot sort 1536-dimensional vectors in a way that lets you quickly prune the search space. Every similarity query would require a full table scan — computing the distance from the query to every stored vector.
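To make that cost concrete, here is a minimal brute-force scan in pure Python. The toy 2-D vectors stand in for real embeddings with hundreds of dimensions; every query pays O(N · d) because nothing prunes the search space.

```python
import math

def brute_force_knn(query, vectors, k):
    """Exact k-NN: compute the distance from the query to every stored vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # A full scan: every stored vector is touched on every query.
    ranked = sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))
    return ranked[:k]

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.2]]
print(brute_force_knn([0.0, 0.1], vectors, 2))  # → [0, 3]
```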
The pgvector extension for PostgreSQL supports both IVF and HNSW indexes, which makes it viable for production workloads up to roughly one million vectors. Beyond that, a dedicated vector database with better indexing and sharding becomes necessary.
HNSW: The Algorithm Behind Most Vector Databases
HNSW (Hierarchical Navigable Small World), proposed by Malkov and Yashunin in 2016, has become the standard algorithm for approximate nearest neighbor search in high-dimensional spaces.
Construction:
Insert each vector one at a time
Each vector is assigned a maximum layer level based on an exponential probability distribution (most vectors stay at the bottom layer)
Connect the new vector to its M nearest neighbors at each layer it occupies
The resulting graph has the small-world property: any two nodes can be reached in a small number of hops
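The layer assignment in step two can be sketched as follows. The decay constant mL and the sample size are illustrative, not tuned values from any particular library:

```python
import math
import random

def assign_level(rng, m_l=0.5):
    """Draw a node's maximum layer: floor(-ln(U) * mL), U uniform in (0, 1].

    Smaller mL keeps more nodes on the bottom layer; each higher layer
    holds exponentially fewer nodes.
    """
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return math.floor(-math.log(u) * m_l)

rng = random.Random(0)
levels = [assign_level(rng) for _ in range(10000)]
counts = [levels.count(i) for i in range(4)]
print(counts)  # layer populations shrink geometrically
```

With mL = 0.5, roughly 86% of nodes occupy only layer 0, which is what makes the upper layers sparse enough to act as an express route.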
Search:
Begin at the entry point in the topmost layer
Greedily navigate to the closest node
Descend to the next layer and repeat
At the bottom layer, expand the search to collect the top-k candidates
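The descent can be sketched on a hand-built two-layer graph over points on a line. This is a toy illustration, not a real HNSW build: the sparse upper layer routes coarsely, and the dense bottom layer refines.

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

def greedy_search(graph, coords, entry, query):
    """Greedy traversal of one layer: hop to the neighbor closest to the
    query until no neighbor improves on the current node."""
    current = entry
    while True:
        nearer = min(graph[current], default=current,
                     key=lambda n: dist(coords[n], query))
        if dist(coords[nearer], query) >= dist(coords[current], query):
            return current
        current = nearer

# Hypothetical index over 8 points on the x-axis.
coords = {i: (float(i), 0.0) for i in range(8)}
upper = {0: [4], 4: [0, 7], 7: [4]}                               # sparse layer
bottom = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 7] for i in range(8)}

query = (5.2, 0.0)
entry = greedy_search(upper, coords, 0, query)        # coarse routing
result = greedy_search(bottom, coords, entry, query)  # refine on layer 0
print(entry, result)  # → 4 5
```

A production index would also keep a candidate heap at the bottom layer (the ef_search parameter below) rather than returning a single node.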
Key parameters: M (number of neighbors per node, higher = better recall, more memory), ef_construction (search width during index build, higher = better index quality, slower build), ef_search (search width at query time, higher = better recall, slower query). These are tunable based on your recall vs latency tradeoff.
IVF: The Alternative Indexing Strategy
IVF (Inverted File Index) works differently: at build time it partitions the vector space into nlist clusters using k-means. To search, it identifies the nprobe clusters whose centroids are closest to the query and scans only the vectors in those clusters.
IVF advantages: faster to build than HNSW, lower memory usage. Disadvantages: slightly lower recall, worse on clustered data distributions. Common choice when memory is constrained or build time matters.
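A minimal IVF sketch, with hand-picked centroids standing in for the k-means step a real index would run at build time:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the inverted lists)."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    # Rank clusters by centroid distance and scan only the nprobe closest.
    probe = sorted(range(len(centroids)),
                   key=lambda c: dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]

centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.1, 0.0), (0.0, 0.2), (9.8, 10.1), (10.2, 9.9)]
lists = build_ivf(vectors, centroids)
print(ivf_search((10.0, 10.0), vectors, centroids, lists, nprobe=1, k=2))
```

With nprobe=1, vectors in unprobed clusters are never examined, which is exactly where IVF trades recall for speed: a true neighbor near a cluster boundary can be missed.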
Vector Database Comparison
Chroma: Architecture: SQLite-backed, runs in-process. Setup: pip install chromadb, no server. Scale: suitable for development and small production (< 500K vectors). Strengths: zero configuration. Weaknesses: limited filtering, no horizontal scaling.
Pinecone: Architecture: Fully managed cloud. Setup: API key only. Scale: handles tens of millions of vectors. Strengths: zero ops, simple API, good documentation. Weaknesses: expensive at scale, no self-host option, vendor lock-in.
Weaviate: Architecture: Open source, Go-based, Docker or managed cloud. Setup: moderate complexity. Scale: production-grade. Strengths: hybrid search built-in, rich schema support, multiple vectorizer modules. Weaknesses: more complex configuration.
Qdrant: Architecture: Open source, Rust-based, Docker or managed cloud. Setup: moderate complexity. Scale: production-grade. Strengths: very fast, excellent payload filtering, good API design. Weaknesses: newer, smaller community than Weaviate.
pgvector: Architecture: PostgreSQL extension. Setup: install extension, create vector column. Scale: acceptable to ~1M vectors with HNSW index. Strengths: one database for everything, SQL familiarity, ACID transactions. Weaknesses: slower than dedicated vector DBs at scale, limited approximate search tuning.
Decision Framework
Under 200K vectors, team already uses PostgreSQL: pgvector. Development and prototyping: Chroma. Production, team wants no infrastructure: Pinecone. Production, team can manage infrastructure, needs hybrid search: Weaviate. Production, team can manage infrastructure, needs fast filtering: Qdrant. Over 50M vectors: custom sharding strategy with any of the above.
Metadata Filtering Strategies
Pre-filtering: Apply the metadata filter to reduce the search space before vector search. Fast when the filter is highly selective (returns < 10% of vectors). Risk: when the filter is applied inside an approximate index rather than by brute force, a very selective filter can leave too few reachable candidates, and recall drops.
Post-filtering: Run vector search without filter, then apply metadata filter to results. Accurate but requires retrieving more candidates than you need.
Filtered HNSW: Some databases (Qdrant, Weaviate) integrate filtering into the graph traversal itself. This is generally the best approach when available.
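The first two strategies can be sketched with a brute-force scan. The function names, the fetch parameter, and the record layout are illustrative, not any real database's API:

```python
import math

def dist(a, b):
    return math.dist(a, b)

def pre_filtered_search(query, items, predicate, k):
    """Pre-filtering: shrink the candidate set first, then rank by distance.
    Exact here because the remaining candidates are scanned exhaustively."""
    allowed = [i for i, it in enumerate(items) if predicate(it["meta"])]
    return sorted(allowed, key=lambda i: dist(query, items[i]["vec"]))[:k]

def post_filtered_search(query, items, predicate, k, fetch):
    """Post-filtering: retrieve `fetch` > k nearest first, then drop
    non-matching results. May return fewer than k hits."""
    top = sorted(range(len(items)),
                 key=lambda i: dist(query, items[i]["vec"]))[:fetch]
    return [i for i in top if predicate(items[i]["meta"])][:k]

items = [
    {"vec": (0.0, 0.0), "meta": {"lang": "en"}},
    {"vec": (0.1, 0.0), "meta": {"lang": "de"}},
    {"vec": (0.2, 0.0), "meta": {"lang": "en"}},
    {"vec": (5.0, 5.0), "meta": {"lang": "en"}},
]
is_en = lambda m: m["lang"] == "en"
print(pre_filtered_search((0.0, 0.0), items, is_en, k=2))            # → [0, 2]
print(post_filtered_search((0.0, 0.0), items, is_en, k=2, fetch=2))  # → [0]
```

Note how post-filtering with fetch=2 comes back with only one match: the filter discarded a retrieved candidate, which is why post-filtering must over-fetch.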
Always store metadata that you will filter on when you index your documents. Adding metadata later requires re-indexing.

