Production-Ready Vector Databases: Choosing Between pgvector, Pinecone, and Weaviate
You've built a RAG proof-of-concept with ChromaDB or FAISS in memory. Now you need to scale to millions of documents, sub-100ms latency, and metadata filtering.
Here's how to choose between pgvector, Pinecone, Weaviate, and Qdrant for production.
The Decision Matrix
| Feature | pgvector | Pinecone | Weaviate | Qdrant |
|---------|----------|----------|----------|--------|
| Setup | Add extension to Postgres | Fully managed | Self-host or cloud | Self-host or cloud |
| Cost (10M vectors) | ~$50/mo (RDS) | ~$400/mo | ~$200/mo | ~$150/mo |
| Latency (p95) | 50-150ms | 30-80ms | 40-100ms | 35-90ms |
| Hybrid search | ✅ (tsvector + vector) | ❌ (vector only) | ✅ (BM25 + vector) | ✅ (built-in) |
| Metadata filtering | ✅ (SQL WHERE) | ✅ (limited) | ✅ (GraphQL) | ✅ (JSON filter) |
| Best for | Small-medium scale | Serverless, hands-off | Complex schemas | High performance |
Option 1: pgvector (PostgreSQL Extension)
When to choose: You already use Postgres, or want to avoid managing another database.
Setup
-- Enable extension
CREATE EXTENSION vector;
-- Create table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536), -- OpenAI ada-002 dimension
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index (HNSW for speed)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
-- Add GIN index for metadata filtering
CREATE INDEX ON documents USING gin(metadata);
-- Full-text search index
CREATE INDEX ON documents USING gin(to_tsvector('english', content));
Query with Hybrid Search
import psycopg2
from openai import OpenAI
client = OpenAI()
conn = psycopg2.connect("postgresql://...")
def hybrid_search(query, limit=5):
    # Get query embedding
    embedding = client.embeddings.create(
        input=query,
        model="text-embedding-ada-002"
    ).data[0].embedding
    # pgvector accepts the '[x, y, ...]' text form, so pass the list as a string
    embedding_str = str(embedding)
    # Hybrid: vector similarity + full-text + metadata
    cursor = conn.cursor()
    cursor.execute("""
WITH vector_matches AS (
SELECT id, content, metadata,
1 - (embedding <=> %s::vector) AS similarity
FROM documents
WHERE metadata->>'status' = 'published'
ORDER BY embedding <=> %s::vector
LIMIT 20
),
text_matches AS (
SELECT id,
ts_rank(to_tsvector('english', content),
plainto_tsquery('english', %s)) AS rank
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
)
SELECT v.id, v.content, v.metadata,
(v.similarity * 0.7 + COALESCE(t.rank, 0) * 0.3) AS score
FROM vector_matches v
LEFT JOIN text_matches t ON v.id = t.id
ORDER BY score DESC
        LIMIT %s
    """, (embedding_str, embedding_str, query, query, limit))
return cursor.fetchall()
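Example call. The 0.7/0.3 weighting above is a starting point, not a tuned value; cosine similarity and ts_rank live on different scales, so validate the blend against your own relevance set.

rows = hybrid_search("How do HNSW indexes trade recall for speed?", limit=3)
for doc_id, content, metadata, score in rows:
    print(round(score, 3), doc_id)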
Pros:
- Single database for structured + vector data
- ACID transactions
- Familiar SQL interface
- Low cost
Cons:
- Slower than specialized vector DBs at massive scale (>10M vectors)
- Index build times increase with size (tunable; see the sketch below)
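Both cons can be mitigated to a degree. A sketch of the main pgvector tuning knobs (values are illustrative, not recommendations; if you already built the default index above, drop it or give this one a new name):

conn.autocommit = True  # CREATE INDEX CONCURRENTLY can't run inside a transaction
cursor = conn.cursor()

# More memory makes large HNSW builds significantly faster
cursor.execute("SET maintenance_work_mem = '2GB'")

# m and ef_construction trade build time and index size for recall
cursor.execute("""
    CREATE INDEX CONCURRENTLY IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
""")

# At query time, raise ef_search for better recall at the cost of latency
cursor.execute("SET hnsw.ef_search = 100")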
Option 2: Pinecone (Managed Vector DB)
When to choose: You want zero ops and can pay for convenience.
Setup
from pinecone import Pinecone, PodSpec
from openai import OpenAI

# Initialize (pinecone-client v3+; the older pinecone.init(...) style is deprecated)
pc = Pinecone(api_key="...")

# Create a pod-based index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-west1-gcp",
        pod_type="p1.x1",  # ~$70/mo
        pods=1
    )
)
index = pc.Index("documents")
# Upsert vectors
client = OpenAI()
embedding = client.embeddings.create(
input="Document text...",
model="text-embedding-ada-002"
).data[0].embedding
index.upsert(vectors=[
{
"id": "doc-1",
"values": embedding,
"metadata": {
"title": "...",
"category": "...",
"date": "2024-11-05"
}
}
])
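Upserting one vector per request doesn't scale; batching is the usual pattern. A minimal sketch, assuming docs is a list of (id, text, metadata) tuples (a shape chosen for illustration):

def batch_upsert(index, docs, batch_size=100):
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        # One embeddings call per batch
        embeddings = client.embeddings.create(
            input=[text for _, text, _ in batch],
            model="text-embedding-ada-002"
        ).data
        index.upsert(vectors=[
            {"id": doc_id, "values": emb.embedding, "metadata": meta}
            for (doc_id, _, meta), emb in zip(batch, embeddings)
        ])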
Query with Metadata Filtering
def search(query, filters=None):
embedding = client.embeddings.create(
input=query,
model="text-embedding-ada-002"
).data[0].embedding
results = index.query(
vector=embedding,
top_k=5,
filter=filters or {}, # {"category": "healthcare"}
include_metadata=True
)
return [(r.id, r.score, r.metadata) for r in results.matches]
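Example call. Pinecone filters use Mongo-style operators such as $eq, $in, and $gte:

results = search(
    "patient record retention policies",
    filters={"category": {"$eq": "healthcare"}}
)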
Pros:
- Serverless, auto-scaling
- Fast queries (p95 < 50ms)
- Simple API
- Built-in monitoring
Cons:
- No built-in BM25; hybrid search requires generating sparse vectors client-side
- Metadata filtering is less expressive than SQL WHERE clauses
- Cost scales quickly
Option 3: Weaviate (Self-Hosted or Cloud)
When to choose: You need hybrid search and complex schema relationships.
Setup with Docker
# docker-compose.yml
version: '3.4'
services:
weaviate:
image: semitechnologies/weaviate:1.23.0
ports:
- "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}  # required for text2vec-openai; pulled from your shell
volumes:
- weaviate_data:/var/lib/weaviate
volumes:
weaviate_data:
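After docker compose up -d, it's worth confirming the instance is ready and the OpenAI module loaded before creating a schema. A quick check against Weaviate's standard endpoints:

import requests

ready = requests.get("http://localhost:8080/v1/.well-known/ready", timeout=5)
print("ready:", ready.status_code == 200)

meta = requests.get("http://localhost:8080/v1/meta", timeout=5).json()
print("modules:", list(meta.get("modules", {}).keys()))  # expect 'text2vec-openai'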
Schema + Hybrid Search
import weaviate
client = weaviate.Client("http://localhost:8080")
# Define schema
schema = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
"properties": [
{"name": "content", "dataType": ["text"]},
{"name": "title", "dataType": ["string"]},
{"name": "category", "dataType": ["string"]},
{"name": "publishedDate", "dataType": ["date"]}
]
}
client.schema.create_class(schema)
# Insert (auto-vectorizes)
client.data_object.create(
data_object={
"content": "Document content...",
"title": "RAG Systems",
"category": "AI"
},
class_name="Document"
)
# Hybrid search (BM25 + vector)
results = (
client.query
.get("Document", ["content", "title", "category"])
.with_hybrid(query="RAG systems", alpha=0.7) # 0=BM25, 1=vector
.with_where({
"path": ["category"],
"operator": "Equal",
"valueString": "AI"
})
.with_limit(5)
.do()
)
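The v3 Python client returns a plain dict, with hits nested under the class name:

for doc in results["data"]["Get"]["Document"]:
    print(doc["title"], "-", doc["category"])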
Pros:
- True hybrid search (BM25 + vector)
- GraphQL query language
- Schema enforcement
- Self-hosted option
Cons:
- More complex setup
- Requires ops knowledge for self-hosting
Option 4: Qdrant (High Performance)
When to choose: Maximum query speed and efficiency.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams
)

client = QdrantClient(url="http://localhost:6333")
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)
# Insert ('embedding' comes from your embedding model, as in the earlier examples)
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1,
vector=embedding,
payload={
"content": "...",
"category": "AI",
"published": "2024-11-05"
}
)
]
)
# Search with filtering ('query_embedding' is the embedded user query)
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="AI"))]
    ),
    limit=5
)
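If you filter on a field often, give it a payload index so Qdrant doesn't scan payloads during filtered searches. A sketch for the category field used above:

from qdrant_client.models import PayloadSchemaType

client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)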
Pros:
- Fastest queries (Rust-based)
- Rich filtering
- Hybrid search support
- Efficient memory usage
Cons:
- Smaller community vs Pinecone/Weaviate
- Self-hosting required for cost savings
Decision Flowchart
Do you already use Postgres?
├─ Yes → pgvector (unless you're past ~10M vectors)
└─ No
├─ Want zero ops + can pay premium? → Pinecone
├─ Need hybrid search + GraphQL? → Weaviate
└─ Want maximum speed + self-host? → Qdrant
Production Checklist
- [ ] Load test with realistic query volume
- [ ] Monitor p95/p99 latency under load
- [ ] Test metadata filtering performance
- [ ] Implement rate limiting and caching (see the sketch after this list)
- [ ] Set up backup/recovery strategy
- [ ] Plan for re-indexing (model upgrades)
- [ ] Budget for scaling (storage + compute)
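For the rate-limiting-and-caching item, the cheapest win is usually caching query embeddings so repeated questions don't hit the embedding API. A minimal sketch using the OpenAI client from the examples above (a production version would also cache retrieval results, keyed by query and filters, with a TTL):

import functools
from openai import OpenAI

client = OpenAI()

@functools.lru_cache(maxsize=10_000)
def cached_query_embedding(query):
    # lru_cache needs a hashable return value, hence the tuple
    resp = client.embeddings.create(input=query, model="text-embedding-ada-002")
    return tuple(resp.data[0].embedding)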
Our Recommendation
For most production RAG systems:
- Start with pgvector if you use Postgres
- Switch to Qdrant when you hit scale limits (>10M vectors)
- Use Pinecone if ops overhead is prohibitive
Avoid: Running FAISS or an embedded ChromaDB instance as your production store. They're great for prototypes, but FAISS has no metadata filtering or server layer, and neither gives you the replication, backups, and horizontal scaling production traffic demands.
Need help architecting your vector database? Reach out.