Production-Ready Vector Databases: Choosing Between pgvector, Pinecone, and Weaviate
You've built a RAG proof-of-concept with ChromaDB or FAISS in memory. Now you need to scale to millions of documents, sub-100ms latency, and metadata filtering.
Here's how to choose between pgvector, Pinecone, Weaviate, and Qdrant for production.
The Decision Matrix
| Feature | pgvector | Pinecone | Weaviate | Qdrant |
|---------|----------|----------|----------|--------|
| Setup | Add extension to Postgres | Fully managed | Self-host or cloud | Self-host or cloud |
| Cost (10M vectors) | ~$50/mo (RDS) | ~$400/mo | ~$200/mo | ~$150/mo |
| Latency (p95) | 50-150ms | 30-80ms | 40-100ms | 35-90ms |
| Hybrid search | ✅ (tsvector + vector) | ❌ (vector only) | ✅ (BM25 + vector) | ✅ (built-in) |
| Metadata filtering | ✅ (SQL WHERE) | ✅ (limited) | ✅ (GraphQL) | ✅ (JSON filter) |
| Best for | Small-medium scale | Serverless, hands-off | Complex schemas | High performance |
Option 1: pgvector (PostgreSQL Extension)
When to choose: You already use Postgres, or want to avoid managing another database.
Setup
-- Enable extension
CREATE EXTENSION vector;
-- Create table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536), -- OpenAI ada-002 dimension
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index (HNSW for speed)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
-- Add GIN index for metadata filtering
CREATE INDEX ON documents USING gin(metadata);
-- Full-text search index
CREATE INDEX ON documents USING gin(to_tsvector('english', content));
Query with Hybrid Search
import psycopg2
from openai import OpenAI
client = OpenAI()
conn = psycopg2.connect("postgresql://...")
def hybrid_search(query, limit=5):
    # Get query embedding
    embedding = client.embeddings.create(
        input=query,
        model="text-embedding-ada-002"
    ).data[0].embedding
    # pgvector accepts the '[x, y, ...]' text form, so pass the list as a string
    embedding_str = str(embedding)
    # Hybrid: vector similarity + full-text + metadata
    cursor = conn.cursor()
    cursor.execute("""
WITH vector_matches AS (
SELECT id, content, metadata,
1 - (embedding <=> %s::vector) AS similarity
FROM documents
WHERE metadata->>'status' = 'published'
ORDER BY embedding <=> %s::vector
LIMIT 20
),
text_matches AS (
SELECT id,
ts_rank(to_tsvector('english', content),
plainto_tsquery('english', %s)) AS rank
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
)
SELECT v.id, v.content, v.metadata,
(v.similarity * 0.7 + COALESCE(t.rank, 0) * 0.3) AS score
FROM vector_matches v
LEFT JOIN text_matches t ON v.id = t.id
ORDER BY score DESC
        LIMIT %s
    """, (embedding_str, embedding_str, query, query, limit))
return cursor.fetchall()
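Example call. The 0.7/0.3 weighting above is a starting point, not a tuned value; cosine similarity and ts_rank live on different scales, so validate the blend against your own relevance set.

rows = hybrid_search("How do HNSW indexes trade recall for speed?", limit=3)
for doc_id, content, metadata, score in rows:
    print(round(score, 3), doc_id)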
Pros:
- Single database for structured + vector data
- ACID transactions
- Familiar SQL interface
- Low cost
Cons:
- Slower than specialized vector DBs at massive scale (>10M vectors)
- Index build times increase with size (tunable; see the sketch below)
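Both cons can be mitigated to a degree. A sketch of the main pgvector tuning knobs (values are illustrative, not recommendations; if you already built the default index above, drop it or give this one a new name):

conn.autocommit = True  # CREATE INDEX CONCURRENTLY can't run inside a transaction
cursor = conn.cursor()

# More memory makes large HNSW builds significantly faster
cursor.execute("SET maintenance_work_mem = '2GB'")

# m and ef_construction trade build time and index size for recall
cursor.execute("""
    CREATE INDEX CONCURRENTLY IF NOT EXISTS documents_embedding_hnsw
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
""")

# At query time, raise ef_search for better recall at the cost of latency
cursor.execute("SET hnsw.ef_search = 100")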
Option 2: Pinecone (Managed Vector DB)
When to choose: You want zero ops and can pay for convenience.
Setup
from pinecone import Pinecone, PodSpec
from openai import OpenAI

# Initialize (pinecone-client v3+; the older pinecone.init(...) style is deprecated)
pc = Pinecone(api_key="...")

# Create a pod-based index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-west1-gcp",
        pod_type="p1.x1",  # ~$70/mo
        pods=1
    )
)
index = pc.Index("documents")
# Upsert vectors
client = OpenAI()
embedding = client.embeddings.create(
input="Document text...",
model="text-embedding-ada-002"
).data[0].embedding
index.upsert(vectors=[
{
"id": "doc-1",
"values": embedding,
"metadata": {
"title": "...",
"category": "...",
"date": "2024-11-05"
}
}
])
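Upserting one vector per request doesn't scale; batching is the usual pattern. A minimal sketch, assuming docs is a list of (id, text, metadata) tuples (a shape chosen for illustration):

def batch_upsert(index, docs, batch_size=100):
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        # One embeddings call per batch
        embeddings = client.embeddings.create(
            input=[text for _, text, _ in batch],
            model="text-embedding-ada-002"
        ).data
        index.upsert(vectors=[
            {"id": doc_id, "values": emb.embedding, "metadata": meta}
            for (doc_id, _, meta), emb in zip(batch, embeddings)
        ])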
Query with Metadata Filtering
def search(query, filters=None):
embedding = client.embeddings.create(
input=query,
model="text-embedding-ada-002"
).data[0].embedding
results = index.query(
vector=embedding,
top_k=5,
filter=filters or {}, # {"category": "healthcare"}
include_metadata=True
)
return [(r.id, r.score, r.metadata) for r in results.matches]
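Example call. Pinecone filters use Mongo-style operators such as $eq, $in, and $gte:

results = search(
    "patient record retention policies",
    filters={"category": {"$eq": "healthcare"}}
)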
Pros:
- Serverless, auto-scaling
- Fast queries (p95 < 50ms)
- Simple API
- Built-in monitoring
Cons:
- No built-in BM25; hybrid search requires generating sparse vectors client-side
- Metadata filtering is less expressive than SQL WHERE clauses
- Cost scales quickly
Option 3: Weaviate (Self-Hosted or Cloud)
When to choose: You need hybrid search and complex schema relationships.
Setup with Docker
# docker-compose.yml
version: '3.4'
services:
weaviate:
image: semitechnologies/weaviate:1.23.0
ports:
- "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}  # required for text2vec-openai; pulled from your shell
volumes:
- weaviate_data:/var/lib/weaviate
volumes:
weaviate_data:
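After docker compose up -d, it's worth confirming the instance is ready and the OpenAI module loaded before creating a schema. A quick check against Weaviate's standard endpoints:

import requests

ready = requests.get("http://localhost:8080/v1/.well-known/ready", timeout=5)
print("ready:", ready.status_code == 200)

meta = requests.get("http://localhost:8080/v1/meta", timeout=5).json()
print("modules:", list(meta.get("modules", {}).keys()))  # expect 'text2vec-openai'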
Schema + Hybrid Search
import weaviate
client = weaviate.Client("http://localhost:8080")
# Define schema
schema = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "model": "ada",
            "modelVersion": "002",
            "type": "text"
        }
    },
"properties": [
{"name": "content", "dataType": ["text"]},
{"name": "title", "dataType": ["string"]},
{"name": "category", "dataType": ["string"]},
{"name": "publishedDate", "dataType": ["date"]}
]
}
client.schema.create_class(schema)
# Insert (auto-vectorizes)
client.data_object.create(
data_object={
"content": "Document content...",
"title": "RAG Systems",
"category": "AI"
},
class_name="Document"
)
# Hybrid search (BM25 + vector)
results = (
client.query
.get("Document", ["content", "title", "category"])
.with_hybrid(query="RAG systems", alpha=0.7) # 0=BM25, 1=vector
.with_where({
"path": ["category"],
"operator": "Equal",
"valueString": "AI"
})
.with_limit(5)
.do()
)
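The v3 Python client returns a plain dict, with hits nested under the class name:

for doc in results["data"]["Get"]["Document"]:
    print(doc["title"], "-", doc["category"])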
Pros:
- True hybrid search (BM25 + vector)
- GraphQL query language
- Schema enforcement
- Self-hosted option
Cons:
- More complex setup
- Requires ops knowledge for self-hosting
Option 4: Qdrant (High Performance)
When to choose: Maximum query speed and efficiency.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams
)

client = QdrantClient(url="http://localhost:6333")
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)
# Insert ('embedding' comes from your embedding model, as in the earlier examples)
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1,
vector=embedding,
payload={
"content": "...",
"category": "AI",
"published": "2024-11-05"
}
)
]
)
# Search with filtering ('query_embedding' is the embedded user query)
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="AI"))]
    ),
    limit=5
)
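If you filter on a field often, give it a payload index so Qdrant doesn't scan payloads during filtered searches. A sketch for the category field used above:

from qdrant_client.models import PayloadSchemaType

client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)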
Pros:
- Fastest queries (Rust-based)
- Rich filtering
- Hybrid search support
- Efficient memory usage
Cons:
- Smaller community vs Pinecone/Weaviate
- Self-hosting required for cost savings
Decision Flowchart
Do you already use Postgres?
├─ Yes → pgvector (unless you're past ~10M vectors)
└─ No
├─ Want zero ops + can pay premium? → Pinecone
├─ Need hybrid search + GraphQL? → Weaviate
└─ Want maximum speed + self-host? → Qdrant
Production Checklist
- [ ] Load test with realistic query volume
- [ ] Monitor p95/p99 latency under load
- [ ] Test metadata filtering performance
- [ ] Implement rate limiting and caching (see the sketch after this list)
- [ ] Set up backup/recovery strategy
- [ ] Plan for re-indexing (model upgrades)
- [ ] Budget for scaling (storage + compute)
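For the rate-limiting-and-caching item, the cheapest win is usually caching query embeddings so repeated questions don't hit the embedding API. A minimal sketch using the OpenAI client from the examples above (a production version would also cache retrieval results, keyed by query and filters, with a TTL):

import functools
from openai import OpenAI

client = OpenAI()

@functools.lru_cache(maxsize=10_000)
def cached_query_embedding(query):
    # lru_cache needs a hashable return value, hence the tuple
    resp = client.embeddings.create(input=query, model="text-embedding-ada-002")
    return tuple(resp.data[0].embedding)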
Our Recommendation
For most production RAG systems:
- Start with pgvector if you use Postgres
- Switch to Qdrant when you hit scale limits (>10M vectors)
- Use Pinecone if ops overhead is prohibitive
Avoid: Running FAISS or an embedded ChromaDB instance as your production store. They're great for prototypes, but FAISS has no metadata filtering or server layer, and neither gives you the replication, backups, and horizontal scaling production traffic demands.
Need help architecting your vector database? Reach out.