pgvector adds vector similarity search to PostgreSQL: `CREATE EXTENSION vector` enables it, the `VECTOR(1536)` column type stores embeddings, and `embedding <=> query_vector` computes cosine distance (lower = more similar). `CREATE INDEX ... USING hnsw (embedding vector_cosine_ops)` makes large-scale approximate nearest-neighbor search fast. Generate embeddings with `openai.embeddings.create({ model: "text-embedding-3-small", input })` and store the 1536-float array. Hybrid search combines `ts_rank` full-text scores with vector similarity via a weighted sum. Prisma passes vector values through `$queryRaw` tagged templates; Drizzle provides a `vector` column type in `drizzle-orm/pg-core`. Chunk long documents into ~512-token segments for retrieval, and re-rank results with a cross-encoder when precision matters. Claude Code generates pgvector schemas, embedding pipelines, similarity search queries, hybrid retrieval, and RAG document ingestion patterns.
# CLAUDE.md for pgvector
## pgvector Stack
- Version: pgvector >= 0.7 (Postgres extension), openai >= 4.49 (embeddings)
- Column: embedding VECTOR(1536) — for text-embedding-3-small (1536 dims)
- Index: CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64)
- Query: ORDER BY embedding <=> $1::vector LIMIT 10 — cosine distance (lower = more similar)
- Operators: `<=>` cosine distance, `<->` L2 distance, `<#>` negative inner product
- Prisma: use `$queryRaw` tagged templates; pass the embedding as a bracketed string and cast with `::vector`
- Hybrid: 0.7 * vector_score + 0.3 * text_rank — weighted combination
- Chunk: 512 tokens per chunk with 50-token overlap for retrieval
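For intuition about what `<=>` returns, here is a standalone TypeScript sketch of cosine distance, the same quantity pgvector computes natively inside Postgres:

```typescript
// Cosine distance as pgvector's <=> computes it: 1 - cosine similarity.
// Standalone sketch for intuition only; pgvector does this in C server-side.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Same direction gives distance 0, orthogonal gives 1, opposite gives 2
console.log(cosineDistance([1, 0], [2, 0])) // 0
console.log(cosineDistance([1, 0], [0, 1])) // 1
```

Magnitude is ignored, which is why cosine distance suits embeddings whose scale carries no meaning.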
## Database Schema
```sql
-- migrations/001_pgvector.sql — enable extension and create table
CREATE EXTENSION IF NOT EXISTS vector;

-- Documents with embeddings for semantic search
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}'::jsonb,
  source_url TEXT,
  source_type TEXT DEFAULT 'article',
  embedding VECTOR(1536), -- text-embedding-3-small dimensions
  tokens INT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Full-text search index for hybrid retrieval
CREATE INDEX documents_content_fts ON documents
  USING gin (to_tsvector('english', content));

-- HNSW index for approximate nearest neighbor search (fast for large datasets)
-- Tune m (connections per layer) and ef_construction (build quality)
CREATE INDEX documents_embedding_hnsw ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- For smaller datasets, IVFFlat is simpler:
-- CREATE INDEX documents_embedding_ivf ON documents
--   USING ivfflat (embedding vector_cosine_ops)
--   WITH (lists = 100);

-- Set search quality per session at query time
SET hnsw.ef_search = 40; -- Higher = more accurate, slower
```
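pgvector accepts vector input as a bracketed text literal such as `'[0.1,0.2,0.3]'`, cast with `::vector`. A small formatter keeps that in one place; `toVectorLiteral` is a hypothetical helper name, not part of pgvector or any client library:

```typescript
// Format a float array as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
// The string is passed as an ordinary query parameter and cast with ::vector.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0) throw new Error("empty embedding")
  return `[${embedding.join(",")}]`
}

console.log(toVectorLiteral([0.1, 0.2, 0.3])) // [0.1,0.2,0.3]
```

Passing the literal as a parameter (rather than interpolating it into SQL text) keeps the query safe from injection.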
## Embedding Pipeline
```typescript
// lib/embeddings.ts — generate and store embeddings
import OpenAI from "openai"
import { db } from "./db"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text.replace(/\n/g, " "), // Normalize whitespace
    encoding_format: "float",
  })
  return response.data[0].embedding
}

// Batch embedding for efficiency (up to 2048 inputs per request)
export async function generateEmbeddingsBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts.map(t => t.replace(/\n/g, " ")),
    encoding_format: "float",
  })
  return response.data.map(d => d.embedding)
}

// Chunk text into segments for indexing
export function chunkText(text: string, maxTokens = 512, overlapTokens = 50): string[] {
  // Simple word-based approximation (1 token ≈ 0.75 words)
  const maxWords = Math.floor(maxTokens * 0.75)
  const overlapWords = Math.floor(overlapTokens * 0.75)
  const step = Math.max(1, maxWords - overlapWords) // Guard: overlap must not stall the loop
  const words = text.split(/\s+/)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += step) {
    const chunk = words.slice(i, i + maxWords).join(" ").trim()
    if (chunk) chunks.push(chunk)
  }
  return chunks
}

// Index a document — chunk and embed
export async function indexDocument(params: {
  content: string
  metadata?: Record<string, unknown>
  sourceUrl?: string
  sourceType?: string
}): Promise<string[]> {
  const chunks = chunkText(params.content)
  const embeddings = await generateEmbeddingsBatch(chunks)
  const ids: string[] = []
  for (let i = 0; i < chunks.length; i++) {
    const result = await db.$queryRaw<{ id: string }[]>`
      INSERT INTO documents (content, metadata, source_url, source_type, embedding, tokens)
      VALUES (
        ${chunks[i]},
        ${JSON.stringify({ ...params.metadata, chunkIndex: i, totalChunks: chunks.length })}::jsonb,
        ${params.sourceUrl ?? null},
        ${params.sourceType ?? "article"},
        ${JSON.stringify(embeddings[i])}::vector,
        ${Math.round(chunks[i].split(/\s+/).length / 0.75)}
      )
      RETURNING id
    `
    ids.push(result[0].id)
  }
  return ids
}
```
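The word-based chunker can be exercised in isolation. The sketch below reproduces its logic (same 1 token ≈ 0.75 words approximation, with a step guard) to show chunk counts and overlap on a toy input:

```typescript
// Standalone copy of the word-based chunker for demonstration.
function chunkText(text: string, maxTokens = 512, overlapTokens = 50): string[] {
  const maxWords = Math.floor(maxTokens * 0.75)
  const overlapWords = Math.floor(overlapTokens * 0.75)
  const step = Math.max(1, maxWords - overlapWords) // Avoid a non-advancing loop
  const words = text.split(/\s+/)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += step) {
    const chunk = words.slice(i, i + maxWords).join(" ").trim()
    if (chunk) chunks.push(chunk)
  }
  return chunks
}

// 100 words, 8-token chunks (6 words) with 4-token (3-word) overlap
const words = Array.from({ length: 100 }, (_, i) => `w${i}`)
const chunks = chunkText(words.join(" "), 8, 4)
console.log(chunks.length)          // 34
console.log(chunks[0])              // w0 w1 w2 w3 w4 w5
console.log(chunks[1].slice(0, 8))  // w3 w4 w5
```

Each chunk starts 3 words after the previous one, so the trailing 3 words of one chunk repeat at the head of the next, preserving context across boundaries.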
## Semantic Search Queries
```typescript
// lib/search.ts — vector and hybrid search
import { db } from "./db"
import { generateEmbedding } from "./embeddings"

type SearchResult = {
  id: string
  content: string
  metadata: Record<string, unknown>
  sourceUrl: string | null
  score: number // Similarity or combined relevance — higher = better
}

// Pure vector similarity search.
// <=> returns cosine distance (lower = more similar);
// 1 - distance is cosine similarity (higher = more similar).
export async function vectorSearch(
  query: string,
  limit = 10,
  minSimilarity = 0.7,
): Promise<SearchResult[]> {
  const embedding = await generateEmbedding(query)
  const vectorLiteral = `[${embedding.join(",")}]`
  const results = await db.$queryRaw<SearchResult[]>`
    SELECT
      id,
      content,
      metadata,
      source_url AS "sourceUrl",
      1 - (embedding <=> ${vectorLiteral}::vector) AS score
    FROM documents
    WHERE 1 - (embedding <=> ${vectorLiteral}::vector) > ${minSimilarity}
    ORDER BY embedding <=> ${vectorLiteral}::vector
    LIMIT ${limit}
  `
  return results
}

// Hybrid search — combine full-text and vector similarity
export async function hybridSearch(
  query: string,
  limit = 10,
  vectorWeight = 0.7,
): Promise<SearchResult[]> {
  const embedding = await generateEmbedding(query)
  const vectorLiteral = `[${embedding.join(",")}]`
  const textWeight = 1 - vectorWeight
  const results = await db.$queryRaw<SearchResult[]>`
    WITH vector_scores AS (
      SELECT
        id,
        1 - (embedding <=> ${vectorLiteral}::vector) AS vec_score
      FROM documents
      ORDER BY embedding <=> ${vectorLiteral}::vector
      LIMIT ${limit * 3} -- Over-fetch for re-ranking
    ),
    text_scores AS (
      SELECT
        id,
        -- Normalization flag 32 maps rank to rank/(rank+1), bounding it to [0, 1)
        ts_rank(to_tsvector('english', content), plainto_tsquery('english', ${query}), 32) AS text_score
      FROM documents
      WHERE to_tsvector('english', content) @@ plainto_tsquery('english', ${query})
    ),
    combined AS (
      SELECT
        d.id,
        d.content,
        d.metadata,
        d.source_url,
        COALESCE(v.vec_score, 0) * ${vectorWeight} +
        COALESCE(t.text_score, 0) * ${textWeight} AS score
      FROM documents d
      LEFT JOIN vector_scores v ON d.id = v.id
      LEFT JOIN text_scores t ON d.id = t.id
      WHERE v.id IS NOT NULL OR t.id IS NOT NULL
    )
    SELECT id, content, metadata, source_url AS "sourceUrl", score
    FROM combined
    ORDER BY score DESC
    LIMIT ${limit}
  `
  return results
}

// RAG retrieval — get context for a query
export async function retrieveContext(
  query: string,
  options = { limit: 5, minSimilarity: 0.6 },
): Promise<string> {
  const results = await vectorSearch(query, options.limit, options.minSimilarity)
  if (results.length === 0) return ""
  return results
    .map((r, i) => `[${i + 1}] ${r.content}`)
    .join("\n\n")
}
```
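The 0.7/0.3 weighting used in hybrid search is a convex combination of the two scores. A standalone sketch (hypothetical helper, assuming both inputs are already normalized to the 0..1 range) makes the arithmetic concrete:

```typescript
// Weighted hybrid score: vectorWeight * vectorScore + (1 - vectorWeight) * textScore.
// Assumes both inputs are normalized to [0, 1]; raw ts_rank output may need
// scaling (e.g. a ts_rank normalization flag) before combining.
function hybridScore(vectorScore: number, textScore: number, vectorWeight = 0.7): number {
  const textWeight = 1 - vectorWeight
  return vectorScore * vectorWeight + textScore * textWeight
}

// A document found only by vector search still ranks (text score 0)
console.log(hybridScore(0.9, 0))
console.log(hybridScore(0.8, 0.5))
```

Because the weights sum to 1, the combined score stays in [0, 1] whenever both inputs do, so results from different queries remain comparable.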
## RAG API Route
```typescript
// app/api/rag/route.ts — RAG chat endpoint
import { NextRequest, NextResponse } from "next/server"
import OpenAI from "openai"
import { retrieveContext } from "@/lib/search"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(request: NextRequest) {
  const { query } = await request.json()
  if (typeof query !== "string" || !query.trim()) {
    return NextResponse.json({ error: "query is required" }, { status: 400 })
  }

  // Retrieve relevant context
  const context = await retrieveContext(query, { limit: 5, minSimilarity: 0.65 })

  const messages: OpenAI.ChatCompletionMessageParam[] = [
    {
      role: "system",
      content: `You are a helpful assistant. Answer the user's question using the provided context.
If the context doesn't contain relevant information, say so — do not make up answers.

Context:
${context}`,
    },
    { role: "user", content: query },
  ]

  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo-preview",
    messages,
    temperature: 0.2, // Lower temperature for factual RAG
    max_tokens: 1000,
  })

  return NextResponse.json({
    answer: response.choices[0].message.content,
    sources: [], // Could extract source URLs from search results
  })
}
```
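The system-prompt assembly in the route can be factored into a pure helper (hypothetical `buildSystemPrompt`, mirroring the inline template above), which makes the prompt testable without calling OpenAI:

```typescript
// Build the RAG system prompt from retrieved context.
// Hypothetical refactor of the inline template in app/api/rag/route.ts;
// an empty context still instructs the model not to fabricate answers.
function buildSystemPrompt(context: string): string {
  return [
    "You are a helpful assistant. Answer the user's question using the provided context.",
    "If the context doesn't contain relevant information, say so — do not make up answers.",
    "",
    "Context:",
    context,
  ].join("\n")
}

console.log(buildSystemPrompt("[1] pgvector stores embeddings in Postgres."))
```

Keeping prompt construction out of the route handler also makes it easy to version and A/B test prompts independently of the HTTP layer.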
For a dedicated managed vector database with namespaces, metadata filtering, and purpose-built vector indexing instead of adding pgvector to an existing PostgreSQL database, see the vector database guide on Pinecone: it handles billions of vectors at scale without managing PostgreSQL, though it runs as a separate service and does not support relational joins. For pgvector managed through Supabase's hosted PostgreSQL, with automatic embedding via Supabase Edge Functions, the vecs Python client, and a built-in dashboard for vector exploration, see the Supabase Advanced guide. The Claude Skills 360 bundle includes pgvector skill sets covering embeddings, HNSW indexing, and RAG retrieval; start with the free tier to try semantic search generation.