pgvector adds vector similarity search to PostgreSQL: `CREATE EXTENSION vector` enables it, the `VECTOR(1536)` column type stores embeddings, and `embedding <=> query_vector` computes cosine distance (lower = more similar). `CREATE INDEX ... USING hnsw (embedding vector_cosine_ops)` makes large-scale approximate nearest-neighbor search fast. Generate embeddings with `openai.embeddings.create({ model: "text-embedding-3-small", input })` and store the 1536-float array. Hybrid search combines `ts_rank` full-text scores with vector similarity via a weighted sum. Prisma passes vector values through `$queryRaw` tagged templates; Drizzle provides a `vector` column type in `drizzle-orm/pg-core`. Chunk long documents into ~512-token segments for retrieval, and re-rank results with a cross-encoder when precision matters. Claude Code generates pgvector schemas, embedding pipelines, similarity search queries, hybrid retrieval, and RAG document ingestion patterns.
# CLAUDE.md for pgvector
## pgvector Stack
- Version: pgvector >= 0.7 (Postgres extension), openai >= 4.49 (embeddings)
- Column: embedding VECTOR(1536) — for text-embedding-3-small (1536 dims)
- Index: CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64)
- Query: ORDER BY embedding <=> $1::vector LIMIT 10 — cosine distance (lower = more similar)
- Operators: `<=>` cosine distance, `<->` L2 distance, `<#>` negative inner product
- Prisma: use `$queryRaw` tagged templates; pass the embedding as a bracketed string and cast with `::vector`
- Hybrid: 0.7 * vector_score + 0.3 * text_rank — weighted combination
- Chunk: 512 tokens per chunk with 50-token overlap for retrieval
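For intuition about what `<=>` returns, here is a standalone TypeScript sketch of cosine distance, the same quantity pgvector computes natively inside Postgres:

```typescript
// Cosine distance as pgvector's <=> computes it: 1 - cosine similarity.
// Standalone sketch for intuition only; pgvector does this in C server-side.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Same direction gives distance 0, orthogonal gives 1, opposite gives 2
console.log(cosineDistance([1, 0], [2, 0])) // 0
console.log(cosineDistance([1, 0], [0, 1])) // 1
```

Magnitude is ignored, which is why cosine distance suits embeddings whose scale carries no meaning.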
## Database Schema
```sql
-- migrations/001_pgvector.sql — enable extension and create table
CREATE EXTENSION IF NOT EXISTS vector;

-- Documents with embeddings for semantic search
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}'::jsonb,
  source_url TEXT,
  source_type TEXT DEFAULT 'article',
  embedding VECTOR(1536), -- text-embedding-3-small dimensions
  tokens INT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Full-text search index for hybrid retrieval
CREATE INDEX documents_content_fts ON documents
  USING gin (to_tsvector('english', content));

-- HNSW index for approximate nearest neighbor search (fast for large datasets)
-- Tune m (connections per layer) and ef_construction (build quality)
CREATE INDEX documents_embedding_hnsw ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- For smaller datasets, IVFFlat is simpler:
-- CREATE INDEX documents_embedding_ivf ON documents
--   USING ivfflat (embedding vector_cosine_ops)
--   WITH (lists = 100);

-- Set search quality per session at query time
SET hnsw.ef_search = 40; -- Higher = more accurate, slower
```
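pgvector accepts vector input as a bracketed text literal such as `'[0.1,0.2,0.3]'`, cast with `::vector`. A small formatter keeps that in one place; `toVectorLiteral` is a hypothetical helper name, not part of pgvector or any client library:

```typescript
// Format a float array as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
// The string is passed as an ordinary query parameter and cast with ::vector.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0) throw new Error("empty embedding")
  return `[${embedding.join(",")}]`
}

console.log(toVectorLiteral([0.1, 0.2, 0.3])) // [0.1,0.2,0.3]
```

Passing the literal as a parameter (rather than interpolating it into SQL text) keeps the query safe from injection.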
## Embedding Pipeline
```typescript
// lib/embeddings.ts — generate and store embeddings
import OpenAI from "openai"
import { db } from "./db"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text.replace(/\n/g, " "), // Normalize whitespace
    encoding_format: "float",
  })
  return response.data[0].embedding
}

// Batch embedding for efficiency (up to 2048 inputs per request)
export async function generateEmbeddingsBatch(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts.map(t => t.replace(/\n/g, " ")),
    encoding_format: "float",
  })
  return response.data.map(d => d.embedding)
}

// Chunk text into segments for indexing
export function chunkText(text: string, maxTokens = 512, overlapTokens = 50): string[] {
  // Simple word-based approximation (1 token ≈ 0.75 words)
  const maxWords = Math.floor(maxTokens * 0.75)
  const overlapWords = Math.floor(overlapTokens * 0.75)
  const step = Math.max(1, maxWords - overlapWords) // Guard: overlap must not stall the loop
  const words = text.split(/\s+/)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += step) {
    const chunk = words.slice(i, i + maxWords).join(" ").trim()
    if (chunk) chunks.push(chunk)
  }
  return chunks
}

// Index a document — chunk and embed
export async function indexDocument(params: {
  content: string
  metadata?: Record<string, unknown>
  sourceUrl?: string
  sourceType?: string
}): Promise<string[]> {
  const chunks = chunkText(params.content)
  const embeddings = await generateEmbeddingsBatch(chunks)
  const ids: string[] = []
  for (let i = 0; i < chunks.length; i++) {
    const result = await db.$queryRaw<{ id: string }[]>`
      INSERT INTO documents (content, metadata, source_url, source_type, embedding, tokens)
      VALUES (
        ${chunks[i]},
        ${JSON.stringify({ ...params.metadata, chunkIndex: i, totalChunks: chunks.length })}::jsonb,
        ${params.sourceUrl ?? null},
        ${params.sourceType ?? "article"},
        ${JSON.stringify(embeddings[i])}::vector,
        ${Math.round(chunks[i].split(/\s+/).length / 0.75)}
      )
      RETURNING id
    `
    ids.push(result[0].id)
  }
  return ids
}
```
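The word-based chunker can be exercised in isolation. The sketch below reproduces its logic (same 1 token ≈ 0.75 words approximation, with a step guard) to show chunk counts and overlap on a toy input:

```typescript
// Standalone copy of the word-based chunker for demonstration.
function chunkText(text: string, maxTokens = 512, overlapTokens = 50): string[] {
  const maxWords = Math.floor(maxTokens * 0.75)
  const overlapWords = Math.floor(overlapTokens * 0.75)
  const step = Math.max(1, maxWords - overlapWords) // Avoid a non-advancing loop
  const words = text.split(/\s+/)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += step) {
    const chunk = words.slice(i, i + maxWords).join(" ").trim()
    if (chunk) chunks.push(chunk)
  }
  return chunks
}

// 100 words, 8-token chunks (6 words) with 4-token (3-word) overlap
const words = Array.from({ length: 100 }, (_, i) => `w${i}`)
const chunks = chunkText(words.join(" "), 8, 4)
console.log(chunks.length)          // 34
console.log(chunks[0])              // w0 w1 w2 w3 w4 w5
console.log(chunks[1].slice(0, 8))  // w3 w4 w5
```

Each chunk starts 3 words after the previous one, so the trailing 3 words of one chunk repeat at the head of the next, preserving context across boundaries.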
## Semantic Search Queries
```typescript
// lib/search.ts — vector and hybrid search
import { db } from "./db"
import { generateEmbedding } from "./embeddings"

type SearchResult = {
  id: string
  content: string
  metadata: Record<string, unknown>
  sourceUrl: string | null
  score: number // Similarity or combined relevance — higher = better
}

// Pure vector similarity search.
// <=> returns cosine distance (lower = more similar);
// 1 - distance is cosine similarity (higher = more similar).
export async function vectorSearch(
  query: string,
  limit = 10,
  minSimilarity = 0.7,
): Promise<SearchResult[]> {
  const embedding = await generateEmbedding(query)
  const vectorLiteral = `[${embedding.join(",")}]`
  const results = await db.$queryRaw<SearchResult[]>`
    SELECT
      id,
      content,
      metadata,
      source_url AS "sourceUrl",
      1 - (embedding <=> ${vectorLiteral}::vector) AS score
    FROM documents
    WHERE 1 - (embedding <=> ${vectorLiteral}::vector) > ${minSimilarity}
    ORDER BY embedding <=> ${vectorLiteral}::vector
    LIMIT ${limit}
  `
  return results
}

// Hybrid search — combine full-text and vector similarity
export async function hybridSearch(
  query: string,
  limit = 10,
  vectorWeight = 0.7,
): Promise<SearchResult[]> {
  const embedding = await generateEmbedding(query)
  const vectorLiteral = `[${embedding.join(",")}]`
  const textWeight = 1 - vectorWeight
  const results = await db.$queryRaw<SearchResult[]>`
    WITH vector_scores AS (
      SELECT
        id,
        1 - (embedding <=> ${vectorLiteral}::vector) AS vec_score
      FROM documents
      ORDER BY embedding <=> ${vectorLiteral}::vector
      LIMIT ${limit * 3} -- Over-fetch for re-ranking
    ),
    text_scores AS (
      SELECT
        id,
        -- Normalization flag 32 maps rank to rank/(rank+1), bounding it to [0, 1)
        ts_rank(to_tsvector('english', content), plainto_tsquery('english', ${query}), 32) AS text_score
      FROM documents
      WHERE to_tsvector('english', content) @@ plainto_tsquery('english', ${query})
    ),
    combined AS (
      SELECT
        d.id,
        d.content,
        d.metadata,
        d.source_url,
        COALESCE(v.vec_score, 0) * ${vectorWeight} +
        COALESCE(t.text_score, 0) * ${textWeight} AS score
      FROM documents d
      LEFT JOIN vector_scores v ON d.id = v.id
      LEFT JOIN text_scores t ON d.id = t.id
      WHERE v.id IS NOT NULL OR t.id IS NOT NULL
    )
    SELECT id, content, metadata, source_url AS "sourceUrl", score
    FROM combined
    ORDER BY score DESC
    LIMIT ${limit}
  `
  return results
}

// RAG retrieval — get context for a query
export async function retrieveContext(
  query: string,
  options = { limit: 5, minSimilarity: 0.6 },
): Promise<string> {
  const results = await vectorSearch(query, options.limit, options.minSimilarity)
  if (results.length === 0) return ""
  return results
    .map((r, i) => `[${i + 1}] ${r.content}`)
    .join("\n\n")
}
```
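The 0.7/0.3 weighting used in hybrid search is a convex combination of the two scores. A standalone sketch (hypothetical helper, assuming both inputs are already normalized to the 0..1 range) makes the arithmetic concrete:

```typescript
// Weighted hybrid score: vectorWeight * vectorScore + (1 - vectorWeight) * textScore.
// Assumes both inputs are normalized to [0, 1]; raw ts_rank output may need
// scaling (e.g. a ts_rank normalization flag) before combining.
function hybridScore(vectorScore: number, textScore: number, vectorWeight = 0.7): number {
  const textWeight = 1 - vectorWeight
  return vectorScore * vectorWeight + textScore * textWeight
}

// A document found only by vector search still ranks (text score 0)
console.log(hybridScore(0.9, 0))
console.log(hybridScore(0.8, 0.5))
```

Because the weights sum to 1, the combined score stays in [0, 1] whenever both inputs do, so results from different queries remain comparable.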
## RAG API Route
```typescript
// app/api/rag/route.ts — RAG chat endpoint
import { NextRequest, NextResponse } from "next/server"
import OpenAI from "openai"
import { retrieveContext } from "@/lib/search"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(request: NextRequest) {
  const { query } = await request.json()
  if (typeof query !== "string" || !query.trim()) {
    return NextResponse.json({ error: "query is required" }, { status: 400 })
  }

  // Retrieve relevant context
  const context = await retrieveContext(query, { limit: 5, minSimilarity: 0.65 })

  const messages: OpenAI.ChatCompletionMessageParam[] = [
    {
      role: "system",
      content: `You are a helpful assistant. Answer the user's question using the provided context.
If the context doesn't contain relevant information, say so — do not make up answers.

Context:
${context}`,
    },
    { role: "user", content: query },
  ]

  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo-preview",
    messages,
    temperature: 0.2, // Lower temperature for factual RAG
    max_tokens: 1000,
  })

  return NextResponse.json({
    answer: response.choices[0].message.content,
    sources: [], // Could extract source URLs from search results
  })
}
```
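The system-prompt assembly in the route can be factored into a pure helper (hypothetical `buildSystemPrompt`, mirroring the inline template above), which makes the prompt testable without calling OpenAI:

```typescript
// Build the RAG system prompt from retrieved context.
// Hypothetical refactor of the inline template in app/api/rag/route.ts;
// an empty context still instructs the model not to fabricate answers.
function buildSystemPrompt(context: string): string {
  return [
    "You are a helpful assistant. Answer the user's question using the provided context.",
    "If the context doesn't contain relevant information, say so — do not make up answers.",
    "",
    "Context:",
    context,
  ].join("\n")
}

console.log(buildSystemPrompt("[1] pgvector stores embeddings in Postgres."))
```

Keeping prompt construction out of the route handler also makes it easy to version and A/B test prompts independently of the HTTP layer.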
For a dedicated managed vector database with namespaces, metadata filtering, and purpose-built vector indexing instead of adding pgvector to an existing PostgreSQL database, see the vector database guide on Pinecone: it handles billions of vectors at scale without managing PostgreSQL, though it runs as a separate service and does not support relational joins. For pgvector managed through Supabase's hosted PostgreSQL, with automatic embedding via Supabase Edge Functions, the vecs Python client, and a built-in dashboard for vector exploration, see the Supabase Advanced guide. The Claude Skills 360 bundle includes pgvector skill sets covering embeddings, HNSW indexing, and RAG retrieval; start with the free tier to try semantic search generation.