FAISS enables billion-scale nearest neighbor search on dense vectors. Install: pip install faiss-cpu (GPU builds ship as faiss-gpu, typically via conda). import faiss. Exact L2: index = faiss.IndexFlatL2(dim) — brute force, always returns exact results. Exact cosine: faiss.normalize_L2(vectors), then faiss.IndexFlatIP(dim). Add: index.add(np.array(vectors, dtype=np.float32)). Search: distances, indices = index.search(query_vectors, 10) — returns (N, k) arrays, with -1 padding for not-found slots. Note: the SWIG bindings take positional arguments, not Python keywords. IVF approximate (faster): quantizer = faiss.IndexFlatL2(dim); index = faiss.IndexIVFFlat(quantizer, dim, 100) for nlist=100; index.train(training_vectors); index.add(vectors); index.nprobe = 10. IVF-PQ (compressed): faiss.IndexIVFPQ(quantizer, dim, 100, 8, 8) — nlist=100, m=8 sub-quantizer groups, nbits=8. HNSW (graph-based): faiss.IndexHNSWFlat(dim, 32) — M=32 connections per node, no training needed, best recall/latency tradeoff. Custom IDs: index = faiss.IndexIDMap(faiss.IndexFlatL2(dim)); index.add_with_ids(vectors, ids_array). Save: faiss.write_index(index, "index.faiss"); load: faiss.read_index("index.faiss"). GPU: res = faiss.StandardGpuResources(); gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index). Factory: faiss.index_factory(dim, "IVF100,Flat"), "IVF256,PQ16", or "HNSW32". Range search: lims, distances, indices = index.range_search(queries, 0.5). Claude Code generates FAISS vector indexes, semantic search backends, RAG retrieval layers, and billion-scale embedding stores.
CLAUDE.md for FAISS
## FAISS Stack
- Version: faiss-cpu >= 1.8 | faiss-gpu for CUDA acceleration
- Exact: IndexFlatL2(dim) | IndexFlatIP(dim) [normalize first for cosine]
- Fast: IndexHNSWFlat(dim, M=32) — no training, best recall/speed
- Large: IndexIVFFlat(quantizer, dim, nlist) → train() → add() → search()
- Compressed: IndexIVFPQ(quantizer, dim, nlist, m=8, nbits=8)
- IDs: IndexIDMap(base_index) → add_with_ids(vecs, int64_ids)
- Persist: write_index(index, path) | read_index(path)
- GPU: index_cpu_to_gpu(StandardGpuResources(), gpu_id, cpu_index)
FAISS Vector Search Pipeline
# nlp/faiss_pipeline.py — billion-scale vector similarity search with FAISS
from __future__ import annotations
import os
import time
import numpy as np
from pathlib import Path
from typing import Optional
import faiss
# ── 1. Index creation ────────────────────────────────────────────────────────
def create_flat_index(
dim: int,
metric: str = "l2", # "l2" | "ip" (inner product = cosine if normalized)
) -> faiss.Index:
"""
Exact brute-force index. No training required.
- Use "l2" for Euclidean distance
- Use "ip" + normalize_L2 for cosine similarity
Fast for < 1M vectors; use approximate indexes for larger sets.
"""
if metric == "ip":
return faiss.IndexFlatIP(dim)
return faiss.IndexFlatL2(dim)
def create_hnsw_index(
dim: int,
M: int = 32, # Connections per node (16-64 typical)
ef_construction: int = 200, # Build quality (higher = slower build, better recall)
ef_search: int = 64, # Search quality (set at query time)
) -> faiss.Index:
"""
HNSW graph-based ANN index.
- Best recall/latency tradeoff, no training
- M=32 is a good default (increase for higher recall)
- Memory: ~(4 * dim + 8 * M) bytes per vector (raw vectors + graph links)
"""
index = faiss.IndexHNSWFlat(dim, M)
index.hnsw.efConstruction = ef_construction
index.hnsw.efSearch = ef_search
return index
def create_ivf_flat_index(
dim: int,
nlist: int = 100, # Number of Voronoi cells (sqrt(N) is rule of thumb)
metric: str = "l2",
) -> faiss.Index:
"""
IVF (Inverted File) + Flat storage. Requires training.
Good balance of speed and memory when N > 1M.
nlist ~ sqrt(N): e.g., 100 for 10K, 1000 for 1M, 4096 for 10M vectors.
"""
quantizer = faiss.IndexFlatL2(dim) if metric == "l2" else faiss.IndexFlatIP(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist,
faiss.METRIC_L2 if metric=="l2" else faiss.METRIC_INNER_PRODUCT)
return index
def create_ivf_pq_index(
dim: int,
nlist: int = 256,
m: int = 8, # Sub-quantizer groups (divides dim evenly)
nbits: int = 8, # Bits per sub-quantizer (8 = 256 centroids)
) -> faiss.Index:
"""
IVF + Product Quantization. Compresses each vector to m * nbits / 8 bytes of code
(e.g., 768-dim float32: 3072 bytes → 8 bytes, ~384x smaller, IDs/overhead extra).
m must divide dim evenly (e.g., dim=768 → m=8, 12, 16, or 24).
Memory: n_vectors * m * (nbits/8) bytes.
"""
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, nbits)
return index
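The compression ratio follows directly from the code size. Worked numbers for a 768-dim float32 embedding with the defaults above:

```python
# IVF-PQ stores m * nbits / 8 bytes of code per vector (IDs/overhead extra)
dim, m, nbits = 768, 8, 8
raw_bytes = dim * 4            # float32 vector: 3072 bytes
code_bytes = m * nbits // 8    # PQ code: 8 bytes
ratio = raw_bytes // code_bytes
# 3072 / 8 -> 384x smaller codes; a larger m trades memory for accuracy
```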
def create_index_with_ids(base_index: faiss.Index) -> faiss.IndexIDMap:
"""Wrap any index to support custom int64 IDs instead of sequential 0..N-1."""
return faiss.IndexIDMap(base_index)
# ── 2. Index building ─────────────────────────────────────────────────────────
def build_index(
index: faiss.Index,
vectors: np.ndarray, # (N, D) float32
    ids: Optional[np.ndarray] = None,  # (N,) int64 — for IDMap indexes
normalize: bool = False,
train_size: int = 50_000,
) -> None:
"""
Train (if needed) and add vectors to the index.
normalize=True converts to cosine similarity via IP.
"""
vecs = np.ascontiguousarray(vectors.astype(np.float32))
if normalize:
faiss.normalize_L2(vecs)
# Train if index requires it
if hasattr(index, "is_trained") and not index.is_trained:
train_vecs = vecs[:train_size] if len(vecs) > train_size else vecs
print(f"Training on {len(train_vecs)} vectors...")
index.train(train_vecs)
print("Training complete")
# Add vectors
if ids is not None and isinstance(index, faiss.IndexIDMap):
index.add_with_ids(vecs, np.array(ids, dtype=np.int64))
else:
index.add(vecs)
print(f"Index built: {index.ntotal} vectors, dim={index.d}")
# ── 3. Search ─────────────────────────────────────────────────────────────────
def search(
index: faiss.Index,
queries: np.ndarray, # (N, D) float32
k: int = 10,
normalize: bool = False,
    nprobe: Optional[int] = None,  # IVF search quality (higher = slower, better recall)
) -> tuple[np.ndarray, np.ndarray]:
"""
Search for k nearest neighbors.
Returns (distances, indices) each shape (N, k).
indices == -1 means not found (padding).
"""
queries = np.ascontiguousarray(queries.astype(np.float32))
if normalize:
faiss.normalize_L2(queries)
if nprobe is not None and hasattr(index, "nprobe"):
index.nprobe = nprobe
distances, indices = index.search(queries, k)
return distances, indices
def search_range(
index: faiss.Index,
query: np.ndarray, # (D,) or (1, D)
radius: float,
normalize: bool = False,
) -> list[tuple[int, float]]:
"""
Range search: return all vectors within a given distance/similarity radius.
Returns list of (index_id, distance) sorted by distance.
"""
query = np.ascontiguousarray(query.reshape(1, -1).astype(np.float32))
if normalize:
faiss.normalize_L2(query)
lims, distances, indices = index.range_search(query, radius)
results = list(zip(indices.tolist(), distances.tolist()))
return sorted(results, key=lambda x: x[1])
# ── 4. GPU acceleration ───────────────────────────────────────────────────────
def to_gpu(cpu_index: faiss.Index, gpu_id: int = 0) -> faiss.Index:
"""Move a FAISS index to GPU for faster search."""
res = faiss.StandardGpuResources()
return faiss.index_cpu_to_gpu(res, gpu_id, cpu_index)
def to_cpu(gpu_index: faiss.Index) -> faiss.Index:
"""Move a GPU index back to CPU for saving."""
return faiss.index_gpu_to_cpu(gpu_index)
# ── 5. Persistence ────────────────────────────────────────────────────────────
def save_index(index: faiss.Index, path: str) -> None:
"""Save index to disk."""
faiss.write_index(index, str(path))
size_mb = os.path.getsize(path) / 1024 / 1024
print(f"Saved: {path} ({size_mb:.1f} MB)")
def load_index(path: str) -> faiss.Index:
"""Load index from disk."""
index = faiss.read_index(str(path))
print(f"Loaded: {index.ntotal} vectors, dim={index.d}")
return index
# ── 6. Semantic document store ────────────────────────────────────────────────
class FAISSDocumentStore:
"""
A document store backed by FAISS for semantic retrieval.
Stores text documents with their embeddings.
"""
def __init__(
self,
dim: int,
index_type: str = "hnsw", # "flat" | "hnsw" | "ivf" | "ivfpq"
nlist: int = 256,
normalize: bool = True,
):
self.dim = dim
self.normalize = normalize
self.documents: list[str] = []
self.metadata: list[dict] = []
if index_type == "flat":
base = create_flat_index(dim, "ip" if normalize else "l2")
elif index_type == "hnsw":
base = create_hnsw_index(dim)
elif index_type == "ivf":
base = create_ivf_flat_index(dim, nlist)
elif index_type == "ivfpq":
            m = max(1, dim // 96)  # ~96 dims per sub-quantizer (m must still divide dim evenly)
base = create_ivf_pq_index(dim, nlist, m=m)
else:
raise ValueError(f"Unknown index_type: {index_type}")
self.index = create_index_with_ids(base)
def add(
self,
texts: list[str],
embeddings: np.ndarray,
        metadata: Optional[list[dict]] = None,
train: bool = True,
) -> None:
"""Add documents with their embeddings."""
start_id = len(self.documents)
ids = np.arange(start_id, start_id + len(texts), dtype=np.int64)
self.documents.extend(texts)
self.metadata.extend(metadata or [{} for _ in texts])
        vecs = np.ascontiguousarray(embeddings.astype(np.float32))
        if self.normalize:
            faiss.normalize_L2(vecs)
        # Train the wrapped index if needed (IVF variants), then add exactly once
        # with custom IDs — adding via build_index() here as well would insert
        # every vector twice.
        if train and not self.index.index.is_trained:
            self.index.index.train(vecs)
        self.index.add_with_ids(vecs, ids)
def query(
self,
embedding: np.ndarray,
k: int = 10,
) -> list[dict]:
"""Return top-k documents by semantic similarity."""
distances, indices = search(
self.index, embedding.reshape(1, -1),
k=k, normalize=self.normalize
)
results = []
for dist, idx in zip(distances[0], indices[0]):
if idx == -1:
continue
results.append({
"text": self.documents[idx],
"metadata": self.metadata[idx],
"score": round(float(dist), 4),
"id": int(idx),
})
return results
# ── 7. Benchmarking ───────────────────────────────────────────────────────────
def benchmark_index(
n_vectors: int = 100_000,
dim: int = 384,
n_queries: int = 1_000,
k: int = 10,
) -> None:
"""Compare latency and recall of different FAISS index types."""
print(f"Benchmark: {n_vectors} vectors, dim={dim}")
rng = np.random.default_rng(42)
vectors = rng.random((n_vectors, dim), dtype=np.float32)
queries = rng.random((n_queries, dim), dtype=np.float32)
faiss.normalize_L2(vectors)
faiss.normalize_L2(queries)
# Ground truth from flat index
flat = create_flat_index(dim, "ip")
flat.add(vectors)
_, gt_indices = flat.search(queries, k)
configs = [
("Flat (exact)", flat, None),
("HNSW-32", create_hnsw_index(dim, M=32), None),
("IVF-100,Flat", create_ivf_flat_index(dim, 100, "ip"), 10),
("IVF-100,PQ8", create_ivf_pq_index(dim, 100, m=8), 10),
]
for name, index, nprobe in configs:
if index is flat:
t0 = time.perf_counter()
_, result_ids = index.search(queries, k)
else:
vecs = vectors.copy()
if hasattr(index, "is_trained") and not index.is_trained:
index.train(vecs)
index.add(vecs)
t0 = time.perf_counter()
result_ids = search(index, queries, k=k, nprobe=nprobe)[1]
elapsed_ms = (time.perf_counter() - t0) * 1000
# Recall@k
recall = np.mean([
len(set(result_ids[i].tolist()) & set(gt_indices[i].tolist())) / k
for i in range(n_queries)
])
        # Approximate raw float32 footprint (PQ variants store only ~m bytes/vector)
        size_mb = index.ntotal * dim * 4 / 1024 / 1024
        print(f"  {name:<25} latency={elapsed_ms/n_queries:.2f}ms/q  recall={recall:.3f}  ~{size_mb:.0f}MB raw")
# ── Demo ──────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
print("FAISS Demo")
print("="*50)
dim = 384
n = 10_000
rng = np.random.default_rng(0)
vectors = rng.random((n, dim), dtype=np.float32)
faiss.normalize_L2(vectors)
# Build HNSW index
index = create_hnsw_index(dim, M=32)
build_index(index, vectors)
save_index(index, "/tmp/demo.faiss")
index = load_index("/tmp/demo.faiss")
# Search
query_vec = rng.random((1, dim), dtype=np.float32)
faiss.normalize_L2(query_vec)
distances, indices = search(index, query_vec, k=5)
print(f"\nTop-5 neighbors for random query:")
for dist, idx in zip(distances[0], indices[0]):
print(f" id={idx} similarity={dist:.4f}")
# Benchmark
print()
benchmark_index(n_vectors=50_000, dim=dim, n_queries=500, k=10)
For the Qdrant/Weaviate/Pinecone alternative, when you need a managed vector database with metadata filtering, hybrid search, cloud scaling, and production SLAs: managed vector DBs handle the infrastructure, while FAISS resides fully in-memory and delivers 10-100x lower latency for read-heavy workloads, supports GPU acceleration for billion-scale datasets in a single process without network overhead, and its IndexIVFPQ compression shrinks each vector to a few bytes of code — the standard choice for embedding layers in recommendation systems, RAG retrieval, and image deduplication pipelines where sub-millisecond per-query latency matters. For the ScaNN/DiskANN alternative, when you need higher recall at extreme scale (100M+ vectors) with disk-based storage: DiskANN handles datasets that exceed RAM, while FAISS IndexHNSWFlat achieves >97% recall@10 at sub-millisecond latency for in-memory datasets up to ~100M vectors, and faiss-gpu can search a billion vectors in under a second on a single V100 GPU, making it the go-to choice for GPU-accelerated ANN search. The Claude Skills 360 bundle includes FAISS skill sets covering flat/HNSW/IVF/PQ index creation, training and building, k-NN and range search, GPU acceleration, persistence, a document store with ID mapping, and recall benchmarking. Start with the free tier to try vector search code generation.