Sentence Transformers produces dense vector embeddings for semantic similarity and search. Install with `pip install sentence-transformers`, then import the core API: `from sentence_transformers import SentenceTransformer, CrossEncoder, util`.
- Load: `model = SentenceTransformer("all-MiniLM-L6-v2")` (384D, fast). Best quality: `"all-mpnet-base-v2"` (768D). Multilingual: `"paraphrase-multilingual-mpnet-base-v2"`.
- Encode: `embeddings = model.encode(["sentence one", "sentence two"])` returns an (N, D) numpy array; a single string, `model.encode("text")`, returns shape (D,). Batch options: `model.encode(sentences, batch_size=64, show_progress_bar=True, normalize_embeddings=True)`.
- Cosine similarity: `scores = util.pytorch_cos_sim(emb_a, emb_b)` returns an (N, M) tensor.
- Semantic search: `hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=10)` returns, per query, a list of `{corpus_id, score}` dicts.
- Cross-encoder re-ranking: `cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")`, then `scores = cross_encoder.predict([(query, passage) for passage in passages])`.
- Fine-tuning: `from sentence_transformers import SentenceTransformerTrainer, losses` with `train_loss = losses.MultipleNegativesRankingLoss(model)`. Training arguments: `from sentence_transformers.training_args import SentenceTransformerTrainingArguments`.
- Paraphrase / duplicate detection: `"paraphrase-MiniLM-L6-v2"`. Asymmetric query→document retrieval: `"msmarco-distilbert-base-v4"`.
- Save and reload: `model.save("./my-model")`, then `SentenceTransformer("./my-model")`.
Claude Code generates SBERT embedding pipelines, semantic search backends, cross-encoder re-rankers, and fine-tuning scripts.
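A minimal end-to-end sketch of the calls listed above (pretrained checkpoint names as listed; the toy corpus and query are illustrative only):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384D bi-encoder

corpus = [
    "pip installs Python packages",
    "Flexbox is a CSS layout mode",
    "Use a virtual environment per project",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True, convert_to_tensor=True)

query_emb = model.encode("how do I install a Python library?",
                         normalize_embeddings=True, convert_to_tensor=True)

# semantic_search returns one result list per query; we passed a single query
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```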
CLAUDE.md for Sentence Transformers
## Sentence Transformers Stack
- Version: sentence-transformers >= 3.0
- Fast: all-MiniLM-L6-v2 (384D) | Quality: all-mpnet-base-v2 (768D)
- Encode: model.encode(sentences, batch_size=64, normalize_embeddings=True)
- Similarity: util.pytorch_cos_sim(emb_a, emb_b) → (N, M) tensor
- Search: util.semantic_search(query_emb, corpus_embs, top_k=10)
- Re-rank: CrossEncoder(model).predict([(query, doc), ...]) → scores
- Fine-tune: SentenceTransformerTrainer + MultipleNegativesRankingLoss
- Save/load: model.save(path) | SentenceTransformer(path)
Sentence Transformers Semantic Search Pipeline
# nlp/sentence_transformers_pipeline.py — semantic embeddings and search
from __future__ import annotations
import json
from pathlib import Path
import torch
from sentence_transformers import SentenceTransformer, CrossEncoder, util
# ── 1. Model loading ──────────────────────────────────────────────────────────
def load_bi_encoder(
model_name: str = "all-MiniLM-L6-v2",
device: str | None = None,
) -> SentenceTransformer:
"""
Load a bi-encoder (embedding model).
Recommended models:
- all-MiniLM-L6-v2 — 384D, ~80MB, fastest (best default)
- all-mpnet-base-v2 — 768D, ~420MB, highest quality
- multi-qa-mpnet-base-dot-v1 — 768D, optimized for Q&A retrieval
- paraphrase-multilingual-mpnet-base-v2 — multilingual, 50+ languages
- msmarco-distilbert-base-v4 — asymmetric query→doc search
"""
if device is None:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer(model_name, device=device)
print(f"Bi-encoder: {model_name} | dim={model.get_sentence_embedding_dimension()} | {device}")
return model
def load_cross_encoder(
model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
) -> CrossEncoder:
"""
Load a cross-encoder for re-ranking.
Recommended models:
- cross-encoder/ms-marco-MiniLM-L-6-v2 — fast MS-MARCO re-ranker
- cross-encoder/ms-marco-electra-base — better quality
- cross-encoder/nli-deberta-v3-small — NLI / semantic similarity
- cross-encoder/stsb-roberta-large — state-of-the-art on the STS benchmark
"""
model = CrossEncoder(model_name)
print(f"Cross-encoder: {model_name}")
return model
# ── 2. Encoding ───────────────────────────────────────────────────────────────
def encode_sentences(
model: SentenceTransformer,
sentences: list[str],
batch_size: int = 64,
normalize: bool = True,
show_progress: bool = False,
convert_to_tensor: bool = True,
) -> torch.Tensor:
"""
Encode sentences into dense embeddings.
normalize=True enables dot-product as cosine similarity.
Returns (N, D) tensor.
"""
embeddings = model.encode(
sentences,
batch_size=batch_size,
normalize_embeddings=normalize,
show_progress_bar=show_progress,
convert_to_tensor=convert_to_tensor,
)
return embeddings
def encode_single(
model: SentenceTransformer,
text: str,
normalize: bool = True,
) -> torch.Tensor:
"""Encode a single sentence. Returns (D,) tensor."""
return model.encode(text, normalize_embeddings=normalize, convert_to_tensor=True)
# ── 3. Similarity computation ─────────────────────────────────────────────────
def cosine_similarity_matrix(
embeddings_a: torch.Tensor,
embeddings_b: torch.Tensor,
) -> torch.Tensor:
"""Compute (N, M) cosine similarity matrix."""
return util.pytorch_cos_sim(embeddings_a, embeddings_b)
def most_similar(
query_embedding: torch.Tensor,
corpus_embeddings: torch.Tensor,
corpus_texts: list[str],
top_k: int = 5,
) -> list[dict]:
"""Find top-k most similar corpus sentences to a query."""
scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)[0]
top_results = torch.topk(scores, k=min(top_k, len(corpus_texts)))
return [
{"text": corpus_texts[idx], "score": round(float(score), 4)}
for score, idx in zip(top_results.values, top_results.indices)
]
def find_paraphrases(
sentences: list[str],
model: SentenceTransformer,
threshold: float = 0.85,
) -> list[tuple[str, str, float]]:
"""
Find near-duplicate / paraphrase pairs in a list of sentences.
Returns list of (sent_a, sent_b, score) for pairs above threshold.
"""
embeddings = encode_sentences(model, sentences)
sim_matrix = util.pytorch_cos_sim(embeddings, embeddings)
paraphrases = []
for i in range(len(sentences)):
for j in range(i + 1, len(sentences)):
score = float(sim_matrix[i][j])
if score >= threshold:
paraphrases.append((sentences[i], sentences[j], round(score, 4)))
return sorted(paraphrases, key=lambda x: x[2], reverse=True)
# ── 4. Semantic search ────────────────────────────────────────────────────────
class SemanticSearchIndex:
"""
In-memory semantic search with bi-encoder retrieval
and optional cross-encoder re-ranking.
"""
def __init__(
self,
bi_encoder: SentenceTransformer,
cross_encoder: CrossEncoder | None = None,
):
self.bi_encoder = bi_encoder
self.cross_encoder = cross_encoder
self.corpus = []
self.embeddings = None
def add_documents(self, documents: list[str]) -> None:
"""Add documents to the search index."""
self.corpus = documents
print(f"Encoding {len(documents)} documents...")
self.embeddings = encode_sentences(
self.bi_encoder, documents, show_progress=True
)
print(f"Index ready: {self.embeddings.shape}")
def search(
self,
query: str,
top_k: int = 10,
rerank_top_k: int = 3,
use_reranker: bool = True,
) -> list[dict]:
"""
Two-stage retrieval:
1. Bi-encoder retrieves top_k candidates (fast)
2. Cross-encoder re-ranks top rerank_top_k (accurate)
"""
if self.embeddings is None:
raise RuntimeError("Index empty — call add_documents() first")
# Stage 1: bi-encoder retrieval
query_emb = encode_single(self.bi_encoder, query)
hits = util.semantic_search(
query_emb, self.embeddings, top_k=top_k
)[0]
results = [
{"text": self.corpus[h["corpus_id"]], "bi_score": round(h["score"], 4)}
for h in hits
]
# Stage 2: cross-encoder re-ranking
if use_reranker and self.cross_encoder and len(results) > 1:
candidates = results[:rerank_top_k]
pairs = [(query, r["text"]) for r in candidates]
ce_scores = self.cross_encoder.predict(pairs)
for r, score in zip(candidates, ce_scores):
r["ce_score"] = round(float(score), 4)
candidates = sorted(candidates, key=lambda x: x.get("ce_score", 0), reverse=True)
results[:rerank_top_k] = candidates
return results
def save(self, path: str) -> None:
"""Save index to disk."""
Path(path).mkdir(parents=True, exist_ok=True)
torch.save(self.embeddings, f"{path}/embeddings.pt")
with open(f"{path}/corpus.json", "w") as f:
json.dump(self.corpus, f)
print(f"Index saved to {path}/")
def load(self, path: str) -> None:
"""Load index from disk."""
self.embeddings = torch.load(f"{path}/embeddings.pt")
with open(f"{path}/corpus.json") as f:
self.corpus = json.load(f)
print(f"Index loaded: {len(self.corpus)} docs, {self.embeddings.shape[1]}D")
# ── 5. Clustering ─────────────────────────────────────────────────────────────
def cluster_sentences(
model: SentenceTransformer,
sentences: list[str],
threshold: float = 0.75,
min_community_size: int = 2,
) -> list[list[int]]:
"""
Community detection clustering (no fixed cluster count).
Returns list of clusters, each a list of sentence indices.
"""
embeddings = encode_sentences(model, sentences)
clusters = util.community_detection(
embeddings,
min_community_size=min_community_size,
threshold=threshold,
)
return clusters
def show_clusters(
sentences: list[str],
clusters: list[list[int]],
max_show: int = 3,
) -> None:
"""Print cluster contents."""
print(f"Found {len(clusters)} clusters:")
for i, cluster in enumerate(clusters):
print(f"\nCluster {i+1} ({len(cluster)} sentences):")
for idx in cluster[:max_show]:
print(f" [{idx}] {sentences[idx][:100]}")
if len(cluster) > max_show:
print(f" ... and {len(cluster) - max_show} more")
# ── 6. Fine-tuning ────────────────────────────────────────────────────────────
def fine_tune_model(
model_name: str = "all-MiniLM-L6-v2",
train_pairs: list[tuple] | None = None,  # (sent1, sent2) pairs for MNRL or (sent1, sent2, score) for cosine
output_dir: str = "./fine-tuned-sbert",
loss_type: str = "mnrl", # "mnrl" | "cosine"
epochs: int = 3,
batch_size: int = 16,
warmup_steps: int = 100,
) -> SentenceTransformer:
"""
Fine-tune a sentence transformer model.
loss_type:
- "mnrl" (MultipleNegativesRankingLoss): pairs of (anchor, positive) — best for retrieval
- "cosine" (CosineSimilarityLoss): triplets of (sent1, sent2, score 0-1) — for STS
"""
from sentence_transformers import SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss, CosineSimilarityLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments
from datasets import Dataset
model = SentenceTransformer(model_name)
if loss_type == "mnrl":
loss = MultipleNegativesRankingLoss(model)
train_dataset = Dataset.from_dict({
"anchor": [p[0] for p in train_pairs],
"positive": [p[1] for p in train_pairs],
})
else:
loss = CosineSimilarityLoss(model)
train_dataset = Dataset.from_dict({
"sentence1": [p[0] for p in train_pairs],
"sentence2": [p[1] for p in train_pairs],
"score": [float(p[2]) for p in train_pairs],
})
args = SentenceTransformerTrainingArguments(
output_dir=output_dir,
num_train_epochs=epochs,
per_device_train_batch_size=batch_size,
warmup_steps=warmup_steps,
fp16=torch.cuda.is_available(),
save_strategy="epoch",
logging_steps=50,
)
trainer = SentenceTransformerTrainer(
model=model,
args=args,
train_dataset=train_dataset,
loss=loss,
)
trainer.train()
model.save(output_dir)
print(f"Fine-tuned model saved: {output_dir}")
return model
# ── 7. Batch semantic similarity scoring ─────────────────────────────────────
def score_pairs(
model: SentenceTransformer,
pairs: list[tuple[str, str]],
batch_size: int = 64,
) -> list[float]:
"""
Score sentence pairs for semantic similarity.
Returns cosine similarity scores in [-1, 1].
"""
sents_a = [p[0] for p in pairs]
sents_b = [p[1] for p in pairs]
embs_a = encode_sentences(model, sents_a, batch_size=batch_size, normalize=True)
embs_b = encode_sentences(model, sents_b, batch_size=batch_size, normalize=True)
# Element-wise dot product (= cosine for normalized)
scores = (embs_a * embs_b).sum(dim=1)
return scores.tolist()
# ── Demo ──────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
print("Sentence Transformers Demo")
print("="*50)
model = load_bi_encoder("all-MiniLM-L6-v2")
# Corpus
corpus = [
"How do I install Python packages?",
"What is the best way to handle errors in Python?",
"How to use pip to install libraries?",
"Python exception handling with try/except",
"How to create a virtual environment in Python?",
"Understanding async/await in JavaScript",
"What are JavaScript Promises?",
"How to fetch data from an API in React?",
"React hooks tutorial for beginners",
"CSS flexbox layout guide",
]
# Build search index
index = SemanticSearchIndex(bi_encoder=model)
index.add_documents(corpus)
# Search
queries = ["installing Python dependencies", "asynchronous JavaScript"]
for query in queries:
results = index.search(query, top_k=3, use_reranker=False)
print(f"\nQuery: {query}")
for r in results:
print(f" [{r['bi_score']:.3f}] {r['text']}")
# Paraphrase detection
print("\nParaphrase pairs (threshold=0.85):")
pairs = find_paraphrases(corpus, model, threshold=0.85)
for a, b, score in pairs[:3]:
print(f" [{score:.3f}] '{a[:50]}' ↔ '{b[:50]}'")
# Clustering
clusters = cluster_sentences(model, corpus, threshold=0.6, min_community_size=2)
show_clusters(corpus, clusters)
Choose the OpenAI text-embedding-3 alternative when you need the highest out-of-the-box embedding quality for production semantic search with managed infrastructure and no GPU hosting. Sentence Transformers' local inference, by contrast, eliminates per-call API costs (critical at millions of embeddings), enables custom fine-tuning on domain-specific data with MultipleNegativesRankingLoss, and its bi-encoder retrieval plus cross-encoder re-ranking two-stage pipeline gives better retrieval precision than single-model approaches at the same latency budget. Choose the spaCy vectors alternative when embedding short phrases for NER pipeline features or small-vocabulary similarity: spaCy's static word vectors are faster, but Sentence Transformers' contextual BERT-based embeddings handle paraphrasing and semantic equivalence far better ("cheap" ≈ "inexpensive" without any training), making them the standard for semantic search, duplicate detection, and RAG retrieval layers where meaning matters more than lexical overlap. The Claude Skills 360 bundle includes Sentence Transformers skill sets covering bi-encoder loading, batch encoding, semantic search indexing, cross-encoder re-ranking, paraphrase detection, community clustering, similarity scoring, save/load, and fine-tuning with MNRL and cosine loss. Start with the free tier to try semantic embedding code generation.
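A quick illustration of the paraphrase claim above, as a minimal sketch with the same pretrained checkpoint (exact scores vary slightly by model version, but the paraphrase pair should score clearly higher than the topically related pair):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    ("the laptop was cheap", "the laptop was inexpensive"),    # paraphrase
    ("the laptop was cheap", "the laptop battery died fast"),  # same topic, different meaning
]
for a, b in pairs:
    emb = model.encode([a, b], convert_to_tensor=True, normalize_embeddings=True)
    score = float(util.pytorch_cos_sim(emb[0], emb[1]))
    print(f"{score:.3f}  {a!r} vs {b!r}")
```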