LlamaIndex orchestrates the full RAG pipeline: document ingestion, chunking, embedding, vector storage, retrieval, and response synthesis. The high-level VectorStoreIndex handles the happy path. Custom NodeParser controls chunking strategy. QueryEngine composes retrieval with response synthesis. RouterQueryEngine routes questions to the right index. RAGAS evaluates faithfulness and answer relevance objectively. Claude Code generates LlamaIndex ingestion pipelines, custom retrievers, evaluation harnesses, and the production configurations that ship reliable RAG applications.
CLAUDE.md for LlamaIndex Projects
## LlamaIndex Stack
- Version: llama-index-core >= 0.11, llama-index-llms-anthropic >= 0.4
- LLM: Claude claude-sonnet-4-6 (synthesis), claude-haiku-4-5-20251001 (classification)
- Embeddings: text-embedding-3-large (OpenAI) or voyage-3 (Voyage AI)
- Vector store: Pinecone (production), ChromaDB (local dev)
- Chunking: SentenceSplitter, chunk_size=512, overlap=50
- Retrieval: top_k=5, similarity_threshold=0.75
- Evaluation: RAGAS with faithfulness + answer_relevancy metrics
- Persist: docstore + index store to disk for fast restarts
Document Ingestion Pipeline
# ingest/pipeline.py — document ingestion with metadata
from llama_index.core import (
VectorStoreIndex,
StorageContext,
Settings,
Document,
)
from llama_index.core.ingestion import IngestionPipeline, IngestionCache
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import (
TitleExtractor,
QuestionsAnsweredExtractor,
SummaryExtractor,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic
from llama_index.vector_stores.pinecone import PineconeVectorStore
from pinecone import Pinecone
import hashlib
# Configure global settings
# Module-level side effect: components created after this module is imported
# pick up these defaults through the global Settings singleton.
Settings.llm = Anthropic(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    temperature=0.1,  # low temperature keeps synthesis grounded in retrieved context
)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-large",
    dimensions=1536,  # reduced dims — must match the Pinecone index dimension
)
def build_ingestion_pipeline(vector_store: PineconeVectorStore) -> IngestionPipeline:
    """Build an ingestion pipeline with caching to skip already-processed docs.

    Args:
        vector_store: Pinecone-backed store that receives the embedded nodes.

    Returns:
        An IngestionPipeline that chunks, enriches with LLM-extracted
        metadata, embeds, and upserts into the vector store.
    """
    import os

    # Restore the transformation cache from a previous run if it was
    # persisted; otherwise start with a fresh in-memory cache.
    # Bug fixed: the original passed SimpleDocumentStore.from_persist_dir()
    # as the cache backend — SimpleDocumentStore was never imported (NameError)
    # and a docstore is not the KV cache IngestionCache expects.
    cache_path = "./cache/ingestion_cache.json"
    if os.path.exists(cache_path):
        cache = IngestionCache.from_persist_path(cache_path, collection="ingestion_cache")
    else:
        cache = IngestionCache(collection="ingestion_cache")

    return IngestionPipeline(
        transformations=[
            # 1. Split into chunks
            SentenceSplitter(
                chunk_size=512,
                chunk_overlap=50,
                paragraph_separator="\n\n",
            ),
            # 2. Extract metadata (each extractor calls the LLM — adds cost and latency)
            TitleExtractor(nodes=5),
            QuestionsAnsweredExtractor(questions=3),
            # 3. Generate embeddings
            Settings.embed_model,
        ],
        vector_store=vector_store,
        cache=cache,
    )
def ingest_documents(docs: list[Document], pipeline: IngestionPipeline) -> list:
"""Ingest documents, deduplicating by content hash."""
# Add content hash for deduplication
for doc in docs:
content_hash = hashlib.md5(doc.text.encode()).hexdigest()
doc.metadata["content_hash"] = content_hash
doc.doc_id = content_hash # Use hash as stable ID
nodes = pipeline.run(documents=docs, show_progress=True)
print(f"Ingested {len(nodes)} nodes from {len(docs)} documents")
return nodes
# Usage
def ingest_knowledge_base(file_paths: list[str]):
    """Load files, tag them with source metadata, and ingest into Pinecone.

    Handles .pdf, .docx, and .md explicitly; any other extension falls back
    to SimpleDirectoryReader's extension-based dispatch.

    Bugs fixed: `os`, `ServerlessSpec`, and `SimpleDirectoryReader` were used
    but never imported (NameError at runtime), and DocxReader was imported
    but .docx files silently fell through to the generic reader.
    """
    import os

    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import DocxReader, MarkdownReader, PDFReader
    from pinecone import ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index_name = "knowledge-base"
    # Create the serverless index on first run only.
    if index_name not in [i.name for i in pc.list_indexes()]:
        pc.create_index(
            name=index_name,
            dimension=1536,  # must match Settings.embed_model dimensions
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
    vector_store = PineconeVectorStore(pinecone_index=pc.Index(index_name))
    pipeline = build_ingestion_pipeline(vector_store)

    documents = []
    for path in file_paths:
        if path.endswith(".pdf"):
            docs = PDFReader().load_data(path)
        elif path.endswith(".docx"):
            docs = DocxReader().load_data(path)
        elif path.endswith(".md"):
            docs = MarkdownReader().load_data(path)
        else:
            docs = SimpleDirectoryReader(input_files=[path]).load_data()
        # Add source metadata so answers can cite their origin file.
        for doc in docs:
            doc.metadata.update({
                "source": path,
                "file_type": path.split(".")[-1],
            })
        documents.extend(docs)
    return ingest_documents(documents, pipeline)
Query Engine Setup
# query/engine.py — query engine with custom retrieval
from llama_index.core import VectorStoreIndex, QueryBundle
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import (
SimilarityPostprocessor,
KeywordNodePostprocessor,
LLMRerank,
)
from llama_index.core.response_synthesizers import get_response_synthesizer
def build_query_engine(
    index: VectorStoreIndex,
    top_k: int = 5,
    rerank: bool = True,
) -> RetrieverQueryEngine:
    """Assemble retrieval, optional LLM reranking, and tree summarization."""
    # Over-fetch candidates when reranking so the reranker has room to prune.
    fetch_k = top_k * 2 if rerank else top_k
    retriever = VectorIndexRetriever(index=index, similarity_top_k=fetch_k)

    # Drop weakly related nodes before synthesis.
    node_postprocessors = [SimilarityPostprocessor(similarity_cutoff=0.7)]
    if rerank:
        # LLM-based reranking trades latency for precision.
        node_postprocessors.append(LLMRerank(choice_batch_size=5, top_n=top_k))

    # tree_summarize works best when combining many retrieved chunks.
    synthesizer = get_response_synthesizer(
        response_mode="tree_summarize",
        use_async=True,
        verbose=False,
    )

    return RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=synthesizer,
        node_postprocessors=node_postprocessors,
    )
# Streaming response
async def query_streaming(engine: RetrieverQueryEngine, question: str):
    """Yield the response token by token, then print the source nodes.

    The engine must have been built with a streaming response synthesizer
    (``get_response_synthesizer(streaming=True)``); otherwise the response
    object has no ``response_gen`` to iterate.

    Bug fixed: the original called ``engine.as_query_engine(streaming=True)``,
    but ``as_query_engine`` is a method on indexes/retrievers, not on
    ``RetrieverQueryEngine`` — it raised AttributeError at runtime.
    """
    response = engine.query(question)
    for token in response.response_gen:
        yield token
    # After streaming completes, surface provenance for the answer.
    print("\n\nSources:")
    for node in response.source_nodes:
        print(f"  [{node.score:.2f}] {node.metadata.get('source', 'Unknown')}")
        print(f"    {node.text[:200]}...")
Router Query Engine for Multi-Index
# query/router.py — route questions to the right index
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool
def build_router_engine(
    product_engine: RetrieverQueryEngine,
    policy_engine: RetrieverQueryEngine,
    technical_engine: RetrieverQueryEngine,
) -> RouterQueryEngine:
    """Route each query to the most relevant of the three sub-engines."""
    # (engine, routing description) pairs — the selector LLM chooses an
    # engine by matching the query against these descriptions.
    routes = [
        (
            product_engine,
            "Useful for answering questions about product features, "
            "pricing, availability, and specifications.",
        ),
        (
            policy_engine,
            "Useful for answering questions about company policies, "
            "return policies, shipping, warranties, and legal terms.",
        ),
        (
            technical_engine,
            "Useful for answering technical questions, API documentation, "
            "integration guides, and developer resources.",
        ),
    ]
    tools = [
        QueryEngineTool.from_defaults(query_engine=engine, description=description)
        for engine, description in routes
    ]
    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=tools,
    )
Hybrid Search
# query/hybrid.py — combine dense and sparse (BM25) search
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.node_parser import SentenceSplitter
def build_hybrid_retriever(
    index: VectorStoreIndex,
    nodes: list,
    top_k: int = 5,
) -> QueryFusionRetriever:
    """Combine semantic (vector) and lexical (BM25) search with RRF fusion.

    Args:
        index: Vector index used for dense retrieval.
        nodes: Node list the BM25 retriever indexes in memory.
        top_k: Final number of fused results to return.
    """
    # Bug fixed: VectorIndexRetriever was used here but never imported in
    # this module, raising NameError at runtime.
    from llama_index.core.retrievers import VectorIndexRetriever

    vector_retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=top_k,
    )
    bm25_retriever = BM25Retriever.from_defaults(
        nodes=nodes,
        similarity_top_k=top_k,
    )
    return QueryFusionRetriever(
        retrievers=[vector_retriever, bm25_retriever],
        similarity_top_k=top_k,
        num_queries=1,  # no query expansion (set >1 to generate query variants)
        mode="reciprocal_rerank",  # reciprocal rank fusion across retrievers
        use_async=True,
        verbose=False,
    )
RAG Evaluation with RAGAS
# eval/evaluate.py — evaluate RAG quality with RAGAS
from ragas import evaluate
from ragas.metrics import (
faithfulness,
answer_relevancy,
context_precision,
context_recall,
)
from ragas.integrations.llama_index import evaluate as ragas_evaluate
from datasets import Dataset
def evaluate_rag_pipeline(
    query_engine: RetrieverQueryEngine,
    test_questions: list[str],
    ground_truth_answers: list[str],
) -> dict:
    """Score the pipeline on a test set with the four core RAGAS metrics."""
    # Run every question through the engine, capturing answer and contexts.
    rows = []
    for question, ground_truth in zip(test_questions, ground_truth_answers):
        response = query_engine.query(question)
        rows.append(
            {
                "question": question,
                "answer": str(response),
                "contexts": [node.text for node in response.source_nodes],
                "ground_truth": ground_truth,
            }
        )

    # RAGAS consumes a HuggingFace Dataset built from the collected rows.
    scores = evaluate(
        dataset=Dataset.from_list(rows),
        metrics=[
            faithfulness,        # is the answer grounded in retrieved context?
            answer_relevancy,    # is the answer relevant to the question?
            context_precision,   # is retrieved context precise?
            context_recall,      # is all necessary info retrieved?
        ],
    )

    # e.g. "answer_relevancy" -> "Answer Relevancy: 0.912"
    for metric_name in ("faithfulness", "answer_relevancy", "context_precision", "context_recall"):
        label = metric_name.replace("_", " ").title()
        print(f"{label}: {scores[metric_name]:.3f}")
    return scores.to_pandas().to_dict()
# Run evaluation
# NOTE(review): `query_engine` is assumed to be built elsewhere (e.g. via
# build_query_engine) before this module-level call runs — confirm; otherwise
# this raises NameError at import time.
TEST_QUESTIONS = [
    "What is your return policy for electronics?",
    "How do I integrate with the REST API?",
    "What payment methods do you accept?",
]
# Reference answers, paired index-by-index with TEST_QUESTIONS.
GROUND_TRUTH = [
    "Electronics can be returned within 30 days with original packaging.",
    "The REST API uses OAuth 2.0 and returns JSON responses.",
    "We accept Visa, Mastercard, Amex, and PayPal.",
]
results = evaluate_rag_pipeline(query_engine, TEST_QUESTIONS, GROUND_TRUTH)
FastAPI Production Integration
# api/main.py — FastAPI service exposing RAG query
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
import asyncio
# Module-level ASGI app; the query engine is attached to app.state at startup.
app = FastAPI(title="RAG API")
class QueryRequest(BaseModel):
    """Request body for the /query and /query/stream endpoints."""
    question: str  # natural-language question to answer over the index
    stream: bool = False  # NOTE(review): not read by either handler — confirm intent
    top_k: int = 5  # NOTE(review): not wired into retrieval (fixed at engine build) — confirm intent
class QueryResponse(BaseModel):
    """Response body for /query."""
    answer: str  # synthesized answer text
    sources: list[dict]  # per-source dicts: text snippet, source path, similarity score
    latency_ms: float  # wall-clock handler latency in milliseconds
# Initialize query engine at startup
@app.on_event("startup")  # NOTE(review): on_event is deprecated in FastAPI — consider lifespan handlers
async def load_index():
    # Build the engine once and cache it on app.state so every request
    # reuses it instead of rebuilding per call.
    # NOTE(review): build_query_engine_async is not defined in this file —
    # presumably provided by another module; confirm.
    app.state.engine = await build_query_engine_async()
@app.post("/query", response_model=QueryResponse)
async def query(req: QueryRequest):
import time
start = time.time()
response = await app.state.engine.aquery(req.question)
return QueryResponse(
answer=str(response),
sources=[
{
"text": node.text[:500],
"source": node.metadata.get("source"),
"score": node.score,
}
for node in response.source_nodes
],
latency_ms=(time.time() - start) * 1000,
)
@app.post("/query/stream")
async def query_stream(req: QueryRequest):
async def generate():
async for token in query_streaming(app.state.engine, req.question):
yield f"data: {token}\n\n"
yield "data: [DONE]\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")
For the AWS Bedrock alternative for enterprise RAG with Claude using managed Knowledge Bases, see the AWS Bedrock guide for fully managed RAG infrastructure. For the vector database backends that LlamaIndex integrates with, the vector databases guide covers Pinecone, Weaviate, and pgvector. The Claude Skills 360 bundle includes LlamaIndex skill sets covering ingestion pipelines, hybrid search, and RAGAS evaluation. Start with the free tier to try RAG pipeline generation.