Claude Code for pikepdf: PDF Manipulation in Python

Published: April 28, 2028

Read time: 5 min

By: Claude Skills 360

pikepdf reads and writes PDF files via libqpdf. Install it with pip install pikepdf. The core operations:

- Open: import pikepdf; pdf = pikepdf.open("in.pdf")
- Save: pdf.save("out.pdf")
- Bytes: import io; buf = io.BytesIO(); pdf.save(buf); buf.getvalue()
- Page count: len(pdf.pages)
- Get page: page = pdf.pages[0]
- Delete page: del pdf.pages[2]
- Reorder: pdf.pages[0], pdf.pages[1] = pdf.pages[1], pdf.pages[0]
- Merge: new = pikepdf.Pdf.new(); new.pages.extend(p.pages) for each source p
- Split: for i, page in enumerate(pdf.pages): out = pikepdf.Pdf.new(); out.pages.append(page); out.save(f"{i}.pdf")
- Rotate: page.rotate(90, relative=True)
- Metadata (docinfo): pdf.docinfo["/Title"] = "My Doc"
- Metadata (XMP): with pdf.open_metadata() as meta: meta["dc:title"] = "..."
- Encrypt: pdf.save("enc.pdf", encryption=pikepdf.Encryption(owner="ownerpass", user="userpass", R=6))
- Decrypt: pikepdf.open("enc.pdf", password="userpass")
- Extract images: from pikepdf import PdfImage; img = PdfImage(page.images["/Im0"]); img.as_pil_image().save("out.png")
- Compress: pdf.save("out.pdf", compress_streams=True, recompress_flate=True)
- Copy pages: dst.pages.append(src.pages[0])
- Remove annots: del page["/Annots"]

Claude Code generates pikepdf merge/split utilities, watermark pipelines, and PDF metadata editors.

CLAUDE.md for pikepdf

## pikepdf Stack
- Version: pikepdf >= 8 | pip install pikepdf
- Open: pdf = pikepdf.open("in.pdf") | with pikepdf.open("in.pdf") as pdf:
- Save: pdf.save("out.pdf") | pdf.save(BytesIO()) for in-memory
- Pages: pdf.pages[i] | del pdf.pages[i] | pages.append/extend
- Merge: new = Pdf.new(); for p in pdfs: new.pages.extend(p.pages)
- Encrypt: pdf.save(path, encryption=pikepdf.Encryption(owner=pw, R=6))

pikepdf PDF Manipulation Pipeline

# app/pdf_edit.py: pikepdf merge, split, rotate, metadata, encrypt, and image extraction
from __future__ import annotations

import io
from pathlib import Path
from typing import Any

import pikepdf
from pikepdf import Pdf, PdfImage, Encryption


# ─────────────────────────────────────────────────────────────────────────────
# 1. Open / save helpers
# ─────────────────────────────────────────────────────────────────────────────

def open_pdf(source: str | Path | bytes) -> Pdf:
    """
    Open a PDF from a file path or raw bytes.
    Returns a pikepdf.Pdf; caller should close() or use as context manager.
    """
    if isinstance(source, bytes):
        return pikepdf.open(io.BytesIO(source))
    return pikepdf.open(str(source))


def save_bytes(pdf: Pdf, compress: bool = True) -> bytes:
    """
    Serialize a PDF to bytes.
    compress=True: streams are flate-compressed for smaller output.
    """
    buf = io.BytesIO()
    pdf.save(
        buf,
        compress_streams=compress,
        recompress_flate=compress,
        object_stream_mode=pikepdf.ObjectStreamMode.generate if compress else pikepdf.ObjectStreamMode.disable,
    )
    return buf.getvalue()


def page_count(source: str | Path | bytes) -> int:
    """Return the number of pages without loading the full document."""
    with open_pdf(source) as pdf:
        return len(pdf.pages)


# ─────────────────────────────────────────────────────────────────────────────
# 2. Merge and split
# ─────────────────────────────────────────────────────────────────────────────

def merge_pdfs(
    sources: list[str | Path | bytes],
    output_path: str | Path | None = None,
) -> bytes:
    """
    Merge multiple PDF files into one document.
    Returns the merged PDF as bytes; also writes to output_path if provided.

    Example:
        pdf_bytes = merge_pdfs(["cover.pdf", "content.pdf", "appendix.pdf"])
    """
    merged = Pdf.new()
    for source in sources:
        with open_pdf(source) as src:
            merged.pages.extend(src.pages)

    data = save_bytes(merged)
    if output_path:
        Path(output_path).write_bytes(data)
    return data


def split_pdf(
    source: str | Path | bytes,
    output_dir: str | Path,
    prefix: str = "page",
) -> list[Path]:
    """
    Split a PDF into one file per page.
    Returns list of created file paths.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with open_pdf(source) as src:
        for i, page in enumerate(src.pages):
            single = Pdf.new()
            single.pages.append(page)
            p = out / f"{prefix}_{i + 1:04d}.pdf"
            single.save(str(p))
            paths.append(p)
    return paths


def extract_pages(
    source: str | Path | bytes,
    page_numbers: list[int],
    output_path: str | Path | None = None,
) -> bytes:
    """
    Extract specific pages (0-indexed) from a PDF.
    Example: extract_pages("doc.pdf", [0, 2, 4])  → pages 1, 3, 5
    """
    with open_pdf(source) as src:
        out = Pdf.new()
        for n in page_numbers:
            out.pages.append(src.pages[n])
        data = save_bytes(out)

    if output_path:
        Path(output_path).write_bytes(data)
    return data


def rotate_pages(
    source: str | Path | bytes,
    degrees: int,
    page_numbers: list[int] | None = None,
) -> bytes:
    """
    Rotate pages by degrees (90, 180, 270).
    page_numbers: 0-indexed list; None = all pages.
    """
    with open_pdf(source) as pdf:
        targets = page_numbers if page_numbers is not None else range(len(pdf.pages))
        for i in targets:
            pdf.pages[i].rotate(degrees, relative=True)
        return save_bytes(pdf)


# ─────────────────────────────────────────────────────────────────────────────
# 3. Metadata
# ─────────────────────────────────────────────────────────────────────────────

def get_metadata(source: str | Path | bytes) -> dict[str, Any]:
    """
    Return docinfo metadata (title, author, subject, creator, producer)
    plus the page count. Missing fields come back as empty strings.
    """
    with open_pdf(source) as pdf:
        info = {
            "title":    str(pdf.docinfo.get("/Title",    "")),
            "author":   str(pdf.docinfo.get("/Author",   "")),
            "subject":  str(pdf.docinfo.get("/Subject",  "")),
            "creator":  str(pdf.docinfo.get("/Creator",  "")),
            "producer": str(pdf.docinfo.get("/Producer", "")),
            "page_count": len(pdf.pages),
        }
        return info


def set_metadata(
    source: str | Path | bytes,
    title: str = "",
    author: str = "",
    subject: str = "",
    keywords: str = "",
) -> bytes:
    """
    Set PDF metadata fields and return the updated PDF bytes.
    """
    with open_pdf(source) as pdf:
        with pdf.open_metadata() as meta:
            if title:
                meta["dc:title"]   = title
            if author:
                meta["dc:creator"] = [author]
            if subject:
                meta["dc:description"] = subject
            if keywords:
                meta["pdf:Keywords"] = keywords

        # Also update legacy docinfo so older readers see the same values
        if title:    pdf.docinfo["/Title"]    = title
        if author:   pdf.docinfo["/Author"]   = author
        if subject:  pdf.docinfo["/Subject"]  = subject
        if keywords: pdf.docinfo["/Keywords"] = keywords

        return save_bytes(pdf)


# ─────────────────────────────────────────────────────────────────────────────
# 4. Encryption
# ─────────────────────────────────────────────────────────────────────────────

def encrypt_pdf(
    source: str | Path | bytes,
    user_password: str,
    owner_password: str | None = None,
    allow_printing: bool = True,
    allow_copying: bool = False,
) -> bytes:
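    """
    Encrypt a PDF with AES-256 (R=6).
    user_password:  required to open the file.
    owner_password: required to change permissions. Defaults to user_password,
                    in which case anyone who can open the file also holds owner
                    rights, so pass a distinct value if the restrictions matter.
    """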
    owner = owner_password or user_password

    allow = pikepdf.Permissions(
        print_lowres=allow_printing,
        print_highres=allow_printing,
        extract=allow_copying,
        modify_other=False,
        modify_annotation=False,
        modify_form=False,
        modify_assembly=False,
        accessibility=True,
    )

    enc = Encryption(owner=owner, user=user_password, R=6, allow=allow)

    with open_pdf(source) as pdf:
        buf = io.BytesIO()
        pdf.save(buf, encryption=enc)
        return buf.getvalue()


def decrypt_pdf(source: str | Path | bytes, password: str) -> bytes:
    """Open an encrypted PDF and return an unencrypted copy as bytes."""
    stream = io.BytesIO(source) if isinstance(source, bytes) else str(source)
    # Use a context manager so the source is closed once the copy is saved
    with pikepdf.open(stream, password=password) as pdf:
        return save_bytes(pdf)


# ─────────────────────────────────────────────────────────────────────────────
# 5. Image extraction
# ─────────────────────────────────────────────────────────────────────────────

def extract_images(
    source: str | Path | bytes,
    output_dir: str | Path,
    page_numbers: list[int] | None = None,
) -> list[Path]:
    """
    Extract all images from the PDF and save them to output_dir.
    Returns list of saved image paths.
    Requires: pip install pillow
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []

    with open_pdf(source) as pdf:
        # "is not None" matches rotate_pages(): an explicit empty list selects no pages
        indices = page_numbers if page_numbers is not None else range(len(pdf.pages))
        img_idx = 0
        for page_no in indices:
            page = pdf.pages[page_no]
            for raw in page.images.values():
                try:
                    pdfimg = PdfImage(raw)
                    pil    = pdfimg.as_pil_image()
                    ext    = pil.format.lower() if pil.format else "png"
                    # Name files after the real page number, not the position in the selection
                    path   = out / f"page{page_no + 1}_{img_idx:04d}.{ext}"
                    pil.save(str(path))
                    saved.append(path)
                    img_idx += 1
                except Exception:
                    continue  # skip images that cannot be decoded or saved

    return saved


# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Create a small multi-page test PDF using fpdf2
    try:
        from fpdf import FPDF
        def _make_test_pdf(pages: int = 3) -> bytes:
            pdf = FPDF()
            for i in range(1, pages + 1):
                pdf.add_page()
                pdf.set_font("Helvetica", "B", 24)
                pdf.cell(0, 20, f"Page {i}", align="C")
            return bytes(pdf.output())  # fpdf2 returns a bytearray; open_pdf() checks for bytes

        src = _make_test_pdf(4)
        Path("/tmp/test_src.pdf").write_bytes(src)
        print(f"Test PDF: {len(src):,} bytes, {page_count(src)} pages")

        print("\n=== Metadata ===")
        info = get_metadata(src)
        print(info)

        print("\n=== Set metadata ===")
        updated = set_metadata(src, title="My Document", author="Alice", subject="Test")
        print(f"Updated: {len(updated):,} bytes")

        print("\n=== Extract pages [0, 2] ===")
        extracted = extract_pages(src, [0, 2])
        print(f"Extracted: {page_count(extracted)} pages")

        print("\n=== Rotate page 0 by 90° ===")
        rotated = rotate_pages(src, 90, [0])
        print(f"Rotated: {page_count(rotated)} pages")

        print("\n=== Encrypt + decrypt ===")
        enc_pdf = encrypt_pdf(src, user_password="secret", allow_printing=True)
        print(f"Encrypted: {len(enc_pdf):,} bytes")
        dec_pdf = decrypt_pdf(enc_pdf, password="secret")
        print(f"Decrypted: {len(dec_pdf):,} bytes, {page_count(dec_pdf)} pages")

    except ImportError:
        print("fpdf2 not installed — skipping demo. pip install fpdf2 pikepdf")
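extract_pages() and rotate_pages() take 0-indexed page lists, while people usually write 1-based ranges such as "1-3,5". A small converter bridges the two. This is a sketch under assumptions: parse_page_spec is a hypothetical helper, not part of pikepdf or of the module above, and it supports only comma-separated single pages and closed ranges.

```python
from __future__ import annotations


def parse_page_spec(spec: str, total_pages: int) -> list[int]:
    """Turn a 1-based spec such as "1-3,5" into 0-indexed page numbers."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_s, _, end_s = part.partition("-")
        start = int(start_s)
        end = int(end_s) if end_s else start
        # Reject anything outside the document or with start > end
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"page range {part!r} is outside 1..{total_pages}")
        indices.extend(range(start - 1, end))
    return indices


print(parse_page_spec("1-3,5", total_pages=10))  # [0, 1, 2, 4]
```

The result can be passed straight to extract_pages(source, indices), with page_count(source) supplying total_pages.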

pikepdf versus pypdf: pypdf (formerly PyPDF2) is a pure-Python PDF library with a similar merge/split API and no native dependencies. pikepdf uses QPDF under the hood, which gives it superior handling of malformed PDFs, reliable AES-256 encryption and decryption, and image extraction via PdfImage.as_pil_image(). pikepdf is the right choice when correctness and encryption matter.

pikepdf versus pdfplumber / pdfminer: pdfplumber and pdfminer specialize in text and table extraction from existing PDFs (they parse text layers and detect columns). pikepdf operates at the PDF object and page level for structural operations such as merge, split, rotate, watermark, and metadata edits. Use pdfplumber when you need the text content and pikepdf when you need to manipulate the document structure.

The Claude Skills 360 bundle includes pikepdf skill sets covering the open_pdf()/save_bytes() helpers, the merge_pdfs() multi-document combiner, the split_pdf() page splitter, the extract_pages() page extractor, rotate_pages(), get_metadata()/set_metadata() with XMP and docinfo, encrypt_pdf() with AES-256 and permissions, decrypt_pdf(), and the extract_images() Pillow pipeline. Start with the free tier to try PDF manipulation code generation.
