Claude Code for xml.sax.handler: Python SAX2 Handler Base Classes

Published: February 9, 2029

Read time: 5 min read

By: Claude Skills 360

Python’s xml.sax.handler module defines the base handler classes for SAX2 XML parsing. Import it with from xml.sax import handler. It provides four core handler interfaces.

ContentHandler is the main handler: override startElement(name, attrs), endElement(name), characters(content), startDocument(), endDocument(), startPrefixMapping(prefix, uri), endPrefixMapping(prefix), ignorableWhitespace(whitespace), and processingInstruction(target, data). When namespace processing is enabled, the parser calls startElementNS(name, qname, attrs) and endElementNS(name, qname) instead of startElement() and endElement().

ErrorHandler: override warning(exc), error(exc), and fatalError(exc). The default error() and fatalError() raise the exception they receive; the default warning() prints it.

EntityResolver: override resolveEntity(publicId, systemId) to return either a system identifier string or an InputSource. The default implementation returns systemId. Since Python 3.7.1 the SAX parser no longer processes general external entities by default, so they are not resolved unless handler.feature_external_ges is enabled.

DTDHandler: override notationDecl(name, publicId, systemId) and unparsedEntityDecl(name, publicId, systemId, ndata).

Feature constants include handler.feature_namespaces (namespace processing) and handler.feature_validation (DTD validation, which the bundled expat-based parser does not support). The property constant handler.property_lexical_handler registers a LexicalHandler for comment, CDATA, and DTD events. Register handlers with parser.setContentHandler(h) and parser.setErrorHandler(e).

Claude Code generates content accumulator handlers, streaming element collectors, schema-like structure validators, event loggers, and multi-pass XML pipeline stages.
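The lexical-handler property is the least obvious item in that overview, so here is a minimal sketch of it. It assumes the bundled expat-based parser, which looks up comment(), startCDATA(), endCDATA(), startDTD(), and endDTD() on whatever object is registered. CommentCollector, collect_comments(), and the sample document are hypothetical names used only for illustration.

```python
import io
import xml.sax
from xml.sax import handler


class CommentCollector:
    """Hypothetical lexical handler: records comments and counts CDATA sections."""

    def __init__(self):
        self.comments = []
        self.cdata_sections = 0

    def comment(self, content):
        # Called once per <!-- ... --> with the text between the delimiters.
        self.comments.append(content.strip())

    def startCDATA(self):
        # The CDATA text itself still arrives via ContentHandler.characters().
        self.cdata_sections += 1

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def collect_comments(xml_bytes):
    """Parse xml_bytes and return the populated CommentCollector."""
    lexical = CommentCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler.ContentHandler())  # no-op content handler
    parser.setProperty(handler.property_lexical_handler, lexical)
    parser.parse(io.BytesIO(xml_bytes))
    return lexical


doc = b"<root><!-- build 42 --><note><![CDATA[a < b]]></note></root>"
found = collect_comments(doc)
print(found.comments, found.cdata_sections)
```

On Python 3.10 and later you can subclass handler.LexicalHandler instead of writing a plain class; the plain class is used here only so the sketch does not depend on the interpreter version.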

CLAUDE.md for xml.sax.handler

## xml.sax.handler Stack
- Stdlib: import io
-         import xml.sax
-         from xml.sax import handler, parseString, parse
- Content: class MyHandler(handler.ContentHandler):
-              def startElement(self, name, attrs): ...
-              def endElement(self, name): ...
-              def characters(self, content): ...
- Error:   class MyErrors(handler.ErrorHandler):
-              def fatalError(self, exc): raise exc
- Parse:   p = xml.sax.make_parser()
-           p.setContentHandler(MyHandler())
-           p.setErrorHandler(MyErrors())
-           # Optional: with namespaces on, override startElementNS/endElementNS instead
-           p.setFeature(handler.feature_namespaces, True)
-           p.parse(io.BytesIO(xml_bytes))
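One detail in the stack above is easy to miss: once handler.feature_namespaces is enabled, the parser stops calling startElement() and calls startElementNS() instead. The sketch below parses the same document both ways to show the difference. EventRecorder and record_events() are hypothetical names, and the behaviour shown is that of the standard expat-based parser.

```python
import io
import xml.sax
from xml.sax import handler


class EventRecorder(handler.ContentHandler):
    """Hypothetical handler that records which start callback fired."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Namespaces off: name is the raw qualified name, e.g. "dc:title".
        self.events.append(("startElement", name))

    def startElementNS(self, name, qname, attrs):
        # Namespaces on: name is a (namespace_uri, local_name) tuple.
        self.events.append(("startElementNS", name))


def record_events(xml_bytes, namespaces):
    """Parse xml_bytes with namespace processing switched on or off."""
    recorder = EventRecorder()
    parser = xml.sax.make_parser()
    parser.setFeature(handler.feature_namespaces, namespaces)
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.events


DC = "http://purl.org/dc/elements/1.1/"
doc = (b'<catalog xmlns:dc="http://purl.org/dc/elements/1.1/">'
       b"<dc:title>Demo</dc:title></catalog>")

plain_events = record_events(doc, namespaces=False)
ns_events = record_events(doc, namespaces=True)
print(plain_events)
print(ns_events)
```

A handler that only overrides startElement() therefore sees no element events at all when the feature is on, which is why the namespace-aware handler later in this article overrides startElementNS().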

xml.sax.handler SAX2 Pipeline

# app/xmlsaxhandlerutil.py: collect text and attributes, validate structure, log namespaces
from __future__ import annotations

import io
import xml.sax
import xml.sax.handler as _handler
from dataclasses import dataclass, field
from typing import Any


# ─────────────────────────────────────────────────────────────────────────────
# 1. Text-accumulator ContentHandler
# ─────────────────────────────────────────────────────────────────────────────

class TextAccumulator(_handler.ContentHandler):
    """
    Collect the direct text content of all (or specific) elements.
    Text inside a nested child element is attributed to that child only.
    After parsing: .results is a list of (tag, text) tuples.

    Example:
        acc = TextAccumulator(target_tags={"title", "author"})
        xml.sax.parseString(xml_bytes, acc)
        for tag, text in acc.results:
            print(tag, text)
    """

    def __init__(self, target_tags: "set[str] | None" = None) -> None:
        super().__init__()
        self.target_tags = target_tags
        self.results: list[tuple[str, str]] = []
        self._stack: list[tuple[str, list[str]]] = []
        self._active = False

    def startElement(self, name: str, attrs: Any) -> None:
        if self.target_tags is None or name in self.target_tags:
            self._stack.append((name, []))
            self._active = True
        elif self._active:
            self._stack.append((name, []))

    def characters(self, content: str) -> None:
        if self._stack:
            self._stack[-1][1].append(content)

    def endElement(self, name: str) -> None:
        if not self._stack:
            return
        top_name, parts = self._stack[-1]
        if top_name == name:
            self._stack.pop()
            if self.target_tags is None or name in self.target_tags:
                self.results.append((name, "".join(parts).strip()))
            self._active = bool(self._stack)


# ─────────────────────────────────────────────────────────────────────────────
# 2. Attribute collector
# ─────────────────────────────────────────────────────────────────────────────

@dataclass
class ElementRecord:
    tag:    str
    attrs:  dict[str, str]
    depth:  int


class AttributeCollector(_handler.ContentHandler):
    """
    Collect all elements with their attributes.

    Example:
        col = AttributeCollector(target_tag="book")
        xml.sax.parseString(xml_bytes, col)
        for rec in col.records:
            print(rec.attrs)
    """

    def __init__(self, target_tag: str | None = None) -> None:
        super().__init__()
        self.target_tag = target_tag
        self.records: list[ElementRecord] = []
        self._depth = 0

    def startElement(self, name: str, attrs: Any) -> None:
        self._depth += 1
        if self.target_tag is None or name == self.target_tag:
            self.records.append(ElementRecord(
                tag=name,
                attrs=dict(attrs.items()),
                depth=self._depth,
            ))

    def endElement(self, name: str) -> None:
        self._depth -= 1


# ─────────────────────────────────────────────────────────────────────────────
# 3. Structure validator
# ─────────────────────────────────────────────────────────────────────────────

@dataclass
class StructureReport:
    valid:               bool
    element_count:       int
    max_depth:           int
    found_tags:          list[str]
    missing_required:    list[str]
    errors:              list[str] = field(default_factory=list)


class StructureValidator(_handler.ContentHandler, _handler.ErrorHandler):
    """
    Validate that required elements are present and document is well-formed.

    Example:
        val = StructureValidator(required_tags=["title", "author"])
        xml.sax.parseString(xml_bytes, val, val)  # val is also the ErrorHandler
        print(val.report)
    """

    def __init__(self, required_tags: "list[str] | None" = None) -> None:
        super().__init__()
        self.required = set(required_tags or [])
        self._found: set[str] = set()
        self._depth = 0
        self._max_depth = 0
        self._count = 0
        self._errors: list[str] = []

    def startElement(self, name: str, attrs: Any) -> None:
        self._depth += 1
        self._max_depth = max(self._max_depth, self._depth)
        self._count += 1
        self._found.add(name)

    def endElement(self, name: str) -> None:
        self._depth -= 1

    def warning(self, exc: Exception) -> None:
        self._errors.append(f"warning: {exc}")

    def error(self, exc: Exception) -> None:
        self._errors.append(f"error: {exc}")

    def fatalError(self, exc: Exception) -> None:
        self._errors.append(f"fatal: {exc}")
        raise exc

    @property
    def report(self) -> StructureReport:
        missing = sorted(self.required - self._found)
        return StructureReport(
            valid=not self._errors and not missing,
            element_count=self._count,
            max_depth=self._max_depth,
            found_tags=sorted(self._found),
            missing_required=missing,
            errors=self._errors,
        )


# ─────────────────────────────────────────────────────────────────────────────
# 4. Namespace-aware handler
# ─────────────────────────────────────────────────────────────────────────────

class NamespaceLogger(_handler.ContentHandler):
    """
    Track namespace prefix mappings and log namespace-qualified element starts.

    Example:
        ns_log = NamespaceLogger()
        p = xml.sax.make_parser()
        p.setFeature(_handler.feature_namespaces, True)
        p.setContentHandler(ns_log)
        p.parse(io.BytesIO(xml_bytes))
        print(ns_log.ns_map)
        print(ns_log.qualified_elements[:5])
    """

    def __init__(self) -> None:
        super().__init__()
        self.ns_map: dict[str, str] = {}
        self.qualified_elements: list[str] = []

    def startPrefixMapping(self, prefix: str | None, uri: str) -> None:
        # The parser passes prefix=None for the default namespace.
        self.ns_map[prefix or "(default)"] = uri

    def startElementNS(self, name: Any, qname: Any, attrs: Any) -> None:
        ns_uri, local = name if isinstance(name, tuple) else (None, name)
        if ns_uri:
            self.qualified_elements.append(f"{{{ns_uri}}}{local}")
        else:
            self.qualified_elements.append(str(local))


# ─────────────────────────────────────────────────────────────────────────────
# 5. Convenience parse wrappers
# ─────────────────────────────────────────────────────────────────────────────

def collect_text(xml_source: "bytes | str",
                 tags: "set[str] | None" = None) -> list[tuple[str, str]]:
    """
    Parse XML and return (tag, text) pairs for all or specific tags.

    Example:
        texts = collect_text(xml_bytes, {"title", "author"})
    """
    if isinstance(xml_source, str):
        xml_source = xml_source.encode("utf-8")
    handler = TextAccumulator(tags)
    xml.sax.parseString(xml_source, handler)
    return handler.results


def collect_attrs(xml_source: "bytes | str",
                  tag: str | None = None) -> list[ElementRecord]:
    """
    Parse XML and return ElementRecord list for all or specific tags.

    Example:
        records = collect_attrs(xml_bytes, "book")
    """
    if isinstance(xml_source, str):
        xml_source = xml_source.encode("utf-8")
    handler = AttributeCollector(tag)
    xml.sax.parseString(xml_source, handler)
    return handler.records


def validate_structure(xml_source: "bytes | str",
                        required: "list[str] | None" = None) -> StructureReport:
    """
    Validate XML structure. Returns a StructureReport.

    Example:
        report = validate_structure(xml_bytes, required=["title", "author"])
        print(report.valid, report.missing_required)
    """
    if isinstance(xml_source, str):
        xml_source = xml_source.encode("utf-8")
    val = StructureValidator(required)
    try:
        xml.sax.parseString(xml_source, val, val)
    except xml.sax.SAXException:
        # fatalError() has already recorded the failure in the report.
        pass
    return val.report


# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    print("=== xml.sax.handler demo ===")

    sample = b"""<?xml version="1.0"?>
<catalog>
  <book id="b1" lang="en">
    <title>Python Cookbook</title>
    <author>David Beazley</author>
    <price>39.99</price>
  </book>
  <book id="b2" lang="fr">
    <title>Apprendre Python</title>
    <author>Mark Lutz</author>
    <price>29.99</price>
  </book>
  <magazine id="m1">
    <title>Python Weekly</title>
  </magazine>
</catalog>"""

    ns_xml = b"""<?xml version="1.0"?>
<catalog xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Namespace Demo</dc:title>
</catalog>"""

    # ── collect_text ──────────────────────────────────────────────────────
    print("\n--- collect_text (title, author) ---")
    for tag, text in collect_text(sample, {"title", "author"}):
        print(f"  {tag:10s}: {text!r}")

    # ── collect_attrs ─────────────────────────────────────────────────────
    print("\n--- collect_attrs (book) ---")
    for rec in collect_attrs(sample, "book"):
        print(f"  depth={rec.depth}  attrs={rec.attrs}")

    # ── validate_structure ────────────────────────────────────────────────
    print("\n--- validate_structure ---")
    good = validate_structure(sample, required=["title", "author"])
    bad  = validate_structure(sample, required=["title", "isbn"])
    print(f"  good: valid={good.valid}  missing={good.missing_required}"
          f"  tags={len(good.found_tags)}")
    print(f"  bad : valid={bad.valid}   missing={bad.missing_required}")

    # ── namespace handler ─────────────────────────────────────────────────
    print("\n--- NamespaceLogger ---")
    ns_log = NamespaceLogger()
    p = xml.sax.make_parser()
    p.setFeature(_handler.feature_namespaces, True)
    p.setContentHandler(ns_log)
    p.parse(io.BytesIO(ns_xml))
    print(f"  ns_map          : {ns_log.ns_map}")
    print(f"  qualified_elements: {ns_log.qualified_elements}")

    # ── handler feature / property constants ─────────────────────────────
    print("\n--- handler constants ---")
    attrs_to_show = [a for a in dir(_handler)
                     if a.startswith(("feature_", "property_"))]
    for attr in attrs_to_show:
        print(f"  {attr:35s} = {getattr(_handler, attr)!r}")

    print("\n=== done ===")
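The handlers above are not tied to parseString(). As a rough sketch of the streaming case, a parser from make_parser() can be fed data in pieces with feed() and finished with close(), and the ContentHandler receives events as soon as enough input has arrived. TagCounter, count_tags_incrementally(), and the chunk boundaries are hypothetical and chosen only to show that a chunk may end in the middle of a tag.

```python
import xml.sax
from xml.sax import handler


class TagCounter(handler.ContentHandler):
    """Hypothetical handler that counts how often each tag starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags_incrementally(chunks):
    """Feed byte chunks to the parser one at a time and return tag counts."""
    parser = xml.sax.make_parser()
    counter = TagCounter()
    parser.setContentHandler(counter)
    for chunk in chunks:
        parser.feed(chunk)   # chunks may split tags at arbitrary points
    parser.close()           # signals end of input; reports unclosed documents
    return counter.counts


chunks = [b"<catalog><bo", b"ok id='b1'/><book ", b"id='b2'/></catalog>"]
counts = count_tags_incrementally(chunks)
print(counts)
```

This shape suits data read from a socket or a large file in blocks, since no complete document has to be held in memory.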

xml.sax.parseString(data, contentHandler) and xml.sax.parse(source, contentHandler) create a parser and call setContentHandler() for you, which makes them the most convenient entry points when the default parser configuration is enough. Both also accept an optional ErrorHandler as a third argument. Use xml.sax.make_parser() with explicit setContentHandler(), setErrorHandler(), setFeature(), and setProperty() calls when you need namespace processing or a lexical handler.

As an alternative from PyPI, lxml.sax provides lxml.sax.saxify(element_or_tree, content_handler), which generates SAX2 events from an already-parsed lxml tree and so lets you drive existing ContentHandler code from lxml-parsed documents. Use it when you need lxml’s speed and validation on the parsing side but want to consume events through a ContentHandler interface.

The Claude Skills 360 bundle includes xml.sax.handler skill sets covering the TextAccumulator text-collecting handler, the AttributeCollector/ElementRecord attribute handler, the StructureValidator/StructureReport structure checker, the NamespaceLogger namespace handler, and the collect_text()/collect_attrs()/validate_structure() convenience wrappers. Start with the free tier to try SAX2 handler patterns and xml.sax.handler pipeline code generation.
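As a small sketch of the parseString() entry point described above, the example below passes an ErrorHandler as the optional third argument rather than building a parser by hand. RecordingErrors and check_well_formed() are hypothetical names; the handler re-raises fatal errors so that parsing stops, and the caller swallows the exception because the problem has already been recorded.

```python
import xml.sax
from xml.sax import handler


class RecordingErrors(handler.ErrorHandler):
    """Hypothetical error handler: records problems, still aborts on fatal ones."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def _record(self, kind, exc):
        self.problems.append((kind, exc.getMessage(), exc.getLineNumber()))

    def warning(self, exc):
        self._record("warning", exc)

    def error(self, exc):
        self._record("error", exc)

    def fatalError(self, exc):
        self._record("fatal", exc)
        raise exc  # stop parsing; the document is not well-formed


def check_well_formed(xml_bytes):
    """Return a list of (kind, message, line) tuples; empty means no problems."""
    errors = RecordingErrors()
    try:
        # Third positional argument is the ErrorHandler.
        xml.sax.parseString(xml_bytes, handler.ContentHandler(), errors)
    except xml.sax.SAXParseException:
        pass  # already recorded by fatalError()
    return errors.problems


good = check_well_formed(b"<a><b/></a>")
bad = check_well_formed(b"<a><b></a>")
print(good)
print(bad)
```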
