Blog / AI / Claude Code for xml.sax.saxutils: Python SAX XML Utility Functions

Claude Code for xml.sax.saxutils: Python SAX XML Utility Functions

Published: February 8, 2029

•

Read time: 5 min read

•

By: Claude Skills 360

Python’s xml.sax.saxutils module provides utility functions and base classes for SAX-based XML processing. from xml.sax import saxutils. Three key pieces: Escape/unescape: saxutils.escape(data, entities={}) — escapes &, <, > in text content; entities dict adds extra replacements. saxutils.unescape(data, entities={}) — reverses the escape; entities adds custom entity mappings beside the defaults &, <, >. saxutils.quoteattr(data, entities={}) — escapes text for use as an XML attribute value, wrapping in " or ' automatically to minimise escaping. XMLGenerator: saxutils.XMLGenerator(out=None, encoding="iso-8859-1", short_empty_elements=False) — a ContentHandler that writes SAX events as XML to file out; call startElement(name, attrs), characters(content), endElement(name) to build well-formed XML. XMLFilterBase: saxutils.XMLFilterBase(parent=None) — a ContentHandler + XMLReader that by default passes all events to parent; subclass and override individual event methods to transform, filter, or instrument a SAX pipeline. Claude Code generates XML escaping utilities, SAX event loggers, element-stripping filters, attribute-rewriting filters, namespace-injecting filters, and XML transformation pipelines.

CLAUDE.md for xml.sax.saxutils

## xml.sax.saxutils Stack
- Stdlib: from xml.sax import saxutils, parse, parseString
-         from xml.sax.handler import ContentHandler
- Escape: saxutils.escape(text)             # & < >
-         saxutils.escape(text, {"'": "&apos;", '"': "&quot;"})
-         saxutils.unescape(text)            # reverse
-         saxutils.quoteattr(value)          # for attributes
- Write:  gen = saxutils.XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
-         gen.startDocument()
-         gen.startElement("tag", attrs_impl)
-         gen.characters("text")
-         gen.endElement("tag")
- Filter: class MyFilter(saxutils.XMLFilterBase):
-             def startElement(self, name, attrs): ...
-             super().startElement(name, attrs)

xml.sax.saxutils XML Processing Pipeline

# app/xmlsaxutilsutil.py — escape, generate, filter, transform, log, strip
from __future__ import annotations

import io
import xml.sax
import xml.sax.handler
from xml.sax import saxutils
from xml.sax.xmlreader import AttributesImpl
from typing import Any


# ─────────────────────────────────────────────────────────────────────────────
# 1. Escape / unescape helpers
# ─────────────────────────────────────────────────────────────────────────────

def xml_escape(text: str,
               full: bool = False) -> str:
    """
    Escape text for XML element content.
    full=True also escapes single and double quotes (for attribute context).

    Example:
        xml_escape("1 < 2 & 3 > 4")     # "1 &lt; 2 &amp; 3 &gt; 4"
        xml_escape("<b>\"hello\"</b>", full=True)
    """
    extras: dict[str, str] = {"'": "&apos;", '"': "&quot;"} if full else {}
    return saxutils.escape(text, extras)


def xml_unescape(text: str,
                 extras: "dict[str, str] | None" = None) -> str:
    """
    Unescape XML entity references in text content.
    extras: additional { "&entity;": "char" } mappings.

    Example:
        xml_unescape("cats &amp; dogs &lt; 42")   # "cats & dogs < 42"
    """
    return saxutils.unescape(text, extras or {})


def xml_attr(value: str) -> str:
    """
    Return the value quoted and escaped for use as an XML attribute.
    Chooses quote char automatically.

    Example:
        xml_attr('He said "hello"')    # "'He said \"hello\"'"
        xml_attr("it's alive")         # '"it\'s alive"'
    """
    return saxutils.quoteattr(value)


# ─────────────────────────────────────────────────────────────────────────────
# 2. XMLGenerator-based document builder
# ─────────────────────────────────────────────────────────────────────────────

class SimpleXMLWriter:
    """
    Context-manager XML writer on top of XMLGenerator.
    Tracks nesting depth and provides convenience methods.

    Example:
        out = io.StringIO()
        with SimpleXMLWriter(out) as w:
            with w.element("catalog", {"version": "1"}):
                with w.element("book", {"id": "1"}):
                    w.text_element("title", "Python Cookbook")
                    w.text_element("price", "39.99")
        print(out.getvalue())
    """

    def __init__(self, out: "io.IOBase | io.StringIO | io.BytesIO",
                 encoding: str = "utf-8",
                 short_empty: bool = True) -> None:
        self._gen = saxutils.XMLGenerator(out, encoding=encoding,
                                           short_empty_elements=short_empty)
        self._gen.startDocument()

    def start(self, tag: str,
              attrs: "dict[str, str] | None" = None) -> None:
        impl = AttributesImpl(attrs or {})
        self._gen.startElement(tag, impl)

    def end(self, tag: str) -> None:
        self._gen.endElement(tag)

    def characters(self, text: str) -> None:
        self._gen.characters(text)

    def text_element(self, tag: str, text: str,
                     attrs: "dict[str, str] | None" = None) -> None:
        """Convenience: open tag, write text, close tag."""
        self.start(tag, attrs)
        self.characters(text)
        self.end(tag)

    def empty(self, tag: str,
              attrs: "dict[str, str] | None" = None) -> None:
        """Write a self-closing element."""
        self.start(tag, attrs)
        self.end(tag)

    def element(self, tag: str,
                attrs: "dict[str, str] | None" = None) -> "_ElementContext":
        """Context manager for a nested element."""
        return _ElementContext(self, tag, attrs)

    def close(self) -> None:
        pass   # XMLGenerator writes eagerly; nothing to flush


class _ElementContext:
    def __init__(self, writer: SimpleXMLWriter,
                 tag: str, attrs: "dict[str, str] | None") -> None:
        self._w = writer
        self._tag = tag
        self._attrs = attrs

    def __enter__(self) -> "_ElementContext":
        self._w.start(self._tag, self._attrs)
        return self

    def __exit__(self, *_: Any) -> None:
        self._w.end(self._tag)


def build_xml(root_tag: str,
              children: "list[tuple[str, dict, str]]",
              root_attrs: "dict[str, str] | None" = None) -> str:
    """
    Build a simple XML document.
    children: list of (tag, attrs_dict, text_content).

    Example:
        xml = build_xml("catalog", [
            ("book", {"id": "1"}, "Python Cookbook"),
            ("book", {"id": "2"}, "Learning Python"),
        ], {"version": "1"})
    """
    out = io.StringIO()
    with SimpleXMLWriter(out) as w:
        with w.element(root_tag, root_attrs):
            for tag, attrs, text in children:
                w.text_element(tag, text, attrs)
    return out.getvalue()


# ─────────────────────────────────────────────────────────────────────────────
# 3. XMLFilterBase: element-stripping filter
# ─────────────────────────────────────────────────────────────────────────────

class StripElementsFilter(saxutils.XMLFilterBase):
    """
    SAX filter that removes (strips) elements with given tag names and their content.

    Example:
        out = io.StringIO()
        gen = saxutils.XMLGenerator(out, "utf-8")
        filt = StripElementsFilter(gen, {"price", "secret"})
        xml.sax.parseString(xml_bytes, filt)
        print(out.getvalue())
    """

    def __init__(self, downstream: xml.sax.handler.ContentHandler,
                 strip_tags: "set[str]") -> None:
        super().__init__()
        self._down = downstream
        self._strip = strip_tags
        self._depth = 0   # nesting depth inside a stripped element

    def startDocument(self) -> None:
        self._down.startDocument()

    def endDocument(self) -> None:
        self._down.endDocument()

    def startElement(self, name: str, attrs: Any) -> None:
        if self._depth > 0 or name in self._strip:
            self._depth += 1
        else:
            self._down.startElement(name, attrs)

    def endElement(self, name: str) -> None:
        if self._depth > 0:
            self._depth -= 1
        else:
            self._down.endElement(name)

    def characters(self, content: str) -> None:
        if self._depth == 0:
            self._down.characters(content)

    def ignorableWhitespace(self, whitespace: str) -> None:
        if self._depth == 0:
            self._down.ignorableWhitespace(whitespace)


# ─────────────────────────────────────────────────────────────────────────────
# 4. XMLFilterBase: attribute-rewriting filter
# ─────────────────────────────────────────────────────────────────────────────

class RenameAttributeFilter(saxutils.XMLFilterBase):
    """
    SAX filter that renames attribute keys across all elements.
    rename: { "old_attr": "new_attr" }

    Example:
        out = io.StringIO()
        gen = saxutils.XMLGenerator(out, "utf-8")
        filt = RenameAttributeFilter(gen, {"class": "css_class"})
        xml.sax.parseString(xml_bytes, filt)
    """

    def __init__(self, downstream: xml.sax.handler.ContentHandler,
                 rename: "dict[str, str]") -> None:
        super().__init__()
        self._down = downstream
        self._rename = rename

    def startDocument(self) -> None:
        self._down.startDocument()

    def endDocument(self) -> None:
        self._down.endDocument()

    def startElement(self, name: str, attrs: Any) -> None:
        new_attrs = {self._rename.get(k, k): v
                     for k, v in attrs.items()}
        self._down.startElement(name, AttributesImpl(new_attrs))

    def endElement(self, name: str) -> None:
        self._down.endElement(name)

    def characters(self, content: str) -> None:
        self._down.characters(content)


def apply_filter(xml_source: "bytes | str",
                 strip_tags: "set[str] | None" = None,
                 rename_attrs: "dict[str, str] | None" = None) -> str:
    """
    Apply strip and/or attribute-rename filters to XML.
    Returns the filtered XML string.

    Example:
        result = apply_filter(xml_bytes, strip_tags={"price", "notes"})
    """
    if isinstance(xml_source, str):
        xml_source = xml_source.encode("utf-8")
    out = io.StringIO()
    gen = saxutils.XMLGenerator(out, "utf-8", short_empty_elements=True)
    handler: xml.sax.handler.ContentHandler = gen

    if rename_attrs:
        handler = RenameAttributeFilter(handler, rename_attrs)  # type: ignore[assignment]
    if strip_tags:
        handler = StripElementsFilter(handler, strip_tags)      # type: ignore[assignment]

    xml.sax.parseString(xml_source, handler)   # type: ignore[arg-type]
    return out.getvalue()


# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    print("=== xml.sax.saxutils demo ===")

    # ── escape / unescape ─────────────────────────────────────────────────
    print("\n--- escape / unescape ---")
    raw_texts = [
        "1 < 2 & 3 > 4",
        'He said "hello" & it\'s fine',
        "<script>alert('xss')</script>",
    ]
    for t in raw_texts:
        esc = xml_escape(t)
        unesc = xml_unescape(esc)
        print(f"  orig   : {t!r}")
        print(f"  escaped: {esc!r}")
        print(f"  round  : {unesc == t}")

    # ── quoteattr ─────────────────────────────────────────────────────────
    print("\n--- xml_attr ---")
    for val in ['simple', "it's got apostrophe", 'has "quotes"', "both ' and \"quotes\"'"]:
        print(f"  {val!r:35s} → {xml_attr(val)}")

    # ── SimpleXMLWriter ───────────────────────────────────────────────────
    print("\n--- SimpleXMLWriter ---")
    xml_out = build_xml(
        "catalog", [
            ("book", {"id": "b1", "lang": "en"}, "Python Cookbook"),
            ("book", {"id": "b2", "lang": "fr"}, "Apprendre Python"),
            ("magazine", {"id": "m1"}, "Python Weekly"),
        ],
        {"version": "1.0"}
    )
    print(xml_out[:300])

    # ── apply_filter (strip + rename) ─────────────────────────────────────
    print("\n--- apply_filter ---")
    source_xml = (
        b'<?xml version="1.0"?>'
        b'<catalog>'
        b'<book id="b1" class="fiction"><title>Python</title><price>39.99</price></book>'
        b'<book id="b2" class="tech"><title>Django</title><price>29.99</price></book>'
        b'</catalog>'
    )
    filtered = apply_filter(source_xml,
                             strip_tags={"price"},
                             rename_attrs={"class": "category"})
    print(filtered[:400])

    print("\n=== done ===")

For the html stdlib companion — html.escape(s) escapes only the 5 critical HTML/XML characters (&, <, >, ", ') and is the right choice for HTML output; saxutils.escape() by default handles only &, <, > (without escaping quotes) which is correct for XML element text content but not for attribute values — always use saxutils.quoteattr() for attribute values and saxutils.escape() only for text nodes. For the defusedxml (PyPI) alternative — defusedxml.sax.parseString(data, handler) is a drop-in replacement for xml.sax.parseString that mitigates XML bomb and XXE attacks by rejecting billion-laughs expansions and external entity references — use defusedxml whenever parsing untrusted XML; use stdlib xml.sax/saxutils only for trusted input, and always explicitly disable DTD features. The Claude Skills 360 bundle includes xml.sax.saxutils skill sets covering xml_escape()/xml_unescape()/xml_attr() escape helpers, SimpleXMLWriter/build_xml() XMLGenerator-based document builder, StripElementsFilter element-removal filter, RenameAttributeFilter attribute-rewriting filter, and apply_filter() composable filter pipeline. Start with the free tier to try SAX utility patterns and xml.sax.saxutils pipeline code generation.

Keep Reading

Claude Code for email.contentmanager: Python Email Content Accessors

Read and write EmailMessage body content with Python's email.contentmanager module and Claude Code — email contentmanager ContentManager for the class that maps content types to get and set handler functions allowing EmailMessage to support get_content and set_content with type-specific behaviour, email contentmanager raw_data_manager for the ContentManager instance that handles raw bytes and str payloads without any conversion, email contentmanager content_manager for the standard ContentManager instance used by email.policy.default that intelligently handles text plain text html multipart and binary content types, email contentmanager get_content_text for the handler that returns the decoded text payload of a text-star message part as a str, email contentmanager get_content_binary for the handler that returns the raw decoded bytes payload of a non-text message part, email contentmanager get_data_manager for the get-handler lookup used by EmailMessage get_content to find the right reader function for the content type, email contentmanager set_content text for the handler that creates and sets a text part correctly choosing charset and transfer encoding, email contentmanager set_content bytes for the handler that creates and sets a binary part with base64 encoding and optional filename Content-Disposition, email contentmanager EmailMessage get_content for the method that reads the message body using the registered content manager handlers, email contentmanager EmailMessage set_content for the method that sets the message body and MIME headers in one call, email contentmanager EmailMessage make_alternative make_mixed make_related for the methods that convert a simple message into a multipart container, email contentmanager EmailMessage add_attachment for the method that attaches a file or bytes to a multipart message, and email contentmanager integration with email.message and email.policy and email.mime and io for building high-level email readers attachment extractors text body accessors HTML readers and policy-aware MIME construction pipelines.

5 min read Feb 12, 2029

Claude Code for email.charset: Python Email Charset Encoding

Control header and body encoding for international email with Python's email.charset module and Claude Code — email charset Charset for the class that wraps a character set name with the encoding rules for header encoding and body encoding describing how to encode text for that charset in email messages, email charset Charset header_encoding for the attribute specifying whether headers using this charset should use QP quoted-printable encoding BASE64 encoding or no encoding, email charset Charset body_encoding for the attribute specifying the Content-Transfer-Encoding to use for message bodies in this charset such as QP or BASE64, email charset Charset output_codec for the attribute giving the Python codec name used to encode the string to bytes for the wire format, email charset Charset input_codec for the attribute giving the Python codec name used to decode incoming bytes to str, email charset Charset get_output_charset for returning the output charset name, email charset Charset header_encode for encoding a header string using the charset's header_encoding method, email charset Charset body_encode for encoding body content using the charset's body_encoding, email charset Charset convert for converting a string from the input_codec to the output_codec, email charset add_charset for registering a new charset with custom encoding rules in the global charset registry, email charset add_alias for adding an alias name that maps to an existing registered charset, email charset add_codec for registering a codec name mapping for use by the charset machinery, and email charset integration with email.message and email.mime and email.policy and email.encoders for building international email senders non-ASCII header encoders Content-Transfer-Encoding selectors charset-aware message constructors and MIME encoding pipelines.

5 min read Feb 11, 2029

Claude Code for email.utils: Python Email Address and Header Utilities

Parse and format RFC 2822 email addresses and dates with Python's email.utils module and Claude Code — email utils parseaddr for splitting a display-name plus angle-bracket address string into a realname and email address tuple, email utils formataddr for combining a realname and address string into a properly quoted RFC 2822 address with angle brackets, email utils getaddresses for parsing a list of raw address header strings each potentially containing multiple comma-separated addresses into a list of realname address tuples, email utils parsedate for parsing an RFC 2822 date string into a nine-tuple compatible with time.mktime, email utils parsedate_tz for parsing an RFC 2822 date string into a ten-tuple that includes the UTC offset timezone in seconds, email utils parsedate_to_datetime for parsing an RFC 2822 date string into an aware datetime object with timezone, email utils formatdate for formatting a POSIX timestamp or the current time as an RFC 2822 date string with optional usegmt and localtime flags, email utils format_datetime for formatting a datetime object as an RFC 2822 date string, email utils make_msgid for generating a globally unique Message-ID string with optional idstring and domain components, email utils decode_rfc2231 for decoding an RFC 2231 encoded parameter value into a tuple of charset language and value, email utils encode_rfc2231 for encoding a string as an RFC 2231 encoded parameter value, email utils collapse_rfc2231_value for collapsing a decoded RFC 2231 tuple to a Unicode string, and email utils integration with email.message and email.headerregistry and datetime and time for building address parsers date formatters message-id generators header extractors and RFC-compliant email construction utilities.

5 min read Feb 10, 2029

Put these ideas into practice

Claude Skills 360 gives you production-ready skills for everything in this article — and 2,350+ more. Start free or go all-in.

Get 360 skills free

Free $39