Claude Code for email.parser: Python RFC 5322 Email Parsers

Published: February 3, 2029 • Read time: 5 min • By: Claude Skills 360

Python’s email.parser module provides classes for parsing raw RFC 5322 email messages into email.message.Message or EmailMessage objects. Import what you need with from email.parser import BytesParser, Parser, BytesHeaderParser, HeaderParser, BytesFeedParser, FeedParser.

- BytesParser(policy=policy.default).parsebytes(raw): parse bytes in one shot (the preferred path for production code).
- Parser(policy=policy.default).parsestr(text): parse a str.
- BytesHeaderParser / HeaderParser: parse headers only. The body is not split into MIME parts; it is kept as a single unparsed payload string (a fast path for metadata).
- FeedParser / BytesFeedParser: push-mode streaming. Call fp.feed(chunk) repeatedly, then msg = fp.close().
- File-based parsing: open the file in binary mode inside a with block and pass the handle to BytesParser(policy=policy.default).parse(fp).
- Defect collection: after parsing, msg.defects and per-part part.defects list the structural RFC violations found. With policy.default, header-level defects are recorded on the header objects themselves, for example msg["To"].defects.

All parsers accept an optional policy argument. Always pass policy.default (or policy.SMTP) for modern EmailMessage output with typed headers, because the legacy default is policy.compat32.

Claude Code generates standards-compliant email processors, header extractors, streaming message readers, mbox file parsers, and email pipeline stages.
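
To make the policy advice concrete, here is a minimal sketch that parses the same bytes twice, once with the legacy default and once with policy.default. The sample message and the example.com address are made up for illustration.

```python
from email import policy
from email.parser import BytesParser

# Illustrative message; the address and subject are placeholders.
RAW = (
    b"From: Alice Example <alice@example.com>\r\n"
    b"Subject: =?utf-8?q?Caf=C3=A9_menu?=\r\n"
    b"\r\n"
    b"Body\r\n"
)

# No policy argument means policy.compat32: a legacy Message with str headers.
legacy = BytesParser().parsebytes(RAW)

# policy.default yields an EmailMessage with decoded, typed headers.
modern = BytesParser(policy=policy.default).parsebytes(RAW)

print(type(legacy).__name__, repr(legacy["Subject"]))
print(type(modern).__name__, repr(str(modern["Subject"])))
print(modern["From"].addresses[0].addr_spec)
```

Under compat32 the Subject stays in its RFC 2047 encoded form, while policy.default decodes it and exposes the From header as structured Address objects.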

CLAUDE.md for email.parser

## email.parser Stack
- Stdlib: from email.parser import BytesParser, Parser
-         from email.parser import BytesHeaderParser, HeaderParser
-         from email.parser import BytesFeedParser, FeedParser
-         from email import policy
- Bytes:  msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
- Str:    msg = Parser(policy=policy.default).parsestr(raw_text)
- File:   with open("x.eml", "rb") as f: msg = BytesParser(policy=policy.default).parse(f)
- HdrOnly: hdr = BytesHeaderParser(policy=policy.default).parsebytes(raw)
- Stream: fp = BytesFeedParser(policy=policy.default)
-         for chunk in ...: fp.feed(chunk)
-         msg = fp.close()
- Defects: msg.defects  # structural defects only; header defects live on msg["X"].defects
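
The HdrOnly entry above is easy to misread as "the body is discarded". A small sketch, using a made-up two-line multipart message, suggests what actually happens: the header-only parser keeps the body as one raw string and simply does not split it into parts.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

# Illustrative multipart message; the boundary and content are placeholders.
RAW = (
    b"Subject: Report\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"part one\r\n"
    b"--XX--\r\n"
)

full = BytesParser(policy=policy.default).parsebytes(RAW)
hdr = BytesHeaderParser(policy=policy.default).parsebytes(RAW)

# The full parse builds a tree of sub-messages.
print("full parse is_multipart  :", full.is_multipart())

# The header-only parse reports the same headers but leaves the body raw.
print("header-only is_multipart :", hdr.is_multipart())
print("header-only payload type :", type(hdr.get_payload()).__name__)
print("header-only content type :", hdr.get_content_type())
```

If you need the parts later, re-parse the original bytes with BytesParser; the header-only result is best treated as a metadata view.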

email.parser RFC 5322 Parsing Pipeline

# app/emailparserutil.py — bytes, str, header-only, streaming, defects, batch
from __future__ import annotations

import io
import os
from dataclasses import dataclass, field
from email import policy as _policy
from email.message import EmailMessage, Message
from email.parser import (
    BytesFeedParser,
    BytesHeaderParser,
    BytesParser,
    FeedParser,
    HeaderParser,
    Parser,
)
from typing import Any, Iterator


# ─────────────────────────────────────────────────────────────────────────────
# 1. One-shot parsing helpers
# ─────────────────────────────────────────────────────────────────────────────

def parse_bytes(raw: bytes,
                pol: Any = _policy.default) -> EmailMessage:
    """
    Parse raw email bytes into an EmailMessage.

    Example:
        msg = parse_bytes(b"From: alice@example.com\r\nSubject: Hi\r\n\r\nBody")
        print(msg["Subject"])
    """
    return BytesParser(policy=pol).parsebytes(raw)   # type: ignore[return-value]


def parse_str(text: str,
              pol: Any = _policy.default) -> EmailMessage:
    """
    Parse a raw email string into an EmailMessage.

    Example:
        msg = parse_str("From: alice@example.com\r\nSubject: Hi\r\n\r\nBody")
    """
    return Parser(policy=pol).parsestr(text)   # type: ignore[return-value]


def parse_file(path: str,
               pol: Any = _policy.default) -> EmailMessage:
    """
    Parse an .eml file from disk.

    Example:
        msg = parse_file("/tmp/message.eml")
        print(msg["From"])
    """
    with open(path, "rb") as fp:
        return BytesParser(policy=pol).parse(fp)   # type: ignore[return-value]


# ─────────────────────────────────────────────────────────────────────────────
# 2. Header-only parsers (fast path)
# ─────────────────────────────────────────────────────────────────────────────

def parse_headers_only(raw: "bytes | str",
                       pol: Any = _policy.default) -> EmailMessage:
    """
    Parse only the headers of a message.  The body is not parsed into MIME
    parts; it is stored as a single raw string payload.  This can be
    noticeably cheaper than a full parse for large multipart messages.

    Example:
        hdr = parse_headers_only(raw_bytes)
        print(hdr["Subject"], hdr["From"])
    """
    if isinstance(raw, bytes):
        return BytesHeaderParser(policy=pol).parsebytes(raw)   # type: ignore[return-value]
    return HeaderParser(policy=pol).parsestr(raw)   # type: ignore[return-value]


def quick_subject(raw: "bytes | str") -> str:
    """
    Extract Subject without fully parsing the message body.

    Example:
        subj = quick_subject(raw_bytes)
    """
    return parse_headers_only(raw).get("Subject", "")


def quick_from(raw: "bytes | str") -> str:
    """
    Extract the From header value without parsing the body.

    Example:
        sender = quick_from(raw_bytes)
    """
    return str(parse_headers_only(raw).get("From", ""))


# ─────────────────────────────────────────────────────────────────────────────
# 3. Streaming / incremental parser
# ─────────────────────────────────────────────────────────────────────────────

def parse_stream(stream: "io.RawIOBase | io.BufferedIOBase",
                 chunk_size: int = 65536,
                 pol: Any = _policy.default) -> EmailMessage:
    """
    Parse a message from a binary stream in chunks using BytesFeedParser.
    Suitable for network sockets or large file streams.

    Example:
        with open("big.eml", "rb") as f:
            msg = parse_stream(f, chunk_size=16384)
    """
    fp = BytesFeedParser(policy=pol)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        fp.feed(chunk)
    return fp.close()   # type: ignore[return-value]


def parse_chunks(chunks: "Iterator[bytes]",
                 pol: Any = _policy.default) -> EmailMessage:
    """
    Parse a message from an iterator of byte chunks.

    Example:
        msg = parse_chunks(iter([b"Subject: Hi\r\n", b"\r\n", b"body"]))
    """
    fp = BytesFeedParser(policy=pol)
    for chunk in chunks:
        fp.feed(chunk)
    return fp.close()   # type: ignore[return-value]


# ─────────────────────────────────────────────────────────────────────────────
# 4. Defect collection
# ─────────────────────────────────────────────────────────────────────────────

@dataclass
class ParseReport:
    ok:      bool
    defects: list[str] = field(default_factory=list)
    msg:     Any = None


def parse_with_report(raw: "bytes | str",
                      pol: Any = _policy.default) -> ParseReport:
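    """
    Parse a message and collect RFC defects from all parts.

    Collects structural defects (part.defects) and, for policies that return
    typed headers such as policy.default, header-level defects as well.

    Example:
        report = parse_with_report(raw_bytes)
        if not report.ok:
            for d in report.defects:
                print("DEFECT:", d)
    """
    # Never raise on a defect; record it so it can be reported.
    lax = pol.clone(raise_on_defect=False)
    try:
        if isinstance(raw, bytes):
            msg = BytesParser(policy=lax).parsebytes(raw)
        else:
            msg = Parser(policy=lax).parsestr(raw)
    except Exception as e:
        return ParseReport(ok=False, defects=[f"fatal: {e}"])

    defects: list[str] = []
    # walk() yields the root message first, so msg.defects is covered here
    # and is not counted twice.
    for part in msg.walk():
        defects.extend(str(d) for d in part.defects)
        # Typed header objects carry their own defects; plain str values
        # (policy.compat32) have no such attribute.
        for name, value in part.items():
            defects.extend(f"{name}: {d}" for d in getattr(value, "defects", ()))

    return ParseReport(ok=not defects, defects=defects, msg=msg)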


# ─────────────────────────────────────────────────────────────────────────────
# 5. Batch directory parser
# ─────────────────────────────────────────────────────────────────────────────

@dataclass
class BatchResult:
    path:    str
    ok:      bool
    subject: str
    from_:   str
    defects: list[str] = field(default_factory=list)
    error:   str = ""


def parse_directory(directory: str,
                    pol: Any = _policy.default,
                    headers_only: bool = False) -> list[BatchResult]:
    """
    Parse all .eml files in a directory.  Returns one BatchResult per file.
    headers_only=True uses the fast BytesHeaderParser path.

    Example:
        results = parse_directory("/var/mail/inbox")
        for r in results:
            print(r.subject, r.from_, r.ok)
    """
    results: list[BatchResult] = []
    for fname in sorted(os.listdir(directory)):
        if not fname.lower().endswith(".eml"):
            continue
        fpath = os.path.join(directory, fname)
        try:
            with open(fpath, "rb") as f:
                raw = f.read()
            if headers_only:
                msg = BytesHeaderParser(policy=pol).parsebytes(raw)
                defects: list[str] = []
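            else:
                report = parse_with_report(raw, pol)
                if report.msg is None:
                    # Fatal parse failure: surface the recorded reason.
                    raise ValueError("; ".join(report.defects))
                msg = report.msg
                defects = report.defects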
            results.append(BatchResult(
                path=fpath,
                ok=len(defects) == 0,
                subject=msg.get("Subject", ""),
                from_=str(msg.get("From", "")),
                defects=defects,
            ))
        except Exception as e:
            results.append(BatchResult(
                path=fpath, ok=False, subject="", from_="", error=str(e)
            ))
    return results


# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    print("=== email.parser demo ===")

    clean = (
        # Placeholder addresses on the reserved example.com domain.
        b"From: Alice <alice@example.com>\r\n"
        b"To: Bob <bob@example.com>\r\n"
        b"Subject: Parser demo\r\n"
        b"Date: Sat, 03 Feb 2029 09:00:00 +0000\r\n"
        b"Message-ID: <demo-001@example.com>\r\n"
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"\r\n"
        b"Hello from email.parser!\r\n"
    )

    defective = (
        b"From: notavalidemail\r\n"
        b"To:\r\n"
        b"Subject:\r\n"
        b"\r\n"
        b"sparse body"
    )

    # ── parse_bytes ────────────────────────────────────────────────────────
    print("\n--- parse_bytes ---")
    msg = parse_bytes(clean)
    print(f"  Subject : {msg['Subject']!r}")
    print(f"  From    : {msg['From']!r}")
    print(f"  body    : {msg.get_payload()!r}")

    # ── parse_headers_only ─────────────────────────────────────────────────
    print("\n--- parse_headers_only ---")
    hdr = parse_headers_only(clean)
    print(f"  Subject : {hdr['Subject']!r}")
    print(f"  payload : {hdr.get_payload()!r}  # raw, unparsed body text")

    # ── parse_stream ───────────────────────────────────────────────────────
    print("\n--- parse_stream ---")
    stream_msg = parse_stream(io.BytesIO(clean), chunk_size=50)
    print(f"  Subject : {stream_msg['Subject']!r}")

    # ── parse_chunks ───────────────────────────────────────────────────────
    print("\n--- parse_chunks ---")
    chunks = [clean[i:i+40] for i in range(0, len(clean), 40)]
    chunked_msg = parse_chunks(iter(chunks))
    print(f"  Subject : {chunked_msg['Subject']!r}")
    print(f"  chunks  : {len(chunks)}")

    # ── parse_with_report (clean) ──────────────────────────────────────────
    print("\n--- parse_with_report (clean) ---")
    report_clean = parse_with_report(clean)
    print(f"  ok      : {report_clean.ok}")
    print(f"  defects : {report_clean.defects}")

    # ── parse_with_report (defective) ─────────────────────────────────────
    print("\n--- parse_with_report (defective) ---")
    report_bad = parse_with_report(defective)
    print(f"  ok      : {report_bad.ok}")
    for d in report_bad.defects:
        print(f"  defect  : {d}")

    # ── quick_subject / quick_from ─────────────────────────────────────────
    print("\n--- quick_subject / quick_from ---")
    print(f"  subject : {quick_subject(clean)!r}")
    print(f"  from    : {quick_from(clean)!r}")

    print("\n=== done ===")
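
One detail the defect collection in section 4 depends on is where defects are stored. The sketch below, using a deliberately broken message invented for this example, suggests that structural problems land on msg.defects while header problems are attached to the typed header object under policy.default.

```python
from email import policy
from email.parser import BytesParser

# Illustrative message with two problems: a From value that is not a valid
# address, and a multipart body whose closing "--XX--" delimiter is missing.
RAW = (
    b"From: notavalidemail\r\n"
    b"Subject: Hi\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"part one\r\n"
)

msg = BytesParser(policy=policy.default).parsebytes(RAW)

# Structural defects are recorded on the message (or part) itself.
structural = [type(d).__name__ for d in msg.defects]

# Header defects are recorded on the header object returned by msg[name].
header_level = [type(d).__name__ for d in msg["From"].defects]

print("structural  :", structural)
print("header-level:", header_level)
```

A defect report that reads only msg.defects and part.defects will therefore miss malformed addresses and similar header issues.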

For the email.message.EmailMessage stdlib companion: when parsed with policy.default, BytesParser returns an EmailMessage whose headers are typed objects (msg["From"].addresses gives Address instances). Always pass policy=email.policy.default to the parser rather than relying on the legacy compat32 default; that is what gives you structured headers, header-level defect reporting, and modern get_content()/iter_attachments() access.

For the mail-parser (PyPI) alternative: mailparser.parse_from_bytes(raw) provides a higher-level parsed object with .from_, .to, .subject, .date, .attachments, .body, .text_plain, and .text_html fields that map directly to common email fields without navigating the email.message MIME tree. Use mail-parser for rapid extraction of common fields in data pipelines; use stdlib email.parser for full RFC 5322 control, policy customisation, and zero-dependency deployments.

The Claude Skills 360 bundle includes email.parser skill sets covering parse_bytes()/parse_str()/parse_file() one-shot parsers, parse_headers_only()/quick_subject()/quick_from() fast header extractors, parse_stream()/parse_chunks() streaming parsers, ParseReport/parse_with_report() defect collector, and BatchResult/parse_directory() bulk .eml processor. Start with the free tier to try RFC 5322 parsing patterns and email.parser pipeline code generation.
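
As a rough illustration of the typed-header and get_content()/iter_attachments() access that policy.default enables, the sketch below builds a small message, serialises it, and parses it back. The addresses, subject, and attachment contents are placeholders chosen for this example.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Build a message with one attachment so there is something to parse.
out = EmailMessage()
out["From"] = "Alice Example <alice@example.com>"
out["To"] = "bob@example.com, Carol <carol@example.com>"
out["Subject"] = "Quarterly numbers"
out.set_content("See the attached file.\n")
out.add_attachment(
    b"q,total\n1,42\n",
    maintype="application",
    subtype="octet-stream",
    filename="q1.csv",
)

# Round-trip through bytes, as if the message had arrived over the wire.
msg = BytesParser(policy=policy.default).parsebytes(out.as_bytes())

# Typed address headers expose Address objects.
sender = msg["From"].addresses[0]
recipients = [a.addr_spec for a in msg["To"].addresses]

# get_body() picks the main text part; get_content() decodes it to str.
body = msg.get_body(preferencelist=("plain",)).get_content()

# iter_attachments() skips the body part and yields the rest.
attachments = [(p.get_filename(), p.get_content()) for p in msg.iter_attachments()]

print(sender.display_name, sender.addr_spec)
print(recipients)
print(body.strip())
print(attachments)
```

None of these accessors are available on a message parsed with the compat32 default, which returns the legacy Message class.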
