Claude Code for pyparsing: Python Parser Construction

Published: February 16, 2028 • Read time: 5 min • By: Claude Skills 360

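pyparsing builds text parsers from composable primitives. Install it with pip install pyparsing.

Tokens: from pyparsing import Word, Keyword, Literal, Regex, QuotedString, Suppress, alphas, alphanums, nums. Word(alphas) matches one or more alphabetic characters. Word(alphas + "_", alphanums + "_") matches an identifier (first character from the first set, the rest from the second). Keyword("if") matches only as a whole word. Suppress("[") matches and discards. Regex(r"\d+\.\d*") is a regex token. QuotedString('"') matches a double-quoted string. pyparsing_common.integer and pyparsing_common.real parse int and float literals and convert them.

Composition: And([a, b]) is a + b. MatchFirst([a, b]) is a | b (alternatives are tried in order and the first match wins). Or([a, b]) is a ^ b (the longest match wins). ZeroOrMore(expr), OneOrMore(expr) and Optional(expr) handle repetition and optional parts. Group(expr) wraps its tokens in a sub-list. Forward() declares a placeholder for recursive grammars. nestedExpr("(", ")") groups arbitrarily nested parentheses.

Results: result = expr.parseString(text). Access tokens by index (result[0]) or by name (result["name"]), or convert them with result.asDict() and result.asList(). Name an expression with expr.setResultsName("key") or the shorthand expr("key"). Scan: for tokens, start, end in expr.scanString(text): visits every match. Transform: expr.transformString(text) rewrites matches in place.

Actions: expr.setParseAction(fn) transforms matched tokens. expr.addCondition(fn) rejects a match when fn returns a falsy value. Combine(a + b) merges adjacent tokens into one string. pyparsing_common.comma_separated_list is a ready-made comma-separated list expression. Packrat: ParserElement.enablePackrat() memoizes sub-expression results so backtracking does not re-parse the same text. Claude Code generates pyparsing grammars, expression parsers, and config DSL readers.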
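
One composition detail is easy to get wrong. `a | b` builds a MatchFirst, which tries the alternatives in order and keeps the first one that matches. `a ^ b` builds an Or, which tries every alternative and keeps the longest match. The minimal sketch below uses made-up `integer` and `decimal` tokens to show the difference. It also shows why the module below lists `real | integer` with the longer form first.

```python
from pyparsing import ParseException, Regex, Word, nums

integer = Word(nums)
decimal = Regex(r"\d+\.\d+")

# a | b builds a MatchFirst: alternatives are tried left to right, first success wins
first_match = integer | decimal
# a ^ b builds an Or: every alternative is tried and the longest match wins
longest_match = integer ^ decimal
# Putting the longer form first also fixes MatchFirst (the module uses real | integer)
reordered = decimal | integer

for name, expr in [("|", first_match), ("^", longest_match), ("reordered |", reordered)]:
    print(f"{name:12} {expr.parseString('3.14').asList()}")

# With parseAll=True the MatchFirst version rejects the unparsed ".14"
try:
    first_match.parseString("3.14", parseAll=True)
except ParseException as exc:
    print("parseAll rejected:", exc)
```

Without parseAll, the `|` version silently stops after "3". Passing parseAll=True is the cheapest way to catch this kind of ordering bug.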

CLAUDE.md for pyparsing

## pyparsing Stack
- Version: pyparsing >= 3.1 | pip install pyparsing
- Tokens: Word(alphas) | Keyword("if") | Regex(r"...") | QuotedString('"')
- Compose: a + b (And) | a | b (MatchFirst, first match wins) | a ^ b (Or, longest match wins) | ZeroOrMore(x) | Optional(x) | Group(x)
- Names: expr("key") or expr.setResultsName("key") — access as result["key"]
- Results: parseString(text)[0] | .asDict() | .asList()
- Scan: for tokens, s, e in expr.scanString(text): — find all matches
- Perf: ParserElement.enablePackrat() before complex grammars
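
The Scan entry above has two close relatives: addCondition and transformString. The sketch below uses all three together. It assumes a made-up rule that any run of four or more digits is sensitive. The names `long_number` and `redactor` are illustrative only.

```python
from pyparsing import Word, nums

# Hypothetical rule for illustration: any run of 4+ digits counts as sensitive
long_number = Word(nums).addCondition(lambda t: len(t[0]) >= 4)

text = "order 12345 shipped to unit 7, card ending 9876"

# scanString yields (tokens, start, end) for each match; "7" fails the condition
hits = [(t[0], start, end) for t, start, end in long_number.scanString(text)]
print(hits)

# transformString splices each match's tokens back into the text.
# addParseAction keeps the condition; setParseAction would replace it,
# because pyparsing stores conditions in the same parse-action list.
redactor = long_number.copy().addParseAction(lambda t: "#" * len(t[0]))
print(redactor.transformString(text))
```

The `copy()` call gives `redactor` its own action list, so `long_number` can still be used for plain scanning.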

pyparsing Parser Construction Pipeline

# app/parsers.py — pyparsing grammars for expressions, config, and DSL parsing
from __future__ import annotations

import operator
from typing import Any

from pyparsing import (
    CaselessKeyword,
    Combine,
    Forward,
    Group,
    Keyword,
    Literal,
    OneOrMore,
    OpAssoc,
    Optional,
    ParserElement,
    ParseResults,
    QuotedString,
    Regex,
    Suppress,
    Word,
    ZeroOrMore,
    alphanums,
    alphas,
    infixNotation,
    nums,
    printables,
    pyparsing_common,
    rest_of_line,
)

# Enable memoization (packrat parsing): caches each sub-expression's result per input
# position, which speeds up backtracking-heavy grammars such as infixNotation
ParserElement.enablePackrat()

LPAR, RPAR = Suppress("("), Suppress(")")
LBRACKET, RBRACKET = Suppress("["), Suppress("]")
LBRACE, RBRACE = Suppress("{"), Suppress("}")
COMMA  = Suppress(",")
EQUALS = Suppress("=")
COLON  = Suppress(":")
SEMI   = Suppress(";")


# ─────────────────────────────────────────────────────────────────────────────
# 1. Arithmetic expression parser with operator precedence
# ─────────────────────────────────────────────────────────────────────────────

def _make_arith_parser():
    """
    infixNotation builds operator-precedence parsers automatically.
    Each tuple: (operator_expr, arity, associativity[, parse_action]),
    listed from highest to lowest precedence.
    """
    integer = pyparsing_common.integer
    real    = pyparsing_common.real
    number  = (real | integer)("number")

    ident   = Word(alphas + "_", alphanums + "_")("name")
    # No hand-written "(" expr ")" branch is needed: infixNotation already
    # accepts parenthesized sub-expressions (lpar/rpar default to "(" and ")")
    atom    = number | ident

    # infixNotation handles left/right associativity and precedence automatically
    expr = infixNotation(
        atom,
        [
            (Literal("**"),                               2, OpAssoc.RIGHT),  # power (right-assoc)
            (Literal("-"),                                1, OpAssoc.RIGHT),  # unary minus (prefix)
            (Literal("*") | Literal("/") | Literal("%"),  2, OpAssoc.LEFT),   # multiplicative
            (Literal("+") | Literal("-"),                 2, OpAssoc.LEFT),   # additive
        ],
    )
    return expr


_arith_parser = _make_arith_parser()


def parse_expression(text: str) -> ParseResults:
    """Parse an arithmetic expression: '2 * (x + 3) ** 2'."""
    return _arith_parser.parseString(text, parseAll=True)


# ─────────────────────────────────────────────────────────────────────────────
# 2. INI / Key=value config file parser
# ─────────────────────────────────────────────────────────────────────────────

def make_config_parser():
    """
    Parses INI-style config:
        [section]
        key = value
        key2 = "quoted value"
        # comment
    Returns dict of {section: {key: value}}.
    """
    comment = Suppress("#" + rest_of_line)

    identifier = Word(alphas + "_", alphanums + "_-.")
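    # Bare values run to end of line (or a '#'). Only spaces/tabs are skipped before
    # one, so it never starts on the next line; trailing blanks are stripped.
    bare_value = Regex(r"[^\n#]+").setWhitespaceChars(" \t").setParseAction(lambda t: t[0].strip())
    value_str  = QuotedString('"') | QuotedString("'") | bare_value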
    integer    = pyparsing_common.integer
    real       = pyparsing_common.real
    boolean    = (CaselessKeyword("true") | CaselessKeyword("false")).setParseAction(
        lambda t: t[0].lower() == "true"
    )
    value = boolean | real | integer | value_str

    key_value = Group(identifier("key") + EQUALS + value("value"))

    section_head = Suppress("[") + identifier("name") + Suppress("]")
    section = Group(section_head + Group(ZeroOrMore(key_value | comment))("items"))

    config_file = ZeroOrMore(comment | section)
    return config_file


def parse_config(text: str) -> dict[str, dict[str, Any]]:
    """Parse an INI-style string into a nested dict."""
    parser = make_config_parser()
    result = parser.parseString(text, parseAll=True)  # reject trailing unparsed text
    out: dict[str, dict[str, Any]] = {}
    for section in result:
        name = section[0]
        items = {kv["key"]: kv["value"] for kv in section[1]}
        out[name] = items
    return out


# ─────────────────────────────────────────────────────────────────────────────
# 3. Simple SQL SELECT parser
# ─────────────────────────────────────────────────────────────────────────────

def make_select_parser():
    """
    Parses: SELECT col1, col2 FROM table WHERE col = 'val' LIMIT 100
    Demonstrates Keyword for reserved words and Group for column lists.
    """
    SELECT = Keyword("SELECT", caseless=True)
    FROM   = Keyword("FROM",   caseless=True)
    WHERE  = Keyword("WHERE",  caseless=True)
    LIMIT  = Keyword("LIMIT",  caseless=True)
    AND    = Keyword("AND",    caseless=True)
    OR     = Keyword("OR",     caseless=True)
    AS_KW  = Keyword("AS",     caseless=True)
    STAR   = Literal("*")

    identifier = Word(alphas + "_", alphanums + "_")
    table_name = identifier
    column     = Combine(identifier + Optional("." + identifier)) | STAR
    alias      = Group(column + Optional(Suppress(AS_KW) + identifier("alias")))

    column_list = Group(alias + ZeroOrMore(COMMA + alias))("columns")

    string_val  = QuotedString("'") | QuotedString('"')
    number_val  = pyparsing_common.real | pyparsing_common.integer
    value       = string_val | number_val | identifier
    op          = Regex(r"[!=<>]+")
    condition   = Group(identifier("col") + op("op") + value("val"))
    where_expr  = condition + ZeroOrMore((AND | OR) + condition)

    select_stmt = (
        SELECT
        + column_list
        + Suppress(FROM)
        + table_name("table")
        + Optional(Suppress(WHERE) + Group(where_expr)("where"))
        + Optional(Suppress(LIMIT) + pyparsing_common.integer("limit"))
    )
    return select_stmt


def parse_select(sql: str) -> dict[str, Any]:
    parser = make_select_parser()
    result = parser.parseString(sql.strip(), parseAll=True)
    where = result.get("where")  # None when the query has no WHERE clause
    return {
        "table":   result["table"],
        "columns": result["columns"].asList(),
        "where":   where.asList() if where is not None else [],
        "limit":   result.get("limit"),
    }


# ─────────────────────────────────────────────────────────────────────────────
# 4. Log line extractor (scanString)
# ─────────────────────────────────────────────────────────────────────────────

def make_log_parser():
    """
    Extract structured fields from log lines like:
      2024-01-05 14:30:22 ERROR app.db Connection timeout after 30s
    scanString finds all matches in a multi-line string.
    """
    date   = Combine(Word(nums, exact=4) + "-" + Word(nums, exact=2) + "-" + Word(nums, exact=2))
    time_  = Combine(Word(nums, exact=2) + ":" + Word(nums, exact=2) + ":" + Word(nums, exact=2))
    level  = (Keyword("DEBUG") | Keyword("INFO") | Keyword("WARNING") | Keyword("ERROR") | Keyword("CRITICAL"))
    logger = Combine(Word(alphanums + "_") + ZeroOrMore("." + Word(alphanums + "_")))
    msg    = Regex(r".+").setWhitespaceChars(" \t")  # message must start on the same line

    log_line = (
        date("date") + time_("time") + level("level")
        + logger("logger") + msg("message")
    )
    return log_line


def extract_log_entries(log_text: str) -> list[dict[str, str]]:
    """Extract all parseable log entries from multi-line output."""
    parser = make_log_parser()
    entries = []
    for tokens, start, end in parser.scanString(log_text):
        entries.append({
            "date":    tokens.get("date", ""),
            "time":    tokens.get("time", ""),
            "level":   tokens.get("level", ""),
            "logger":  tokens.get("logger", ""),
            "message": tokens.get("message", ""),
        })
    return entries


# ─────────────────────────────────────────────────────────────────────────────
# 5. Version string parser
# ─────────────────────────────────────────────────────────────────────────────

def make_version_parser():
    """
    Parse semantic version strings: 1.2.3, 1.2.3-beta.1, 1.2.3+build.42
    Demonstrates Suppress for separators and Optional for suffixes.
    """
    integer  = Word(nums)
    dot      = Literal(".")
    # SemVer pre-release/build identifiers: dot-separated runs of [0-9A-Za-z-]
    pre_id   = Word(alphanums + ".-")
    build_id = Word(alphanums + ".-")

    version = (
        integer("major") + Suppress(dot)
        + integer("minor") + Suppress(dot)
        + integer("patch")
        + Optional(Suppress("-") + pre_id("prerelease"))
        + Optional(Suppress("+") + build_id("build"))
    )
    return version


def parse_version(v: str) -> dict[str, Any]:
    parser = make_version_parser()
    result = parser.parseString(v.strip(), parseAll=True)
    return {
        "major":      int(result["major"]),
        "minor":      int(result["minor"]),
        "patch":      int(result["patch"]),
        "prerelease": result.get("prerelease"),
        "build":      result.get("build"),
    }


# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────

CONFIG_SAMPLE = """
[database]
host = localhost
port = 5432
name = "myapp_db"
ssl  = true

[server]
port    = 8000
debug   = false
workers = 4
"""

LOG_SAMPLE = """
2024-01-05 14:30:22 INFO  app.server Server started on port 8000
2024-01-05 14:31:05 ERROR app.db    Connection timeout after 30s
2024-01-05 14:31:06 WARNING app.cache Cache miss rate 45%
"""

if __name__ == "__main__":
    print("=== Arithmetic expression ===")
    r = parse_expression("2 * (x + 3) ** 2")
    print(f"  {r.asList()}")

    print("\n=== Config parser ===")
    cfg = parse_config(CONFIG_SAMPLE)
    for section, vals in cfg.items():
        print(f"  [{section}]")
        for k, v in vals.items():
            print(f"    {k} = {v!r}")

    print("\n=== SELECT parser ===")
    queries = [
        "SELECT id, name FROM users WHERE active = 1 LIMIT 10",
        "SELECT * FROM products WHERE category = 'books'",
        "SELECT user_id, count FROM stats",
    ]
    for sql in queries:
        r = parse_select(sql)
        print(f"  table={r['table']}  cols={r['columns']}  limit={r['limit']}")

    print("\n=== Log extraction ===")
    entries = extract_log_entries(LOG_SAMPLE)
    for e in entries:
        print(f"  [{e['level']:8}] {e['logger']:15} {e['message']}")

    print("\n=== Version parser ===")
    for v in ["1.2.3", "2.0.0-beta.1", "1.0.0+build.42", "3.1.4-rc.2+sha.abc123"]:
        p = parse_version(v)
        print(f"  {v:25} → {p}")
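
The `_make_arith_parser` docstring lists a parse_action slot in each infixNotation tuple, but that grammar only builds a nested list. The sketch below fills that slot so the parser evaluates expressions instead. `make_evaluator`, `evaluate`, `BIN_OPS` and the `env` dict of variable values are illustrative names. They are not part of pyparsing or of the module above. The fold helpers assume each binary level arrives as one group, [operand, op, operand, ...], with operands already evaluated by the inner levels. `_fold_right` is written to handle either a flat or a nested right-hand side. The module enables packrat parsing globally, but this sketch also works without it for short inputs.

```python
from __future__ import annotations

import operator

from pyparsing import (
    Literal, OpAssoc, Word, alphanums, alphas, infixNotation, one_of, pyparsing_common,
)

# Hypothetical operator table for this sketch: symbol -> Python function
BIN_OPS = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv, "%": operator.mod,
    "**": operator.pow,
}


def _fold_left(tokens):
    # tokens[0] holds one precedence level: [a, op, b, op, c, ...]
    items = tokens[0]
    result = items[0]
    for op, rhs in zip(items[1::2], items[2::2]):
        result = BIN_OPS[op](result, rhs)
    return result


def _fold_right(tokens):
    # Same shape, folded from the right so 2 ** 3 ** 2 == 2 ** 9
    items = tokens[0]
    result = items[-1]
    for op, lhs in zip(items[-2::-2], items[-3::-2]):
        result = BIN_OPS[op](lhs, result)
    return result


def _negate(tokens):
    # Prefix-operator group: ["-", operand]
    return -tokens[0][-1]


def make_evaluator(env: dict[str, float]):
    number = pyparsing_common.real | pyparsing_common.integer
    # Variables are looked up in env; an unknown name raises KeyError
    ident = Word(alphas + "_", alphanums + "_").setParseAction(lambda t: env[t[0]])
    return infixNotation(
        number | ident,
        [
            (Literal("**"), 2, OpAssoc.RIGHT, _fold_right),
            (Literal("-"), 1, OpAssoc.RIGHT, _negate),
            (one_of("* / %"), 2, OpAssoc.LEFT, _fold_left),
            (one_of("+ -"), 2, OpAssoc.LEFT, _fold_left),
        ],
    )


def evaluate(text: str, env: dict[str, float] | None = None) -> float:
    return make_evaluator(env or {}).parseString(text, parseAll=True)[0]


print(evaluate("2 * (x + 3) ** 2", {"x": 1}))
```

Evaluating inside parse actions means no tree is ever built. If you need a reusable AST, have the actions return small node objects instead of numbers.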

For the re (regex) alternative: Python's re module is the right tool for fixed-pattern extraction from homogeneous text. Hierarchical structures (nested brackets, operator precedence, recursive rules) are a different matter, and with re alone you end up writing a recursive-descent parser by hand. pyparsing's infixNotation() builds a correct operator-precedence grammar from a single table of operators. nestedExpr("(", ")") handles arbitrarily nested parentheses. Group() and named results (expr("key")) give you a structured parse tree instead of raw string matches.

For the lark alternative: lark reads a grammar written in its EBNF-style notation as a separate text block. It parses with either the Earley algorithm, which handles ambiguous and left-recursive grammars, or LALR(1), which is faster on large inputs. pyparsing builds the grammar in plain Python through operator overloading (a + b, a | b), so the grammar lives in the same file as its parse actions. That makes small parsers and DSLs easier to prototype and debug, with no separate grammar file to switch to.

The Claude Skills 360 bundle includes pyparsing skill sets covering:
- Word/Keyword/Regex/QuotedString primitives
- And/Or/ZeroOrMore/OneOrMore/Optional composition
- Group and Suppress for structure
- setResultsName and the expr("key") shorthand
- infixNotation for operator-precedence grammars
- Forward for recursive rules
- scanString for multi-match extraction
- parseString with parseAll
- the parse_config INI reader, the parse_select SQL SELECT parser, the extract_log_entries log scanner, and the parse_version semantic version parser
- enablePackrat memoization

Start with the free tier to try parser grammar code generation.
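
The comparison with re credits pyparsing with recursive rules (Forward) and nested delimiters (nestedExpr). The sketch below contrasts the two on made-up inputs. A Forward-based grammar controls exactly what may appear inside the brackets. nestedExpr needs no item grammar and returns every whitespace-separated word as a string. The names `nested_list`, `item`, `atom` and `sexpr` are illustrative.

```python
from pyparsing import Forward, Group, Optional, Suppress, Word, ZeroOrMore, alphanums, nestedExpr

LBRACK, RBRACK, COMMA = Suppress("["), Suppress("]"), Suppress(",")
atom = Word(alphanums + "_")

# Forward() is a placeholder; "<<=" supplies its definition once the rule using it exists
nested_list = Forward()
item = atom | nested_list
nested_list <<= Group(LBRACK + Optional(item + ZeroOrMore(COMMA + item)) + RBRACK)

print(nested_list.parseString("[a, [b, [c, d]], e]", parseAll=True).asList())

# nestedExpr only knows the delimiters: it splits contents on whitespace and groups each level
sexpr = nestedExpr("(", ")")
print(sexpr.parseString("(add 1 (mul 2 3))", parseAll=True).asList())
```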
