Lark parses text using EBNF grammars and produces parse trees. Install with `pip install lark`. Grammar strings use lowercase rule names and UPPERCASE terminal names; build a parser with `Lark(grammar, parser="lalr")` (fast and deterministic) or `parser="earley"` (slower, but handles ambiguous grammars, with `ambiguity="resolve"` as the default). `parser.parse(text)` returns a `Tree`: `tree.pretty()` gives a printable dump, `tree.data` is the rule name, and `tree.children` is a list of subtrees and `Token`s (each `Token` is a terminal value with `.type` and `.value`). To evaluate a tree, subclass `Transformer`: each method named after a rule receives that rule's children as a list, `@v_args(inline=True)` passes them as positional arguments instead, and returning `Discard` drops a child. Common grammar notation: `?rule` is transparent (single children are inlined); `!rule` keeps all tokens, including filtered literals; `TERMINAL: /regex/`; `rule: a b c` for sequence, `rule: a | b` for alternation, `a*` for zero or more, `a+` for one or more, `a?` for optional. `%import common.WS` with `%ignore WS` skips whitespace; `common` also provides `NUMBER`, `CNAME` (a C-style identifier), and `ESCAPED_STRING`. Claude Code generates Lark grammars, Transformer subclasses, and LALR expression parsers.
# CLAUDE.md for Lark
## Lark Stack
- Version: lark >= 1.2 | pip install lark
- Grammar: EBNF string — lowercase rules, UPPERCASE terminals, ? for transparent rules
- Parser: Lark(grammar, parser="lalr") for speed | "earley" for ambiguous grammars
- Parse: tree = parser.parse(text) | tree.pretty() for debug | tree.data for rule name
- Transform: Transformer subclass — method name = rule name, returns value
- Inline: @v_args(inline=True) on Transformer methods — children as positional args
- Import: %import common.WS / NUMBER / CNAME / ESCAPED_STRING | %ignore WS
## Lark EBNF Parser Pipeline
# app/lark_parsers.py — Lark grammars with Transformer for AST evaluation
from __future__ import annotations
from typing import Any
from lark import Discard, Lark, Token, Transformer, Tree, v_args
# ─────────────────────────────────────────────────────────────────────────────
# 1. Arithmetic expression evaluator
# ─────────────────────────────────────────────────────────────────────────────
ARITH_GRAMMAR = r"""
?start: expr
?expr: expr "+" term -> add
| expr "-" term -> sub
| term
?term: term "*" factor -> mul
| term "/" factor -> div
| term "%" factor -> mod
| factor
?factor: "-" factor -> neg
| atom "**" factor -> pow
| atom
?atom: NUMBER -> number
| NAME -> var
| "(" expr ")"
NUMBER: /\d+(\.\d*)?([eE][+-]?\d+)?/
NAME: /[a-zA-Z_]\w*/
%import common.WS
%ignore WS
"""
@v_args(inline=True)
class ArithEval(Transformer):
    """
    Transformer evaluates the parse tree bottom-up.
    Each method receives children as positional args (inline=True).
    Returns a Python value at each node, leaf to root.
    """

    def __init__(self, variables: dict[str, float] | None = None):
        super().__init__()
        self.variables = variables or {}

    def number(self, n): return float(n)
    def var(self, name): return self.variables.get(str(name), 0.0)
    def add(self, a, b): return a + b
    def sub(self, a, b): return a - b
    def mul(self, a, b): return a * b
    def div(self, a, b): return a / b
    def mod(self, a, b): return a % b
    def neg(self, a): return -a
    def pow(self, a, b): return a ** b
_arith_parser = Lark(ARITH_GRAMMAR, parser="lalr")
def eval_expr(text: str, variables: dict[str, float] | None = None) -> float:
    """
    Parse and evaluate an arithmetic expression.
    The LALR parser is deterministic and fast for an unambiguous grammar like this.
    """
    tree = _arith_parser.parse(text)
    return ArithEval(variables).transform(tree)
# ─────────────────────────────────────────────────────────────────────────────
# 2. INI config parser (LALR)
# ─────────────────────────────────────────────────────────────────────────────
CONFIG_GRAMMAR = r"""
start: section+
section: "[" NAME "]" _NEWLINE entry*
entry: NAME "=" value _NEWLINE
?value: ESCAPED_STRING -> string_val
| SIGNED_NUMBER -> number_val
| BOOL -> bool_val
| BARE_VAL -> bare_val
BOOL.1: "true" | "false" | "True" | "False" | "yes" | "no"  // priority 1: wins ties with BARE_VAL
BARE_VAL: /[^\s#"\d][^\n#]*/  // must not start with a digit or quote, so numbers and strings win
NAME: /[a-zA-Z_][a-zA-Z0-9_\-.]*/
_NEWLINE: /\r?\n/
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS_INLINE
%ignore WS_INLINE
%ignore /#[^\n]*/
"""
class ConfigTransformer(Transformer):
    def string_val(self, s): return str(s[0])[1:-1]  # strip surrounding quotes
    def number_val(self, n):
        v = float(n[0])
        return int(v) if v == int(v) else v
    def bool_val(self, b): return str(b[0]).lower() in ("true", "yes")
    def bare_val(self, v): return str(v[0]).strip()
    def NAME(self, n): return str(n)
    def entry(self, items): return (items[0], items[1])
    def section(self, items):
        name = items[0]
        entries = dict(items[1:])
        return (name, entries)
    def start(self, sections):
        return dict(sections)
_config_parser = Lark(CONFIG_GRAMMAR, parser="lalr")
def parse_config(text: str) -> dict[str, dict[str, Any]]:
    """Parse an INI-style config string into a nested dict."""
    tree = _config_parser.parse(text.strip() + "\n")
    return ConfigTransformer().transform(tree)
# ─────────────────────────────────────────────────────────────────────────────
# 3. Boolean filter expression (Earley — supports ambiguity)
# ─────────────────────────────────────────────────────────────────────────────
FILTER_GRAMMAR = r"""
?start: expr
?expr: expr "OR" expr -> or_expr
| expr "AND" expr -> and_expr
| "NOT" expr -> not_expr
| "(" expr ")"
| comparison
comparison: FIELD OP VALUE
FIELD: /[a-zA-Z_]\w*(\.[a-zA-Z_]\w*)*/
OP: "==" | "!=" | ">=" | "<=" | ">" | "<" | "~="
VALUE: ESCAPED_STRING | NUMBER
NUMBER: /\d+(\.\d+)?/
%import common.ESCAPED_STRING
%import common.WS
%ignore WS
"""
@v_args(inline=True)
class FilterEval(Transformer):
    """Evaluate a filter expression against a record dict."""

    def __init__(self, record: dict):
        super().__init__()
        self.record = record

    def _get_field(self, field_path: str) -> Any:
        val: Any = self.record
        for part in field_path.split("."):
            if isinstance(val, dict):
                val = val.get(part)
            else:
                val = getattr(val, part, None)
        return val

    def comparison(self, field, op, value):
        import ast
        val = self._get_field(str(field))
        try:
            rhs = ast.literal_eval(str(value))  # unquote strings, convert numbers
        except Exception:
            rhs = str(value)
        ops = {"==": lambda a, b: a == b, "!=": lambda a, b: a != b,
               ">": lambda a, b: a > b, "<": lambda a, b: a < b,
               ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
               "~=": lambda a, b: str(b).lower() in str(a).lower()}
        fn = ops.get(str(op), lambda a, b: False)
        return fn(val, rhs)

    def and_expr(self, a, b): return a and b
    def or_expr(self, a, b): return a or b
    def not_expr(self, a): return not a
_filter_parser = Lark(FILTER_GRAMMAR, parser="earley", ambiguity="resolve")
def eval_filter(expr: str, record: dict) -> bool:
    """
    Evaluate a boolean filter expression against a record dict.
    Example: 'status == "active" AND age >= 18'
    """
    tree = _filter_parser.parse(expr)
    return bool(FilterEval(record).transform(tree))

def filter_records(records: list[dict], expr: str) -> list[dict]:
    """Return records matching the filter expression."""
    return [r for r in records if eval_filter(expr, r)]
# ─────────────────────────────────────────────────────────────────────────────
# 4. Simple JSON-like data parser
# ─────────────────────────────────────────────────────────────────────────────
JSON_GRAMMAR = r"""
?value: object | array | string | number | "true" -> true
| "false" -> false
| "null" -> null
object: "{" [pair ("," pair)*] "}"
pair: ESCAPED_STRING ":" value
array: "[" [value ("," value)*] "]"
string: ESCAPED_STRING
number: SIGNED_NUMBER
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS
%ignore WS
"""
@v_args(inline=True)
class JsonTransformer(Transformer):
    def string(self, s): return str(s)[1:-1]  # strip quotes (escapes left as-is)
    def number(self, n):
        v = float(n)
        return int(v) if v == int(v) else v
    def true(self): return True
    def false(self): return False
    def null(self): return None
    def array(self, *items): return list(items)
    def pair(self, k, v): return (str(k)[1:-1], v)
    def object(self, *pairs): return dict(pairs)
# start= names the entry rule (this grammar defines no `start` rule);
# maybe_placeholders=False stops empty [] groups inserting None children for "{}" / "[]".
_json_parser = Lark(JSON_GRAMMAR, parser="lalr", start="value", maybe_placeholders=False)
def parse_json_like(text: str) -> Any:
    """Demonstrate a Lark-based JSON-like value parser."""
    tree = _json_parser.parse(text)
    return JsonTransformer().transform(tree)
# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────
CONFIG_SAMPLE = """\
[database]
host = localhost
port = 5432
ssl = true
[server]
debug = false
workers = 4
name = "My API"
"""
if __name__ == "__main__":
    print("=== Arithmetic evaluator ===")
    exprs = [
        ("2 + 3 * 4", {}),
        ("(2 + 3) * 4", {}),
        ("x ** 2 + 1", {"x": 5.0}),
        ("10 / 3", {}),
        ("-b + 2", {"b": 7.0}),
    ]
    for e, v in exprs:
        result = eval_expr(e, v)
        print(f"  {e:25} = {result}")

    print("\n=== Config parser ===")
    cfg = parse_config(CONFIG_SAMPLE)
    for sec, vals in cfg.items():
        print(f"  [{sec}] {vals}")

    print("\n=== Boolean filter ===")
    records = [
        {"name": "Alice", "status": "active", "age": 30, "score": 95},
        {"name": "Bob", "status": "inactive", "age": 17, "score": 72},
        {"name": "Carol", "status": "active", "age": 25, "score": 88},
    ]
    filters = [
        'status == "active" AND age >= 18',
        'score >= 90',
        'status == "active" AND score >= 85',
    ]
    for f in filters:
        matched = filter_records(records, f)
        names = [r["name"] for r in matched]
        print(f"  {f!r:45} → {names}")

    print("\n=== JSON-like parser ===")
    samples = ['{"a": 1, "b": [1, 2, 3], "c": true}', '["hello", 42, null]']
    for s in samples:
        print(f"  {s} → {parse_json_like(s)}")
For the pyparsing alternative: pyparsing builds grammars in pure Python using operator overloading (`a + b` for sequence, `a | b` for alternation), keeping the grammar definition next to parse actions in the same expression. Lark separates the grammar (a string of EBNF rules) from the Transformer class, which is the right structure for complex grammars: EBNF notation is more readable than a tree of Python objects for non-trivial rules, and Lark's LALR algorithm is significantly faster on larger inputs than pyparsing's recursive-descent parsing.

For the PLY / ANTLR alternative: PLY uses separate lexer and parser definitions with rules embedded in docstrings (a style dating back to Python 2) and requires understanding YACC shift/reduce conflicts; ANTLR generates Java or Python code from external .g4 grammar files and has a heavy setup. Lark has a clean grammar string format (`rule: a b | c d`), works entirely in Python, and supports both Earley (for rapid prototyping of ambiguous grammars) and LALR (for production performance) with the same grammar format, switchable via the `parser=` argument.

The Claude Skills 360 bundle includes Lark skill sets covering the EBNF grammar string with lowercase rules and UPPERCASE terminals, the `Lark` class with `parser="lalr"` and `"earley"` options, the Transformer subclass with its method-per-rule pattern, `@v_args(inline=True)` for positional arguments, `Discard` for filtering children, `Token` type inspection, `?rule` transparent-rule optimization, `%import common` macros, the `eval_expr` arithmetic evaluator, the `parse_config` INI reader, the `eval_filter` boolean filter evaluator, the `filter_records` higher-order filter, and the `parse_json_like` value parser. Start with the free tier to try EBNF grammar and parser code generation.