Lark parses text using EBNF grammars and produces parse trees. Install with `pip install lark`. Grammar strings use lowercase rule names and UPPERCASE terminal names; build a parser with `Lark(grammar, parser="lalr")` (fast and deterministic) or `parser="earley"` (slower, but handles ambiguous grammars, with `ambiguity="resolve"` as the default). `parser.parse(text)` returns a `Tree`: `tree.pretty()` gives a printable dump, `tree.data` is the rule name, and `tree.children` is a list of subtrees and `Token`s (each `Token` is a terminal value with `.type` and `.value`). To evaluate a tree, subclass `Transformer`: each method named after a rule receives that rule's children as a list, `@v_args(inline=True)` passes them as positional arguments instead, and returning `Discard` drops a child. Common grammar notation: `?rule` is transparent (single children are inlined); `!rule` keeps all tokens, including filtered literals; `TERMINAL: /regex/`; `rule: a b c` for sequence, `rule: a | b` for alternation, `a*` for zero or more, `a+` for one or more, `a?` for optional. `%import common.WS` with `%ignore WS` skips whitespace; `common` also provides `NUMBER`, `CNAME` (a C-style identifier), and `ESCAPED_STRING`. Claude Code generates Lark grammars, Transformer subclasses, and LALR expression parsers.
# CLAUDE.md for Lark
## Lark Stack
- Version: lark >= 1.2 | pip install lark
- Grammar: EBNF string — lowercase rules, UPPERCASE terminals, ? for transparent rules
- Parser: Lark(grammar, parser="lalr") for speed | "earley" for ambiguous grammars
- Parse: tree = parser.parse(text) | tree.pretty() for debug | tree.data for rule name
- Transform: Transformer subclass — method name = rule name, returns value
- Inline: @v_args(inline=True) on Transformer methods — children as positional args
- Import: %import common.WS / NUMBER / CNAME / ESCAPED_STRING | %ignore WS
## Lark EBNF Parser Pipeline
# app/lark_parsers.py — Lark grammars with Transformer for AST evaluation
from __future__ import annotations
from typing import Any
from lark import Discard, Lark, Token, Transformer, Tree, v_args
# ─────────────────────────────────────────────────────────────────────────────
# 1. Arithmetic expression evaluator
# ─────────────────────────────────────────────────────────────────────────────
ARITH_GRAMMAR = r"""
?start: expr
?expr: expr "+" term -> add
| expr "-" term -> sub
| term
?term: term "*" factor -> mul
| term "/" factor -> div
| term "%" factor -> mod
| factor
?factor: "-" factor -> neg
| atom "**" factor -> pow
| atom
?atom: NUMBER -> number
| NAME -> var
| "(" expr ")"
NUMBER: /\d+(\.\d*)?([eE][+-]?\d+)?/
NAME: /[a-zA-Z_]\w*/
%import common.WS
%ignore WS
"""
@v_args(inline=True)
class ArithEval(Transformer):
    """
    Transformer evaluates the parse tree bottom-up.
    Each method receives children as positional args (inline=True).
    Returns a Python value at each node, leaf to root.
    """

    def __init__(self, variables: dict[str, float] | None = None):
        super().__init__()
        self.variables = variables or {}

    def number(self, n): return float(n)
    def var(self, name): return self.variables.get(str(name), 0.0)
    def add(self, a, b): return a + b
    def sub(self, a, b): return a - b
    def mul(self, a, b): return a * b
    def div(self, a, b): return a / b
    def mod(self, a, b): return a % b
    def neg(self, a): return -a
    def pow(self, a, b): return a ** b
_arith_parser = Lark(ARITH_GRAMMAR, parser="lalr")
def eval_expr(text: str, variables: dict[str, float] | None = None) -> float:
    """
    Parse and evaluate an arithmetic expression.
    The LALR parser is deterministic and fast for an unambiguous grammar like this.
    """
    tree = _arith_parser.parse(text)
    return ArithEval(variables).transform(tree)
# ─────────────────────────────────────────────────────────────────────────────
# 2. INI config parser (LALR)
# ─────────────────────────────────────────────────────────────────────────────
CONFIG_GRAMMAR = r"""
start: section+
section: "[" NAME "]" _NEWLINE entry*
entry: NAME "=" value _NEWLINE
?value: ESCAPED_STRING -> string_val
| SIGNED_NUMBER -> number_val
| BOOL -> bool_val
| BARE_VAL -> bare_val
BOOL.1: "true" | "false" | "True" | "False" | "yes" | "no"  // priority 1: wins ties with BARE_VAL
BARE_VAL: /[^\s#"\d][^\n#]*/  // must not start with a digit or quote, so numbers and strings win
NAME: /[a-zA-Z_][a-zA-Z0-9_\-.]*/
_NEWLINE: /\r?\n/
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS_INLINE
%ignore WS_INLINE
%ignore /#[^\n]*/
"""
class ConfigTransformer(Transformer):
    def string_val(self, s): return str(s[0])[1:-1]  # strip surrounding quotes
    def number_val(self, n):
        v = float(n[0])
        return int(v) if v == int(v) else v
    def bool_val(self, b): return str(b[0]).lower() in ("true", "yes")
    def bare_val(self, v): return str(v[0]).strip()
    def NAME(self, n): return str(n)
    def entry(self, items): return (items[0], items[1])
    def section(self, items):
        name = items[0]
        entries = dict(items[1:])
        return (name, entries)
    def start(self, sections):
        return dict(sections)
_config_parser = Lark(CONFIG_GRAMMAR, parser="lalr")
def parse_config(text: str) -> dict[str, dict[str, Any]]:
    """Parse an INI-style config string into a nested dict."""
    tree = _config_parser.parse(text.strip() + "\n")
    return ConfigTransformer().transform(tree)
# ─────────────────────────────────────────────────────────────────────────────
# 3. Boolean filter expression (Earley — supports ambiguity)
# ─────────────────────────────────────────────────────────────────────────────
FILTER_GRAMMAR = r"""
?start: expr
?expr: expr "OR" expr -> or_expr
| expr "AND" expr -> and_expr
| "NOT" expr -> not_expr
| "(" expr ")"
| comparison
comparison: FIELD OP VALUE
FIELD: /[a-zA-Z_]\w*(\.[a-zA-Z_]\w*)*/
OP: "==" | "!=" | ">=" | "<=" | ">" | "<" | "~="
VALUE: ESCAPED_STRING | NUMBER
NUMBER: /\d+(\.\d+)?/
%import common.ESCAPED_STRING
%import common.WS
%ignore WS
"""
@v_args(inline=True)
class FilterEval(Transformer):
    """Evaluate a filter expression against a record dict."""

    def __init__(self, record: dict):
        super().__init__()
        self.record = record

    def _get_field(self, field_path: str) -> Any:
        val: Any = self.record
        for part in field_path.split("."):
            if isinstance(val, dict):
                val = val.get(part)
            else:
                val = getattr(val, part, None)
        return val

    def comparison(self, field, op, value):
        import ast
        val = self._get_field(str(field))
        try:
            rhs = ast.literal_eval(str(value))  # unquote strings, convert numbers
        except Exception:
            rhs = str(value)
        ops = {"==": lambda a, b: a == b, "!=": lambda a, b: a != b,
               ">": lambda a, b: a > b, "<": lambda a, b: a < b,
               ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
               "~=": lambda a, b: str(b).lower() in str(a).lower()}
        fn = ops.get(str(op), lambda a, b: False)
        return fn(val, rhs)

    def and_expr(self, a, b): return a and b
    def or_expr(self, a, b): return a or b
    def not_expr(self, a): return not a
_filter_parser = Lark(FILTER_GRAMMAR, parser="earley", ambiguity="resolve")
def eval_filter(expr: str, record: dict) -> bool:
    """
    Evaluate a boolean filter expression against a record dict.
    Example: 'status == "active" AND age >= 18'
    """
    tree = _filter_parser.parse(expr)
    return bool(FilterEval(record).transform(tree))

def filter_records(records: list[dict], expr: str) -> list[dict]:
    """Return records matching the filter expression."""
    return [r for r in records if eval_filter(expr, r)]
# ─────────────────────────────────────────────────────────────────────────────
# 4. Simple JSON-like data parser
# ─────────────────────────────────────────────────────────────────────────────
JSON_GRAMMAR = r"""
?value: object | array | string | number | "true" -> true
| "false" -> false
| "null" -> null
object: "{" [pair ("," pair)*] "}"
pair: ESCAPED_STRING ":" value
array: "[" [value ("," value)*] "]"
string: ESCAPED_STRING
number: SIGNED_NUMBER
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS
%ignore WS
"""
@v_args(inline=True)
class JsonTransformer(Transformer):
    def string(self, s): return str(s)[1:-1]  # strip quotes (escapes left as-is)
    def number(self, n):
        v = float(n)
        return int(v) if v == int(v) else v
    def true(self): return True
    def false(self): return False
    def null(self): return None
    def array(self, *items): return list(items)
    def pair(self, k, v): return (str(k)[1:-1], v)
    def object(self, *pairs): return dict(pairs)
# start= names the entry rule (this grammar defines no `start` rule);
# maybe_placeholders=False stops empty [] groups inserting None children for "{}" / "[]".
_json_parser = Lark(JSON_GRAMMAR, parser="lalr", start="value", maybe_placeholders=False)
def parse_json_like(text: str) -> Any:
    """Demonstrate a Lark-based JSON-like value parser."""
    tree = _json_parser.parse(text)
    return JsonTransformer().transform(tree)
# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────
CONFIG_SAMPLE = """\
[database]
host = localhost
port = 5432
ssl = true
[server]
debug = false
workers = 4
name = "My API"
"""
if __name__ == "__main__":
    print("=== Arithmetic evaluator ===")
    exprs = [
        ("2 + 3 * 4", {}),
        ("(2 + 3) * 4", {}),
        ("x ** 2 + 1", {"x": 5.0}),
        ("10 / 3", {}),
        ("-b + 2", {"b": 7.0}),
    ]
    for e, v in exprs:
        result = eval_expr(e, v)
        print(f"  {e:25} = {result}")

    print("\n=== Config parser ===")
    cfg = parse_config(CONFIG_SAMPLE)
    for sec, vals in cfg.items():
        print(f"  [{sec}] {vals}")

    print("\n=== Boolean filter ===")
    records = [
        {"name": "Alice", "status": "active", "age": 30, "score": 95},
        {"name": "Bob", "status": "inactive", "age": 17, "score": 72},
        {"name": "Carol", "status": "active", "age": 25, "score": 88},
    ]
    filters = [
        'status == "active" AND age >= 18',
        'score >= 90',
        'status == "active" AND score >= 85',
    ]
    for f in filters:
        matched = filter_records(records, f)
        names = [r["name"] for r in matched]
        print(f"  {f!r:45} → {names}")

    print("\n=== JSON-like parser ===")
    samples = ['{"a": 1, "b": [1, 2, 3], "c": true}', '["hello", 42, null]']
    for s in samples:
        print(f"  {s} → {parse_json_like(s)}")
For the pyparsing alternative: pyparsing builds grammars in pure Python using operator overloading (`a + b` for sequence, `a | b` for alternation), keeping the grammar definition next to parse actions in the same expression. Lark separates the grammar (a string of EBNF rules) from the Transformer class, which is the right structure for complex grammars: EBNF notation is more readable than a tree of Python objects for non-trivial rules, and Lark's LALR algorithm is significantly faster on larger inputs than pyparsing's recursive-descent parsing.

For the PLY / ANTLR alternative: PLY uses separate lexer and parser definitions with rules embedded in docstrings (a style dating back to Python 2) and requires understanding YACC shift/reduce conflicts; ANTLR generates Java or Python code from external .g4 grammar files and has a heavy setup. Lark has a clean grammar string format (`rule: a b | c d`), works entirely in Python, and supports both Earley (for rapid prototyping of ambiguous grammars) and LALR (for production performance) with the same grammar format, switchable via the `parser=` argument.

The Claude Skills 360 bundle includes Lark skill sets covering the EBNF grammar string with lowercase rules and UPPERCASE terminals, the `Lark` class with `parser="lalr"` and `"earley"` options, the Transformer subclass with its method-per-rule pattern, `@v_args(inline=True)` for positional arguments, `Discard` for filtering children, `Token` type inspection, `?rule` transparent-rule optimization, `%import common` macros, the `eval_expr` arithmetic evaluator, the `parse_config` INI reader, the `eval_filter` boolean filter evaluator, the `filter_records` higher-order filter, and the `parse_json_like` value parser. Start with the free tier to try EBNF grammar and parser code generation.