Python’s re module provides Perl-compatible regular expressions. import re. compile: pat = re.compile(r"\d+") — reuse for performance. match: m = re.match(r"\d+", "42abc") — anchored at start. search: m = re.search(r"\d+", "abc42") — first match anywhere. fullmatch: re.fullmatch(r"\d+", "42") — entire string must match. findall: re.findall(r"\d+", "a1b2c3") → ["1","2","3"]. finditer: for m in re.finditer(r"\d+", text): m.group(); m.span(). sub: re.sub(r"\s+", " ", text). subn: new_text, count = re.subn(r"\s+", " ", text). split: re.split(r"\s+", text). groups: m = re.search(r"(\w+)@(\w+)", email); m.group(1); m.group(2); m.groups(). named groups: m = re.search(r"(?P<user>\w+)@(?P<domain>\w+)", email); m.group("user"). groupdict: m.groupdict(). Flags: re.IGNORECASE, re.MULTILINE (^/$ per line), re.DOTALL (. matches \n), re.VERBOSE (whitespace+comments allowed), re.ASCII. Combine: re.compile(r"...", re.I | re.M). Lookahead: (?=...) positive, (?!...) negative. Lookbehind: (?<=...) positive, (?<!...) negative. Non-greedy: .*?. re.escape("a.b+c") — escape literal. backreference: \1 or \g<name>. Substitution fn: re.sub(pat, lambda m: m.group().upper(), text). Claude Code generates email/URL/log parsers, data extractors, and text normalizers.
CLAUDE.md for re
## re Stack
- Stdlib: import re
- Compile: pat = re.compile(r"...", re.I | re.M) — reuse across calls
- Extract: pat.findall(text) | [(m.group("k"),m.start()) for m in pat.finditer(text)]
- Named: r"(?P<name>...)" | m.groupdict() | re.sub(r"(?P<n>...)", lambda m: fn(m["n"]), text)
- Flags: re.I (ignore case) | re.M (^ and $ also match at line boundaries) | re.S (. matches newline) | re.X (verbose, whitespace/comments in pattern)
- Safety: always use r"..." raw strings | re.escape() for user-supplied literal strings
re Text Processing Pipeline
# app/patterns.py — compile, search, findall, sub, named groups, extractors, validators
from __future__ import annotations
import re
from dataclasses import dataclass
from typing import Any, Iterator
# ─────────────────────────────────────────────────────────────────────────────
# 1. Pre-compiled common patterns
# ─────────────────────────────────────────────────────────────────────────────
EMAIL = re.compile(
r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}",
re.ASCII,
)
URL = re.compile(
r"https?://[^\s\"'<>]+",
re.IGNORECASE,
)
IPV4 = re.compile(
r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b",
)
ISO_DATE = re.compile(
r"\b(\d{4})[-/](\d{1,2})[-/](\d{1,2})\b",
)
UUID = re.compile(
r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
re.IGNORECASE,
)
SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
LOG_LINE = re.compile(
r"""
(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) # timestamp
\s+
(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL) # log level
\s+
(?P<logger>\S+) # logger name
:\s+
(?P<msg>.+) # message
""",
re.VERBOSE,
)
# ─────────────────────────────────────────────────────────────────────────────
# 2. Extraction helpers
# ─────────────────────────────────────────────────────────────────────────────
def find_emails(text: str) -> list[str]:
    """Return every email address found in *text*, in order of appearance.

    Example:
        find_emails("contact [email protected] or [email protected]")
        # ["[email protected]", "[email protected]"]
    """
    return [match.group(0) for match in EMAIL.finditer(text)]
def find_urls(text: str) -> list[str]:
    """Collect every HTTP/HTTPS URL that appears in *text*.

    Example:
        find_urls("Visit https://example.com or http://docs.site/guide")
    """
    return [match.group(0) for match in URL.finditer(text)]
def find_ipv4(text: str) -> list[str]:
    """Collect every IPv4 address that appears in *text*."""
    return [match.group(0) for match in IPV4.finditer(text)]
def extract_named(pattern: re.Pattern, text: str) -> Iterator[dict[str, str]]:
    """Lazily yield the groupdict of each match of *pattern* in *text*.

    Intended for patterns built with named groups ((?P<name>...)).

    Example:
        for d in extract_named(LOG_LINE, log_content):
            print(d["ts"], d["level"], d["msg"])
    """
    yield from (match.groupdict() for match in pattern.finditer(text))
# ─────────────────────────────────────────────────────────────────────────────
# 3. Validation helpers
# ─────────────────────────────────────────────────────────────────────────────
def is_valid_email(value: str) -> bool:
    """Quick email format validation (entire string must be one address).

    Example:
        is_valid_email("[email protected]")  # True
        is_valid_email("not-an-email")  # False
    """
    return EMAIL.fullmatch(value) is not None
def is_valid_slug(value: str) -> bool:
    r"""Validate a URL slug: lowercase letters, digits, single hyphens.

    Uses fullmatch() for consistency with the other validators here.
    The previous match() call relied on the pattern's ``$`` anchor, which
    also matches just before a trailing newline — so "my-post\n" validated.

    Example:
        is_valid_slug("my-post-2024")  # True
        is_valid_slug("My Post")       # False
    """
    return bool(SLUG.fullmatch(value))
def is_valid_ipv4(value: str) -> bool:
    """Return True when *value* is exactly one dotted-quad IPv4 address."""
    return IPV4.fullmatch(value) is not None
def is_valid_uuid(value: str) -> bool:
    """Check the canonical 8-4-4-4-12 hex UUID layout (case-insensitive).

    Note: the pattern does not inspect the version nibble, so any hex
    string in that shape passes, not only v1-v5 UUIDs.
    """
    return UUID.fullmatch(value) is not None
# ─────────────────────────────────────────────────────────────────────────────
# 4. Transformation helpers
# ─────────────────────────────────────────────────────────────────────────────
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends.

    Example:
        normalize_whitespace("  hello   world  ")  # "hello world"
    """
    # str.split() with no argument splits on any whitespace run and
    # discards leading/trailing whitespace, so join gives the same result
    # as a regex substitution followed by strip().
    return " ".join(text.split())
def slugify(text: str) -> str:
    """Convert arbitrary text to a URL-safe slug.

    Example:
        slugify("Hello, World! 2024")  # "hello-world-2024"
    """
    lowered = text.lower()
    # Drop anything that is not a word character, whitespace, or hyphen.
    no_punct = re.sub(r"[^\w\s-]", "", lowered)
    # Fold whitespace/underscore/hyphen runs into a single hyphen.
    hyphenated = re.sub(r"[\s_-]+", "-", no_punct)
    return hyphenated.strip("-")
def mask_emails(text: str, replacement: str = "[EMAIL]") -> str:
    """Redact every email address in *text*, substituting *replacement*.

    Example:
        mask_emails("Contact [email protected] for details")
        # "Contact [EMAIL] for details"
    """
    masked = EMAIL.sub(replacement, text)
    return masked
def mask_ips(text: str, replacement: str = "[IP]") -> str:
    """Redact every IPv4 address in *text*, substituting *replacement*."""
    masked = IPV4.sub(replacement, text)
    return masked
def camel_to_snake(name: str) -> str:
    """Convert CamelCase (including acronym runs) to snake_case.

    Example:
        camel_to_snake("HttpResponseCode")  # "http_response_code"
        camel_to_snake("XMLParser")         # "xml_parser"
    """
    # First split an uppercase run from a following capitalised word
    # ("XMLParser" -> "XML_Parser"), then split lower/digit-to-upper
    # boundaries ("httpCode" -> "http_Code").
    step = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    step = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step)
    return step.lower()
def snake_to_camel(name: str) -> str:
    """Convert snake_case to CamelCase.

    Example:
        snake_to_camel("http_response_code")  # "HttpResponseCode"
    """
    def lift(match: re.Match) -> str:
        # "_x" -> "X": drop the underscore, uppercase the letter after it.
        return match.group(1)[1].upper()

    # capitalize() uppercases the first letter and lowercases the rest,
    # then each "_<char>" pair is replaced by its uppercased character.
    return re.sub(r"(_\w)", lift, name.capitalize())
# ─────────────────────────────────────────────────────────────────────────────
# 5. Log parsing
# ─────────────────────────────────────────────────────────────────────────────
@dataclass
class LogEntry:
    """One parsed log record; field names mirror LOG_LINE's named groups."""
    ts: str      # timestamp string, e.g. "2024-01-15T10:30:00"
    level: str   # one of DEBUG/INFO/WARNING/ERROR/CRITICAL
    logger: str  # logger name, e.g. "app.server"
    msg: str     # remainder of the line after ": "
def parse_log_lines(log_text: str) -> list[LogEntry]:
    """Parse structured log text into LogEntry objects.

    Text that does not match LOG_LINE is silently ignored.

    Example:
        entries = parse_log_lines(log_file.read())
        errors = [e for e in entries if e.level == "ERROR"]
    """
    # LOG_LINE's group names (ts/level/logger/msg) match LogEntry's fields,
    # so each groupdict can be splatted straight into the constructor.
    return [LogEntry(**m.groupdict()) for m in LOG_LINE.finditer(log_text)]
# ─────────────────────────────────────────────────────────────────────────────
# 6. Template substitution
# ─────────────────────────────────────────────────────────────────────────────
def render_template(template: str, variables: dict[str, Any]) -> str:
    """Replace {{variable}} placeholders with values from *variables*.

    A placeholder whose (stripped) key is absent from *variables* is left
    unchanged. Previously the lookup used ``variables.get(key)`` and tested
    the result against None, which also left placeholders untouched when a
    key was present but explicitly mapped to None; an EAFP lookup keeps the
    two cases distinct.

    Example:
        render_template("Hello {{name}}, your code is {{code}}",
                        {"name": "Alice", "code": "A1B2"})
        # "Hello Alice, your code is A1B2"
    """
    def replacer(m: re.Match) -> str:
        key = m.group(1).strip()
        try:
            return str(variables[key])
        except KeyError:
            return m.group(0)  # key not supplied: leave placeholder as-is

    return re.sub(r"\{\{(.+?)\}\}", replacer, template)
def extract_placeholders(template: str) -> list[str]:
    """List the stripped names of all {{variables}} in a template string.

    Example:
        extract_placeholders("Hello {{name}}, your order {{id}} is ready")
        # ["name", "id"]
    """
    return [m.group(1).strip() for m in re.finditer(r"\{\{(.+?)\}\}", template)]
# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    # Exercise each helper with a small sample; output is purely illustrative.
    def section(title: str) -> None:
        print(f"\n--- {title} ---")

    print("=== re demo ===")
    sample = (
        "Contact [email protected] or [email protected]. "
        "Server at 192.168.1.100 or https://api.example.com/v2/data. "
        "Request ID: 550e8400-e29b-41d4-a716-446655440000"
    )

    section("find_emails")
    print(f" {find_emails(sample)}")
    section("find_urls")
    print(f" {find_urls(sample)}")
    section("find_ipv4")
    print(f" {find_ipv4(sample)}")
    section("UUID")
    print(f" UUIDs: {UUID.findall(sample)}")

    section("is_valid_email")
    for addr in ["[email protected]", "bad@", "no-at-sign", "[email protected]"]:
        print(f" {addr!r:30s} → {is_valid_email(addr)}")

    section("normalize_whitespace")
    messy = " multiple spaces\t\there "
    print(f" {normalize_whitespace(messy)!r}")

    section("slugify")
    for title in ["Hello World!", "Python 3.12 Release", "Café & Restaurant"]:
        print(f" {title!r:30s} → {slugify(title)!r}")

    section("camel_to_snake / snake_to_camel")
    for name in ["HttpResponseCode", "XMLParser", "getUserByID"]:
        snake = camel_to_snake(name)
        camel = snake_to_camel(snake)
        print(f" {name!r:25s} → {snake!r:30s} → {camel!r}")

    section("log parsing")
    # The string content stays unindented so the timestamps lead each line.
    logs = """\
2024-01-15T10:30:00 INFO app.server: Server started on port 8080
2024-01-15T10:30:01 DEBUG app.db: Connected to database
2024-01-15T10:30:05 ERROR app.auth: Invalid token received
"""
    for entry in parse_log_lines(logs):
        print(f" [{entry.level:8s}] {entry.logger}: {entry.msg}")

    section("render_template")
    tpl = "Dear {{name}}, your order {{order_id}} ships on {{date}}."
    result = render_template(tpl, {"name": "Alice", "order_id": "ORD-42", "date": "2024-02-01"})
    print(f" {result}")
    print(f" placeholders: {extract_placeholders(tpl)}")

    print("\n=== done ===")
For the regex alternative — the third-party regex module (PyPI) is a drop-in superset of stdlib re with Unicode property escapes (\p{Letter}), fuzzy matching ((?:word){e<=1}), variable-length lookbehinds, possessive quantifiers, atomic groups, and overlapping matches; stdlib re covers the Perl-compatible subset that handles 95% of real-world pattern matching without external dependencies — use regex when you need Unicode property matching (scripts, categories), fuzzy approximate matching, or variable-length lookbehinds, stdlib re for everything else. For the parse alternative — parse (PyPI) provides the inverse of str.format(): parse.parse("Hello, {}!", "Hello, World!") → Result — much simpler than writing a regex for structured text extraction; stdlib re is more powerful but requires learning pattern syntax — use parse for quick structured extraction from format-like strings, re when you need full control over capturing groups, lookaheads, substitutions, and compiled performance. The Claude Skills 360 bundle includes re skill sets covering EMAIL/URL/IPV4/ISO_DATE/UUID/SLUG/LOG_LINE pre-compiled patterns, find_emails()/find_urls()/find_ipv4()/extract_named() extractors, is_valid_email()/is_valid_slug()/is_valid_uuid() validators, normalize_whitespace()/slugify()/mask_emails()/camel_to_snake()/snake_to_camel() transformers, parse_log_lines() structured extraction, and render_template()/extract_placeholders() template utilities. Start with the free tier to try text pattern matching and re extraction pipeline code generation.