Python’s gzip module reads and writes gzip-format compressed files. import gzip. open: gzip.open(filename, mode="rb", compresslevel=9, encoding=None, errors=None, newline=None) — accepts paths or file-like objects; modes: "rb", "wb", "ab", "xb", "rt", "wt", "at", "xt". compress: gzip.compress(data, compresslevel=9, mtime=None) → compressed bytes; mtime=0 for deterministic output. decompress: gzip.decompress(data) → original bytes. GzipFile: gzip.GzipFile(filename, mode, compresslevel, fileobj, mtime) — low-level access; .write(b), .read(n), .peek(n), .close(). BadGzipFile: raised on invalid magic or corrupt gzip header (gzip.BadGzipFile). compresslevel: 0–9; gzip.BEST_COMPRESSION = 9, gzip.BEST_SPEED = 1. mtime: timestamp embedded in header; set to 0 for reproducible builds. Concatenated streams: gzip supports multiple members in one file — decompress reads all. Reading line-by-line: with gzip.open("f.gz", "rt") as f: for line in f: — streams decompression. Writing incrementally: open in "wb" mode, call .write() in chunks. Claude Code generates log compressors, gzip file readers, streaming ETL pipelines, and deterministic artifact builders.
# CLAUDE.md for gzip
## gzip Stack
- Stdlib: import gzip
- One-shot: gzip.compress(data, compresslevel=6, mtime=0) # deterministic
- File write: with gzip.open("f.gz", "wb") as f: f.write(data)
- File read: with gzip.open("f.gz", "rb") as f: data = f.read()
- Text: with gzip.open("f.gz", "rt", encoding="utf-8") as f:
- Stream: pass fileobj=io.BytesIO() for in-memory gzip
## gzip File Compression Pipeline
# app/gzutil.py — one-shot, file IO, streaming, text, concat, deterministic
from __future__ import annotations
import gzip
import io
import json
import os
from pathlib import Path
from typing import Any, Iterator
# ─────────────────────────────────────────────────────────────────────────────
# 1. One-shot helpers
# ─────────────────────────────────────────────────────────────────────────────
def gz_compress(data: bytes, level: int = 6) -> bytes:
    """Gzip-compress *data* at compression level *level* (0-9).

    Example:
        payload = gz_compress(json_bytes, level=6)
        headers["Content-Encoding"] = "gzip"
    """
    compressed: bytes = gzip.compress(data, compresslevel=level)
    return compressed
def gz_decompress(data: bytes) -> bytes:
    """Inflate gzip-compressed *data* back into the original bytes.

    Example:
        body = gz_decompress(response_bytes)
    """
    original = gzip.decompress(data)
    return original
def gz_compress_deterministic(data: bytes, level: int = 6) -> bytes:
    """Gzip-compress *data* reproducibly.

    Writing mtime=0 zeroes the timestamp field in the gzip header, so
    identical input always yields bit-for-bit identical output, no
    matter when the call runs.

    Example:
        artifact = gz_compress_deterministic(wheels_bytes)
        sha256(artifact)  # stable across builds
    """
    return gzip.compress(data, compresslevel=level, mtime=0)
def gz_text(text: str, encoding: str = "utf-8", level: int = 6) -> bytes:
    """Encode *text* with *encoding* and gzip it (mtime=0, reproducible).

    Example:
        gz_bytes = gz_text(html_content)
    """
    raw = text.encode(encoding)
    return gzip.compress(raw, compresslevel=level, mtime=0)
def gz_decode_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip *data* and decode it to a string using *encoding*.

    Example:
        html = gz_decode_text(cached_gz)
    """
    raw = gzip.decompress(data)
    return raw.decode(encoding)
def gz_ratio(original: bytes, compressed: bytes) -> float:
    """Return the compressed/original size ratio (lower = better).

    Returns 1.0 for empty *original* to avoid division by zero.
    """
    if not original:
        return 1.0
    return len(compressed) / len(original)
# ─────────────────────────────────────────────────────────────────────────────
# 2. File I/O
# ─────────────────────────────────────────────────────────────────────────────
def write_gz(path: str | Path, data: bytes, level: int = 6) -> int:
    """Compress *data* (mtime=0, reproducible) and write it to *path*.

    Returns the compressed size in bytes.

    Example:
        n = write_gz("output.json.gz", json_bytes)
        print(f"wrote {n:,} bytes compressed")
    """
    blob = gzip.compress(data, compresslevel=level, mtime=0)
    Path(path).write_bytes(blob)
    return len(blob)
def read_gz(path: str | Path) -> bytes:
    """Read a .gz file at *path* and return the decompressed bytes.

    gzip.decompress consumes every member, so concatenated multi-member
    gzip files are fully decoded as well.

    Example:
        data = read_gz("records.json.gz")
    """
    raw = Path(path).read_bytes()
    return gzip.decompress(raw)
def write_gz_text(path: str | Path, text: str, encoding: str = "utf-8", level: int = 6) -> None:
    """Write *text* to a .gz file at *path* using gzip text mode.

    Example:
        write_gz_text("report.txt.gz", report_content)
    """
    with gzip.open(path, mode="wt", compresslevel=level, encoding=encoding) as out:
        out.write(text)
def read_gz_text(path: str | Path, encoding: str = "utf-8") -> str:
    """Read a gzip text file at *path* and return the full decoded string.

    Example:
        report = read_gz_text("report.txt.gz")
    """
    with gzip.open(path, mode="rt", encoding=encoding) as fh:
        content = fh.read()
    return content
def gz_lines(path: str | Path, encoding: str = "utf-8") -> Iterator[str]:
    """Stream the lines of a gzip text file, newline stripped.

    Decompression happens incrementally, so the file never has to fit
    in memory as a whole.

    Example:
        for line in gz_lines("events.log.gz"):
            process(json.loads(line))
    """
    with gzip.open(path, mode="rt", encoding=encoding) as fh:
        for raw in fh:
            yield raw.rstrip("\n")
def compress_file(src: str | Path, dst: str | Path | None = None, level: int = 6) -> Path:
    """
    Compress src to dst (defaults to src + ".gz"). Returns destination path.

    Streams the source in 64 KiB chunks, so arbitrarily large files can
    be compressed without loading them fully into memory (the previous
    implementation read the whole file at once). Writes mtime=0 so the
    output is deterministic for identical input bytes.

    Example:
        out = compress_file("data.json")
        print(out)  # "data.json.gz"
    """
    src = Path(src)
    # with_suffix(src.suffix + ".gz") keeps the original extension:
    # "data.json" -> "data.json.gz", "data" -> "data.gz"
    dst = Path(dst) if dst else src.with_suffix(src.suffix + ".gz")
    with src.open("rb") as fin, dst.open("wb") as fout:
        # GzipFile over the raw destination lets us set mtime=0 while streaming.
        with gzip.GzipFile(fileobj=fout, mode="wb", compresslevel=level, mtime=0) as gz:
            while chunk := fin.read(65536):
                gz.write(chunk)
    return dst
def decompress_file(src: str | Path, dst: str | Path | None = None) -> Path:
    """
    Decompress src.gz to dst (defaults to src without .gz suffix).

    Streams in 64 KiB chunks, so large archives are never held fully in
    memory (the previous implementation buffered the whole payload).

    Raises:
        ValueError: if the destination resolves to src itself (e.g. src
            has no ".gz" suffix and dst was not given) — streaming onto
            the input would destroy it.

    Example:
        out = decompress_file("data.json.gz")
        print(out)  # "data.json"
    """
    src = Path(src)
    if dst is None:
        # .stem drops only the final extension: "data.json.gz" -> "data.json"
        dst = src.parent / src.stem
    dst = Path(dst)
    if dst == src:
        raise ValueError(f"destination {str(dst)!r} would overwrite source; pass dst explicitly")
    with gzip.open(src, "rb") as gz, dst.open("wb") as fout:
        while chunk := gz.read(65536):
            fout.write(chunk)
    return dst
# ─────────────────────────────────────────────────────────────────────────────
# 3. Streaming and in-memory
# ─────────────────────────────────────────────────────────────────────────────
def compress_stream_to_bytes(chunks: Iterator[bytes], level: int = 6) -> bytes:
    """Gzip a stream of byte chunks into one bytes object (mtime=0).

    Example:
        gz = compress_stream_to_bytes(read_chunks("big.log"))
    """
    sink = io.BytesIO()
    writer = gzip.GzipFile(fileobj=sink, mode="wb", compresslevel=level, mtime=0)
    with writer:
        for piece in chunks:
            writer.write(piece)
    return sink.getvalue()
def decompress_stream(data: bytes, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield decompressed data from gzip *data* in *chunk_size* pieces.

    Example:
        for chunk in decompress_stream(response_body):
            write_to_disk(chunk)
    """
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as reader:
        while block := reader.read(chunk_size):
            yield block
def gz_to_buffer(data: bytes, level: int = 6) -> io.BytesIO:
    """Compress *data* (mtime=0) into an in-memory buffer, rewound to 0.

    Example:
        buf = gz_to_buffer(html_bytes)
        upload(buf.read())
    """
    payload = gzip.compress(data, compresslevel=level, mtime=0)
    # BytesIO(initial_bytes) starts positioned at offset 0, ready to read.
    return io.BytesIO(payload)
# ─────────────────────────────────────────────────────────────────────────────
# 4. JSON and JSONL helpers
# ─────────────────────────────────────────────────────────────────────────────
def write_json_gz(path: str | Path, obj: Any, level: int = 6, indent: int | None = None) -> None:
    """Serialize *obj* as JSON (UTF-8, non-ASCII kept literal) into a .json.gz file.

    Output is deterministic (mtime=0) for identical *obj* and options.

    Example:
        write_json_gz("results.json.gz", results_list)
    """
    text = json.dumps(obj, indent=indent, ensure_ascii=False)
    blob = gzip.compress(text.encode("utf-8"), compresslevel=level, mtime=0)
    Path(path).write_bytes(blob)
def read_json_gz(path: str | Path) -> Any:
    """Read a .json.gz file and return the parsed JSON value.

    Example:
        results = read_json_gz("results.json.gz")
    """
    text = gzip.decompress(Path(path).read_bytes()).decode("utf-8")
    return json.loads(text)
def write_jsonl_gz(path: str | Path, records: Iterator[Any], level: int = 6) -> int:
    """Stream *records* to *path* as gzip-compressed JSON Lines.

    Each record becomes one JSON document on its own line. Returns the
    number of records written.

    Example:
        n = write_jsonl_gz("events.jsonl.gz", (r.to_dict() for r in events))
    """
    written = 0
    with gzip.open(path, mode="wt", compresslevel=level, encoding="utf-8") as out:
        for rec in records:
            out.write(json.dumps(rec, ensure_ascii=False))
            out.write("\n")
            written += 1
    return written
def read_jsonl_gz(path: str | Path) -> Iterator[Any]:
    """Lazily parse a .jsonl.gz file, yielding one JSON value per line.

    Blank lines are skipped; decompression is streamed, so large files
    never live in memory at once.

    Example:
        for event in read_jsonl_gz("events.jsonl.gz"):
            process(event)
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as fh:
        for raw in fh:
            stripped = raw.strip()
            if not stripped:
                continue
            yield json.loads(stripped)
# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────
# Smoke-test / demo driver: exercises every helper in this module end to end
# and prints human-readable results. Run with `python gzutil.py`.
if __name__ == "__main__":
    import tempfile
    print("=== gzip demo ===")
    # ~9 KB of highly repetitive text — compresses very well at any level.
    sample_text = ("Hello gzip world! " * 500).encode("utf-8")
    print(f"\nOriginal: {len(sample_text):,} bytes")
    print("\n--- one-shot ---")
    # Compare size/ratio at fast (1), default-ish (6), and best (9) levels,
    # verifying the round trip at each.
    for level in [1, 6, 9]:
        c = gz_compress(sample_text, level=level)
        rt = gz_decompress(c)
        print(f" level={level}: {len(c):,} bytes ratio={gz_ratio(sample_text, c):.3f} ok={rt==sample_text}")
    print("\n--- deterministic ---")
    # Two calls with mtime=0 must produce byte-identical output.
    a = gz_compress_deterministic(sample_text)
    b = gz_compress_deterministic(sample_text)
    print(f" two calls identical: {a == b}")
    print("\n--- text helpers ---")
    # Non-ASCII text exercises the UTF-8 encode/decode path.
    msg = "Héllo wörld! " * 100
    gz_bytes = gz_text(msg)
    decoded = gz_decode_text(gz_bytes)
    print(f" text roundtrip ok: {decoded == msg}")
    # All file-based helpers run inside a throwaway directory that is
    # removed automatically when the `with` block exits.
    with tempfile.TemporaryDirectory() as td:
        print("\n--- file IO ---")
        json_path = os.path.join(td, "data.json")
        gz_path = os.path.join(td, "data.json.gz")
        Path(json_path).write_text("Hello world " * 200)
        n = write_gz(gz_path, Path(json_path).read_bytes())
        print(f" wrote {n:,} bytes to {os.path.basename(gz_path)}")
        restored = read_gz(gz_path)
        print(f" roundtrip ok: {restored == Path(json_path).read_bytes()}")
        # Text-mode write plus line iteration.
        text_gz = os.path.join(td, "report.txt.gz")
        write_gz_text(text_gz, "Line 1\nLine 2\nLine 3\n")
        lines = list(gz_lines(text_gz))
        print(f" gz_lines: {lines}")
        print("\n--- compress/decompress file ---")
        # Default naming: data.json -> data.json.gz; explicit dst for decompress.
        out_gz = compress_file(json_path)
        print(f" compressed to {out_gz.name}")
        out_json = decompress_file(out_gz, os.path.join(td, "data2.json"))
        print(f" decompressed to {out_json.name}, size={out_json.stat().st_size:,}")
        print("\n--- JSON and JSONL ---")
        records = [{"id": i, "value": f"Record {i}", "data": "x" * 50} for i in range(100)]
        jsonl_gz = os.path.join(td, "records.jsonl.gz")
        n = write_jsonl_gz(jsonl_gz, iter(records))
        print(f" wrote {n} JSONL records to {os.path.basename(jsonl_gz)}")
        size = os.path.getsize(jsonl_gz)
        print(f" compressed size: {size:,} bytes")
        loaded = list(read_jsonl_gz(jsonl_gz))
        print(f" read back {len(loaded)} records, match: {loaded == records}")
        # NOTE: this path coincides with gz_path above and overwrites it —
        # harmless here since the earlier file is no longer needed.
        json_gz = os.path.join(td, "data.json.gz")
        write_json_gz(json_gz, {"items": records[:5]})
        result = read_json_gz(json_gz)
        print(f" json.gz roundtrip: {len(result['items'])} items ok")
        print("\n--- streaming ---")
        # Slice the sample into 1 KiB chunks and compress them as a stream,
        # then reassemble via the chunked decompressor.
        chunk_size = 1024
        chunks = [sample_text[i:i+chunk_size] for i in range(0, len(sample_text), chunk_size)]
        gz_bytes2 = compress_stream_to_bytes(iter(chunks))
        decompressed = b"".join(decompress_stream(gz_bytes2))
        print(f" stream compress: {len(gz_bytes2):,} bytes roundtrip: {decompressed == sample_text}")
    print("\n=== done ===")
For the zlib alternative — zlib.compress/zlib.decompress operate at the deflate protocol level and support raw deflate (no headers), zlib format, and gzip format via the wbits parameter; gzip wraps zlib to produce proper gzip files with the standard two-byte magic, OS field, and optional filename/mtime metadata — use gzip when producing files that standard tools (gunzip, tar -z, browsers via Content-Encoding: gzip) will read; use zlib directly for custom binary framing, HTTP Transfer-Encoding: deflate, or when you need raw deflate without any file-format overhead. For the lzma / py7zr alternative — lzma.compress(data, preset=6) achieves 20–40% better compression than gzip on typical text/JSON at 10–30× the CPU cost; xz files use LZMA2 and are the standard high-ratio Linux archive format; py7zr (PyPI) handles 7z archives with LZMA support — use LZMA/XZ for distributing large datasets and build artifacts where download bandwidth matters more than compression time; use gzip for network transfer, logging pipelines, and any workload where real-time or near-real-time (de)compression throughput matters. The Claude Skills 360 bundle includes gzip skill sets covering gz_compress()/gz_decompress()/gz_compress_deterministic()/gz_text()/gz_ratio() one-shot helpers, write_gz()/read_gz()/write_gz_text()/read_gz_text()/gz_lines()/compress_file()/decompress_file() file I/O, compress_stream_to_bytes()/decompress_stream()/gz_to_buffer() streaming helpers, and write_json_gz()/read_json_gz()/write_jsonl_gz()/read_jsonl_gz() JSON/JSONL integration. Start with the free tier to try gzip file compression patterns and gzip pipeline code generation.