Python's pickle module serializes objects to bytes and restores them (`import pickle`). Core API: `pickle.dumps(obj, protocol=None) -> bytes`, `pickle.loads(b) -> object`, `pickle.dump(obj, file)` writes to a binary file-like object, and `pickle.load(file)` reads one back. Protocols run from 0 (the original human-readable ASCII format) through 5; `pickle.HIGHEST_PROTOCOL` is 5 on Python 3.8+, while `pickle.DEFAULT_PROTOCOL` may be lower (protocol 4 since Python 3.8). Typical usage: `with open("data.pkl", "wb") as f: pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)` and `with open("data.pkl", "rb") as f: obj = pickle.load(f)`. Customization hooks: `__reduce__` returns a `(callable, args)` tuple, `__reduce_ex__(protocol)` does the same with protocol awareness, and `__getstate__`/`__setstate__` control which state dict is serialized; `copyreg.pickle(SomeClass, reduce_fn)` registers a reducer for types you don't control. Security: never unpickle untrusted data (arbitrary code execution risk); use `hmac` to sign pickle bytes before storing them. An `io.BytesIO` buffer supports in-process serialize/deserialize, and `pickle.loads(pickle.dumps(obj))` is sometimes faster than `copy.deepcopy` for large plain-data graphs. `multiprocessing` uses pickle internally to pass objects between processes, so classes and functions must be importable at top level. Claude Code generates ML model caches, session stores, IPC payload serializers, and checkpoint managers.
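The copyreg hook mentioned above can be sketched in a few lines; `Point3D` and `_reduce_point` are illustrative names for this sketch, not part of any library:

```python
import copyreg
import pickle

class Point3D:
    """Stand-in for a third-party class you cannot modify."""
    def __init__(self, x: float, y: float, z: float) -> None:
        self.x, self.y, self.z = x, y, z

def _reduce_point(p: Point3D):
    # Return (callable, args): pickle calls Point3D(x, y, z) on load.
    return (Point3D, (p.x, p.y, p.z))

# Register the reducer without touching the class itself.
copyreg.pickle(Point3D, _reduce_point)

restored = pickle.loads(pickle.dumps(Point3D(1.0, 2.0, 3.0)))
print(restored.x, restored.y, restored.z)  # 1.0 2.0 3.0
```

The same `(callable, args)` contract is what a custom `__reduce__` method returns; copyreg simply lets you attach it externally.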
# CLAUDE.md for pickle
## pickle Stack
- Stdlib: import pickle
- Bytes: data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
- File: with open("f.pkl", "wb") as f: pickle.dump(obj, f) | with open("f.pkl", "rb") as f: obj = pickle.load(f)
- Custom: __getstate__ / __setstate__ for selective field serialization
- Safe: sign bytes with hmac before storing; NEVER load untrusted pickle
- Fast DC: pickle.loads(pickle.dumps(obj)) — often faster than copy.deepcopy for large plain-data graphs
## pickle Serialization Pipeline
# app/pickleutil.py — safe load, signed pickle, cache, checkpoint, custom state
from __future__ import annotations
import hashlib
import hmac
import io
import os
import pickle
import struct
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, TypeVar
T = TypeVar("T")
# ─────────────────────────────────────────────────────────────────────────────
# 1. Core helpers
# ─────────────────────────────────────────────────────────────────────────────
def to_bytes(obj: Any, protocol: int = pickle.HIGHEST_PROTOCOL) -> bytes:
"""
Serialize obj to bytes using the given pickle protocol.
Example:
data = to_bytes({"users": [1, 2, 3]})
"""
return pickle.dumps(obj, protocol=protocol)
def from_bytes(data: bytes) -> Any:
"""
Deserialize obj from bytes.
WARNING: Only call with data from a trusted source.
Example:
obj = from_bytes(cache_bytes)
"""
return pickle.loads(data)
def to_file(obj: Any, path: str | Path, protocol: int = pickle.HIGHEST_PROTOCOL) -> None:
"""
Serialize obj to a file atomically (write to temp then rename).
Example:
to_file(model_state, "checkpoint.pkl")
"""
p = Path(path)
    tmp = p.with_name(p.name + ".tmp")  # keep the full original name; with_suffix would replace the existing suffix
with open(tmp, "wb") as f:
pickle.dump(obj, f, protocol=protocol)
tmp.replace(p)
def from_file(path: str | Path) -> Any:
"""
Deserialize obj from a file.
WARNING: Only call with files from trusted sources.
Example:
model = from_file("checkpoint.pkl")
"""
with open(path, "rb") as f:
return pickle.load(f)
def size_bytes(obj: Any) -> int:
"""Return the serialized size of obj in bytes (useful for capacity planning)."""
return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
def fast_deepcopy(obj: T) -> T:
"""
Fast deep copy via pickle round-trip. May be faster than copy.deepcopy
for large plain-data structures. Only works for picklable objects.
Example:
snapshot = fast_deepcopy(large_config_dict)
"""
return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
# ─────────────────────────────────────────────────────────────────────────────
# 2. Signed pickle for tamper-evident storage
# ─────────────────────────────────────────────────────────────────────────────
_HMAC_DIGEST = "sha256"
_HMAC_LEN = 32 # SHA-256 produces 32 bytes
def sign_pickle(obj: Any, secret: bytes) -> bytes:
"""
Serialize obj and prepend an HMAC-SHA256 signature.
The result is safe to store or transmit; verify before loading.
Example:
payload = sign_pickle(session_data, app_secret_key)
store_in_cache(payload)
"""
data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
sig = hmac.new(secret, data, _HMAC_DIGEST).digest()
return sig + data # 32-byte prefix
def verify_and_load(signed: bytes, secret: bytes) -> Any:
"""
Verify the HMAC-SHA256 signature and deserialize.
Raises ValueError on bad signature; raises pickle.UnpicklingError on corrupt data.
Example:
session = verify_and_load(cache_bytes, app_secret_key)
"""
if len(signed) < _HMAC_LEN:
raise ValueError("Payload too short to contain valid signature")
sig = signed[:_HMAC_LEN]
data = signed[_HMAC_LEN:]
expected = hmac.new(secret, data, _HMAC_DIGEST).digest()
if not hmac.compare_digest(sig, expected):
raise ValueError("HMAC signature mismatch — data may be tampered or corrupt")
return pickle.loads(data)
# ─────────────────────────────────────────────────────────────────────────────
# 3. In-memory stream pickle
# ─────────────────────────────────────────────────────────────────────────────
def to_stream(obj: Any) -> io.BytesIO:
"""
Serialize obj into an in-memory BytesIO stream (rewound to start).
Example:
stream = to_stream(large_dataset)
send_over_socket(stream.read())
"""
buf = io.BytesIO()
pickle.dump(obj, buf, protocol=pickle.HIGHEST_PROTOCOL)
buf.seek(0)
return buf
def from_stream(buf: io.BytesIO) -> Any:
"""
Deserialize the next pickled object from a BytesIO stream.
Example:
dataset = from_stream(received_stream)
"""
return pickle.load(buf)
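# A stream can hold several pickles back to back: pickle.load reads exactly one
# object and leaves the stream positioned at the next. Illustrative helpers
# (dump_many/load_many are sketch names, not part of the stdlib):
def dump_many(objs: list, buf: io.BytesIO) -> None:
    """Append each object as its own pickle frame."""
    for obj in objs:
        pickle.dump(obj, buf, protocol=pickle.HIGHEST_PROTOCOL)
def load_many(buf: io.BytesIO) -> list:
    """Read pickle frames until the stream is exhausted."""
    out = []
    while True:
        try:
            out.append(pickle.load(buf))
        except EOFError:
            return out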
# ─────────────────────────────────────────────────────────────────────────────
# 4. Custom __getstate__ / __setstate__ patterns
# ─────────────────────────────────────────────────────────────────────────────
class PicklableResource:
"""
Example class that excludes a non-picklable attribute (like a file handle
or lock) from pickling by using __getstate__ / __setstate__.
Example:
r = PicklableResource("data.txt")
r.read() # opens file handle lazily
payload = to_bytes(r) # file handle excluded
r2 = from_bytes(payload)
r2.read() # handle re-opened on access
"""
def __init__(self, path: str) -> None:
self.path = path
self._handle = None # not picklable when open
@property
def handle(self):
if self._handle is None:
self._handle = open(self.path, "rb")
return self._handle
def read(self, n: int = -1) -> bytes:
return self.handle.read(n)
def __getstate__(self) -> dict:
# Exclude _handle so pickle doesn't try to serialize an open file
state = self.__dict__.copy()
state.pop("_handle", None)
return state
def __setstate__(self, state: dict) -> None:
self.__dict__.update(state)
self._handle = None # re-initialize as closed/lazy
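# __reduce__ alternative: instead of a state dict, return (callable, args) and
# pickle rebuilds the object by calling callable(*args) on load — useful when
# __init__ enforces invariants. Illustrative sketch (Interval is not stdlib):
class Interval:
    def __init__(self, lo: float, hi: float) -> None:
        if lo > hi:
            raise ValueError("lo must be <= hi")
        self.lo, self.hi = lo, hi
    def __reduce__(self):
        # Unpickling re-runs __init__, so the invariant is re-checked.
        return (Interval, (self.lo, self.hi))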
@dataclass
class Checkpoint:
"""
A dataclass checkpoint with serialization metadata.
Save/load ML training state, pipeline state, or any resumable process.
Example:
ckpt = Checkpoint.save(epoch=5, model_params=params, loss=0.34)
ckpt.to_file("ckpt_epoch5.pkl")
loaded = Checkpoint.from_file("ckpt_epoch5.pkl")
print(loaded.epoch, loaded.loss)
"""
epoch: int
payload: Any
loss: float = 0.0
saved_at: float = field(default_factory=time.time)
extra: dict = field(default_factory=dict)
@classmethod
def save(cls, epoch: int, model_params: Any, loss: float = 0.0, **extra: Any) -> "Checkpoint":
return cls(epoch=epoch, payload=model_params, loss=loss, extra=extra)
def to_file(self, path: str | Path) -> None:
to_file(self, path)
@classmethod
def from_file(cls, path: str | Path) -> "Checkpoint":
obj = from_file(path)
if not isinstance(obj, cls):
raise TypeError(f"Expected Checkpoint, got {type(obj).__name__}")
return obj
def to_bytes(self) -> bytes:
return to_bytes(self)
@classmethod
    def from_bytes(cls, data: bytes) -> "Checkpoint":
        obj = from_bytes(data)
        if not isinstance(obj, cls):
            raise TypeError(f"Expected Checkpoint, got {type(obj).__name__}")
        return obj
# ─────────────────────────────────────────────────────────────────────────────
# 5. File-based pickle cache
# ─────────────────────────────────────────────────────────────────────────────
class PickleCache:
"""
Simple file-based cache using pickle with optional TTL.
Example:
cache = PickleCache(directory=".cache")
cache.set("expensive_result", compute(), ttl=3600)
result = cache.get("expensive_result") # None if expired or missing
"""
def __init__(self, directory: str | Path = ".pickle_cache") -> None:
self._dir = Path(directory)
self._dir.mkdir(parents=True, exist_ok=True)
def _path(self, key: str) -> Path:
safe = key.replace("/", "_").replace(":", "_")
return self._dir / f"{safe}.pkl"
def set(self, key: str, value: Any, ttl: float | None = None) -> None:
"""Store value under key. ttl is seconds until expiry (None = forever)."""
entry = {
"value": value,
"expires": time.time() + ttl if ttl is not None else None,
}
to_file(entry, self._path(key))
def get(self, key: str, default: Any = None) -> Any:
"""Return cached value or default if missing/expired."""
path = self._path(key)
if not path.exists():
return default
try:
entry = from_file(path)
if entry["expires"] is not None and time.time() > entry["expires"]:
path.unlink(missing_ok=True)
return default
return entry["value"]
except (pickle.UnpicklingError, KeyError, EOFError):
return default
def delete(self, key: str) -> bool:
path = self._path(key)
if path.exists():
path.unlink()
return True
return False
def clear(self) -> int:
count = 0
for p in self._dir.glob("*.pkl"):
p.unlink()
count += 1
return count
def keys(self) -> list[str]:
return [p.stem for p in self._dir.glob("*.pkl")]
def __len__(self) -> int:
return len(list(self._dir.glob("*.pkl")))
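# Optional decorator sugar over PickleCache: memoize an expensive function to
# disk. Sketch only — the name pickle_cached and the key scheme (function name
# plus a hash of the args repr) are illustrative, not a published API.
def pickle_cached(cache: "PickleCache", ttl: float | None = None):
    def deco(fn):
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            digest = hashlib.sha256(repr((args, kwargs)).encode()).hexdigest()[:16]
            key = f"{fn.__name__}_{digest}"
            sentinel = object()
            hit = cache.get(key, sentinel)
            if hit is not sentinel:
                return hit
            value = fn(*args, **kwargs)
            cache.set(key, value, ttl=ttl)
            return value
        return wrapper
    return deco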
# ─────────────────────────────────────────────────────────────────────────────
# Demo
# ─────────────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
import tempfile
print("=== pickle demo ===")
data = {"users": [{"id": i, "name": f"User{i}", "scores": list(range(i, i+5))} for i in range(1, 6)]}
print("\n--- to_bytes / from_bytes ---")
raw = to_bytes(data)
print(f" serialized size: {len(raw)} bytes")
restored = from_bytes(raw)
print(f" roundtrip ok: {restored == data}")
print(f" size_bytes: {size_bytes(data)} bytes")
print("\n--- fast_deepcopy ---")
import copy
import time as _time
t0 = _time.perf_counter()
for _ in range(1000): copy.deepcopy(data)
t_deep = _time.perf_counter() - t0
t0 = _time.perf_counter()
for _ in range(1000): fast_deepcopy(data)
t_pickle = _time.perf_counter() - t0
print(f" deepcopy 1k: {t_deep*1000:.1f}ms pickle 1k: {t_pickle*1000:.1f}ms")
print("\n--- to_file / from_file ---")
with tempfile.TemporaryDirectory() as td:
path = os.path.join(td, "data.pkl")
to_file(data, path)
loaded = from_file(path)
print(f" from_file roundtrip ok: {loaded == data}")
print("\n--- sign_pickle / verify_and_load ---")
secret = b"super-secret-key-1234"
signed = sign_pickle(data, secret)
print(f" signed size: {len(signed)} bytes (32 extra for HMAC)")
verified = verify_and_load(signed, secret)
print(f" verify ok: {verified == data}")
try:
verify_and_load(b"\x00" * len(signed), secret)
except ValueError as e:
print(f" tampered: {e}")
print("\n--- to_stream / from_stream ---")
buf = to_stream(data)
loaded_stream = from_stream(buf)
print(f" stream roundtrip ok: {loaded_stream == data}")
print("\n--- PicklableResource ---")
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
tmp.write(b"hello pickle world")
tmp_path = tmp.name
res = PicklableResource(tmp_path)
raw_res = to_bytes(res) # handle excluded
res2 = from_bytes(raw_res)
content = res2.read()
print(f" after pickle roundtrip, read: {content!r}")
os.unlink(tmp_path)
print("\n--- Checkpoint ---")
params = {"W1": [0.1, 0.2], "b1": [0.0]}
ckpt = Checkpoint.save(epoch=3, model_params=params, loss=0.0523)
raw_ckpt = ckpt.to_bytes()
loaded_ckpt = Checkpoint.from_bytes(raw_ckpt)
print(f" epoch={loaded_ckpt.epoch} loss={loaded_ckpt.loss} params={loaded_ckpt.payload}")
print("\n--- PickleCache ---")
with tempfile.TemporaryDirectory() as td:
cache = PickleCache(directory=td)
cache.set("result", {"computed": True, "value": 42}, ttl=10)
cache.set("perm", "permanent value")
print(f" get result: {cache.get('result')}")
print(f" keys: {sorted(cache.keys())}")
print(f" len: {len(cache)}")
print(f" missing: {cache.get('nope', 'default')!r}")
cache.delete("perm")
print(f" after delete perm: {cache.get('perm', 'gone')!r}")
print("\n=== done ===")
For the json alternative: json.dumps/json.loads produce human-readable, language-portable text and are safe to load from external sources, but they only handle Python's basic types (dict, list, str, int, float, bool, None) and need default= hooks for custom types. pickle handles arbitrary Python objects (dataclasses, sets, custom classes, numpy arrays, datetime) without extra hooks, but produces non-portable binary and must never be loaded from untrusted input. Use JSON for APIs, configuration files, cross-language data exchange, and any externally stored data; use pickle for internal Python-to-Python checkpoints, IPC payloads between trusted processes, and caching expensive-to-reconstruct Python objects.

For the msgpack / orjson alternative: msgpack (PyPI) serializes data as compact binary with cross-language support and better performance than JSON for binary-heavy payloads; orjson (PyPI) is substantially faster than stdlib json (roughly an order of magnitude in its own benchmarks) and supports datetime, numpy, and UUID natively. Both are safer than pickle for untrusted data. Use msgpack or orjson when you need fast, compact, safe serialization of JSON-compatible data that may cross language boundaries or be received from external services; use pickle only within a trusted Python process boundary for objects that are not JSON-compatible.

The Claude Skills 360 bundle includes pickle skill sets covering the to_bytes()/from_bytes()/to_file()/from_file()/size_bytes()/fast_deepcopy() core helpers, sign_pickle()/verify_and_load() HMAC-signed tamper-evident serialization, to_stream()/from_stream() in-memory BytesIO round-trips, PicklableResource with __getstate__/__setstate__, the Checkpoint dataclass for ML/pipeline checkpoints, and the PickleCache file-based key-value cache with TTL. Start with the free tier to try object persistence patterns and pickle pipeline code generation.
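The json-versus-pickle trade-off is easy to see in a few lines. The `default=` hook below is one common way to teach json about types pickle handles natively; note the round-trip loses the original Python types:

```python
import datetime
import json
import pickle

record = {"when": datetime.datetime(2024, 1, 2, 3, 4, 5), "tags": {"a", "b"}}

# pickle handles datetime and set out of the box.
assert pickle.loads(pickle.dumps(record)) == record

# json needs a default= hook for anything beyond its basic types.
def encode(obj):
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")

text = json.dumps(record, default=encode)
back = json.loads(text)
print(back["when"])  # '2024-01-02T03:04:05' — a str, not a datetime
print(back["tags"])  # ['a', 'b'] — a list, not a set
```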