Weights & Biases tracks every experiment run — hyperparameters, metrics, system stats, model weights, and prediction tables — with three lines of code. wandb.init(config=config) starts a run. wandb.log({"loss": loss}) instruments the training loop. wandb.Artifact versions datasets and models. Sweep agents explore hyperparameter spaces with Bayesian optimization, running parallel trials across machines. W&B integrates with PyTorch, Hugging Face Trainer, and PyTorch Lightning with one-line callbacks. Claude Code generates W&B instrumented training scripts, sweep configurations, artifact pipelines, and the dashboard queries for production ML experiment management.
CLAUDE.md for W&B Projects
## Weights & Biases Stack
- wandb >= 0.17, initialized with wandb.init(project=PROJECT, entity=ENTITY)
- Config: pass hyperparams via config= dict — never hardcode in training loop
- Logging: wandb.log(metrics, step=epoch) — always include step for time-series plots
- Artifacts: log datasets as type="dataset", models as type="model"
- Sweeps: YAML sweep config + wandb agent — Bayesian for <15 params, grid for <5
- Integrations: WandbCallback for HF Trainer, WandbLogger for Lightning
- Team: set WANDB_ENTITY in env — never hardcode in scripts
Basic Experiment Tracking
# training/train_with_wandb.py — instrument a PyTorch training loop
import wandb
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import os
def train_model(config: dict):
    """Run a fully W&B-instrumented PyTorch training loop.

    Args:
        config: Hyperparameters (hidden_size, num_layers, dropout,
            learning_rate, weight_decay, num_epochs, batch_size, and
            optionally model_type). Logged to W&B and overridable by
            sweep agents via wandb.config.

    Returns:
        The best validation accuracy observed across all epochs.
    """
    # Initialize run — config is logged and appears in the W&B UI.
    run = wandb.init(
        project=os.environ.get("WANDB_PROJECT", "my-project"),
        entity=os.environ.get("WANDB_ENTITY"),
        config=config,
        tags=["baseline", config.get("model_type", "unknown")],
    )
    # Access config through wandb.config so sweep-injected overrides apply.
    cfg = wandb.config

    # Build model and optimizer.
    model = build_model(cfg.hidden_size, cfg.num_layers, cfg.dropout)
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=cfg.learning_rate,
        weight_decay=cfg.weight_decay,
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=cfg.num_epochs
    )

    # Watch the model: log gradients and parameter histograms.
    wandb.watch(model, log="gradients", log_freq=100)

    train_loader, val_loader = get_dataloaders(cfg.batch_size)
    criterion = nn.CrossEntropyLoss()

    best_val_acc = 0.0
    # Single monotonic step counter shared by ALL wandb.log calls below.
    global_step = 0
    for epoch in range(cfg.num_epochs):
        # Training phase.
        model.train()
        train_losses, train_correct = [], 0
        for batch_idx, (inputs, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            # Gradient clipping guards against exploding gradients.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            global_step += 1

            train_losses.append(loss.item())
            train_correct += (outputs.argmax(1) == labels).sum().item()

            # Log batch-level metrics every N steps.
            if batch_idx % 50 == 0:
                wandb.log({
                    "batch/loss": loss.item(),
                    "batch/lr": scheduler.get_last_lr()[0],
                }, step=global_step)
        scheduler.step()

        # Validation phase.
        val_loss, val_acc = evaluate(model, val_loader, criterion)

        # Log epoch-level metrics.
        # BUG FIX: the original logged these with step=epoch, but W&B requires
        # step to be monotonically increasing per run — once batch logging
        # advanced the step past the epoch index, every epoch-level log was
        # silently dropped. Logging at the current global step (with the epoch
        # number included as a plain metric) keeps both time series intact.
        epoch_metrics = {
            "epoch": epoch,
            "train/loss": sum(train_losses) / len(train_losses),
            "train/accuracy": train_correct / len(train_loader.dataset),
            "val/loss": val_loss,
            "val/accuracy": val_acc,
            "lr": scheduler.get_last_lr()[0],
        }
        wandb.log(epoch_metrics, step=global_step)

        # Save best model as a W&B artifact.
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            save_and_log_model(model, run, epoch, val_acc)

    # Summary metrics — shown in the run overview table. Use the run handle's
    # summary rather than the legacy module-level wandb.summary proxy.
    run.summary["best_val_accuracy"] = best_val_acc
    run.summary["total_epochs"] = cfg.num_epochs
    run.finish()
    return best_val_acc
def save_and_log_model(model, run, epoch: int, val_acc: float):
    """Persist a checkpoint to disk and attach it to the run as a model artifact.

    Args:
        model: The PyTorch model whose state_dict is saved.
        run: The active W&B run that will own the artifact.
        epoch: Epoch index, recorded in the filename and artifact metadata.
        val_acc: Validation accuracy, recorded alongside the epoch.
    """
    # Encode epoch and accuracy in the filename for easy local inspection.
    checkpoint_path = f"checkpoint_epoch{epoch}_acc{val_acc:.3f}.pt"
    torch.save(model.state_dict(), checkpoint_path)

    # Artifact metadata is queryable later via the W&B UI and API.
    checkpoint_artifact = wandb.Artifact(
        name="classifier-checkpoint",
        type="model",
        metadata={"epoch": epoch, "val_accuracy": val_acc},
    )
    checkpoint_artifact.add_file(checkpoint_path)
    run.log_artifact(checkpoint_artifact)
Dataset Versioning with Artifacts
# artifacts/dataset_artifact.py — version datasets with W&B Artifacts
import wandb
import pandas as pd
from pathlib import Path
def log_dataset_artifact(
    data_dir: str,
    artifact_name: str,
    metadata: dict,
    project: str,
) -> str:
    """Upload a dataset directory as a versioned W&B artifact.

    Args:
        data_dir: Directory whose contents become the artifact's files;
            must contain a train.csv for the preview table.
        artifact_name: Name under which versions accumulate (v0, v1, ...).
        metadata: Arbitrary dict stored on the artifact; "description"
            is also surfaced as the artifact description.
        project: W&B project the data-prep run is logged under.

    Returns:
        The string form of the logged artifact.
    """
    run = wandb.init(project=project, job_type="data-prep")

    dataset_artifact = wandb.Artifact(
        name=artifact_name,
        type="dataset",
        description=metadata.get("description", ""),
        metadata=metadata,
    )
    # Whole-directory upload; W&B deduplicates files by content hash.
    dataset_artifact.add_dir(data_dir)

    # Attach a small preview so the data can be eyeballed in the UI.
    preview = wandb.Table(dataframe=pd.read_csv(f"{data_dir}/train.csv").head(100))
    dataset_artifact.add(preview, "sample_preview")

    run.log_artifact(dataset_artifact)
    # NOTE(review): wait() blocks until upload completes but returns the
    # artifact object itself, not a URL — the name below overstates it.
    artifact_url = dataset_artifact.wait()
    run.finish()

    print(f"Dataset artifact: {artifact_url}")
    return str(artifact_url)
def download_dataset_artifact(
    artifact_path: str,  # "entity/project/artifact_name:version"
    download_dir: str = "./data",
) -> str:
    """Fetch one pinned version of a dataset artifact to local disk.

    Args:
        artifact_path: Fully-qualified reference, including the version or
            alias (e.g. ":latest").
        download_dir: Local root the artifact files are placed under.

    Returns:
        The local directory the artifact was downloaded into.
    """
    consumer_run = wandb.init(job_type="training")
    # use_artifact both records lineage on this run and yields a handle
    # for downloading and inspecting metadata.
    artifact = consumer_run.use_artifact(artifact_path, type="dataset")
    local_path = artifact.download(root=download_dir)
    print(f"Dataset downloaded to: {local_path}")
    print(f"Artifact metadata: {artifact.metadata}")
    consumer_run.finish()
    return local_path
def log_predictions_table(
    texts: list[str],
    true_labels: list[str],
    pred_labels: list[str],
    pred_scores: list[float],
    run=None,
) -> None:
    """Log a per-example prediction table for error analysis in the W&B UI.

    Args:
        texts: Input texts, index-aligned with the label/score lists.
        true_labels: Gold labels.
        pred_labels: Model-predicted labels.
        pred_scores: Confidence score for each prediction.
        run: Optional W&B run to log through; falls back to the global
            wandb module when absent.
    """
    columns = ["text", "true_label", "predicted_label", "score", "correct"]
    rows = []
    for text, gold, pred, score in zip(texts, true_labels, pred_labels, pred_scores):
        # "correct" lets the UI filter straight to misclassifications.
        rows.append([text, gold, pred, score, gold == pred])
    table = wandb.Table(columns=columns, data=rows)
    # Prefer the explicit run handle when given (same truthiness test as
    # the original `if run:`), otherwise log via the global run.
    logger = run or wandb
    logger.log({"predictions": table})
Hyperparameter Sweeps
# sweeps/run_sweep.py — Bayesian hyperparameter optimization
import wandb
import os
# Sweep configuration — defines the search space
SWEEP_CONFIG = {
    "name": "transformer-classifier-sweep",
    "method": "bayes",  # bayes, random, or grid
    # Objective the Bayesian optimizer maximizes; the training code must
    # log this exact metric name via wandb.log for the sweep to steer.
    "metric": {
        "name": "val/f1_macro",
        "goal": "maximize",
    },
    "parameters": {
        # Log-uniform sampling: learning rates spread evenly across
        # orders of magnitude rather than linearly.
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
        # Discrete choices are sampled from "values" lists.
        "batch_size": {
            "values": [16, 32, 64],
        },
        "warmup_ratio": {
            "distribution": "uniform",
            "min": 0.0,
            "max": 0.15,
        },
        "weight_decay": {
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 1e-1,
        },
        # LoRA rank; alpha is derived in the trial as lora_r * ratio.
        "lora_r": {
            "values": [8, 16, 32, 64],
        },
        "lora_alpha_ratio": {
            "values": [1, 2, 4],  # lora_alpha = lora_r * ratio
        },
        "num_epochs": {
            "values": [3, 5, 8],
        },
    },
    # Hyperband early stopping: unpromising trials are killed after at
    # least min_iter iterations, with successive halving factor eta.
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 2,
        "eta": 2,
    },
}
def train_sweep():
    """Run one sweep trial; invoked by each `wandb agent` process."""
    # No explicit config here: the sweep controller injects the sampled
    # hyperparameters into wandb.config at init time.
    run = wandb.init()
    cfg = wandb.config

    # Derived value — alpha scales with rank via the sampled ratio.
    lora_alpha = cfg.lora_r * cfg.lora_alpha_ratio

    # Deferred import keeps GPU memory untouched until after the agent forks.
    from training.lora_trainer import train as lora_train

    trial_metrics = lora_train(
        learning_rate=cfg.learning_rate,
        batch_size=cfg.batch_size,
        warmup_ratio=cfg.warmup_ratio,
        weight_decay=cfg.weight_decay,
        lora_r=cfg.lora_r,
        lora_alpha=lora_alpha,
        num_epochs=cfg.num_epochs,
    )
    wandb.log(trial_metrics)
    run.finish()
def launch_sweep(project: str, count: int = 30) -> str:
    """Register a sweep with the W&B backend and run trials locally.

    Args:
        project: W&B project the sweep belongs to.
        count: Number of trials this local agent executes.

    Returns:
        The sweep id, reusable by agents on other machines.
    """
    sweep_id = wandb.sweep(
        sweep=SWEEP_CONFIG,
        project=project,
        entity=os.environ.get("WANDB_ENTITY"),
    )
    print(f"Sweep created: {sweep_id}")
    print(f"View at: https://wandb.ai/{os.environ.get('WANDB_ENTITY')}/{project}/sweeps/{sweep_id}")

    # The agent pulls sampled configs from the controller and runs `count`
    # trials sequentially; for parallelism, start this same function on
    # multiple machines with the same sweep_id.
    wandb.agent(sweep_id, function=train_sweep, count=count)
    return sweep_id
Hugging Face Trainer Integration
# integrations/hf_trainer_wandb.py — W&B + Hugging Face Trainer
import os
from transformers import TrainingArguments, Trainer
import wandb
def build_training_args_with_wandb(
    output_dir: str,
    run_name: str,
    config: dict,
) -> TrainingArguments:
    """Build TrainingArguments that report metrics to W&B.

    Args:
        output_dir: Where the Trainer writes checkpoints.
        run_name: Display name for the W&B run.
        config: Must contain num_epochs, batch_size, and learning_rate;
            warmup_ratio and weight_decay are optional with defaults.

    Returns:
        TrainingArguments configured with report_to="wandb".
    """
    # Trainer reads WANDB_PROJECT from the environment. setdefault only
    # fills in the default when the caller hasn't set it — replaces the
    # original's redundant read-then-write round trip.
    os.environ.setdefault("WANDB_PROJECT", "hf-experiments")

    # NOTE(review): to also upload checkpoints as W&B model artifacts, set
    # WANDB_LOG_MODEL="checkpoint" in the environment. The original comment
    # claimed checkpoint-artifact logging happened here, but nothing in this
    # function enables it.
    return TrainingArguments(
        output_dir=output_dir,
        run_name=run_name,
        report_to="wandb",  # Enable W&B logging
        num_train_epochs=config["num_epochs"],
        per_device_train_batch_size=config["batch_size"],
        learning_rate=config["learning_rate"],
        warmup_ratio=config.get("warmup_ratio", 0.06),
        weight_decay=config.get("weight_decay", 0.01),
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,  # Restore best checkpoint by f1_macro
        metric_for_best_model="f1_macro",
        fp16=True,
        logging_steps=25,
        push_to_hub=False,
    )
def log_evaluation_report(trainer: Trainer, eval_dataset, label_names: list[str]):
    """Log per-class eval metrics and a confusion matrix to W&B.

    Args:
        trainer: A fitted Hugging Face Trainer.
        eval_dataset: Dataset to run prediction on.
        label_names: Class names, index-aligned with the model's label ids.
    """
    import numpy as np
    from sklearn.metrics import classification_report

    predictions = trainer.predict(eval_dataset)
    pred_labels = np.argmax(predictions.predictions, axis=-1)
    true_labels = predictions.label_ids

    report = classification_report(
        true_labels, pred_labels,
        target_names=label_names,
        output_dict=True,
    )

    # BUG FIX: the original issued one wandb.log() call per scalar, which
    # advances the W&B global step once per metric and scatters the values
    # across dozens of steps. Collect everything into a single payload and
    # log once so all eval metrics land on the same step. (Also drops the
    # original's unused sklearn confusion_matrix computation.)
    payload = {}
    for label, metrics in report.items():
        if isinstance(metrics, dict):  # skip scalar entries like "accuracy"
            for metric_name, value in metrics.items():
                payload[f"eval/{label}/{metric_name}"] = value

    # Confusion matrix rendered as an interactive W&B plot.
    payload["eval/confusion_matrix"] = wandb.plot.confusion_matrix(
        y_true=true_labels.tolist(),
        preds=pred_labels.tolist(),
        class_names=label_names,
    )
    wandb.log(payload)
For the MLflow alternative that stores experiment metadata locally or on your own server without a SaaS dependency, see the MLflow guide for experiment tracking and model registry. For the Hugging Face Trainer that integrates directly with W&B via report_to="wandb", the Transformers guide covers LoRA fine-tuning and TrainingArguments. The Claude Skills 360 bundle includes W&B skill sets covering experiment instrumentation, sweep configuration, and artifact pipelines. Start with the free tier to try W&B training script generation.