Claude Code for Ray: Distributed Python, Parallel Training, and Hyperparameter Tuning — Claude Skills 360 Blog
Blog / AI/ML / Claude Code for Ray: Distributed Python, Parallel Training, and Hyperparameter Tuning
AI/ML

Claude Code for Ray: Distributed Python, Parallel Training, and Hyperparameter Tuning

Published: October 28, 2026
Read time: 9 min read
By: Claude Skills 360

Ray turns Python into a distributed computing framework without requiring you to think about clusters. A regular Python function becomes a parallel task with @ray.remote. Class instances become stateful distributed actors. ML training scales to hundreds of GPUs with Ray Train. Hyperparameter search parallelizes across dozens of trials with Ray Tune. Claude Code writes Ray task graphs, actor designs, distributed training configs, and Ray Serve deployments that run the same way locally and on a 100-node cluster.

CLAUDE.md for Ray Projects

## Ray Stack
- Ray 2.x (latest stable) — pip install "ray[all]"
- Cluster: local for dev, Ray on Kubernetes (KubeRay) for production
- Object store: Ray's shared memory for large objects — never pass huge arrays by value
- Fault tolerance: max_retries=3 for tasks, checkpointing for Train/Tune
- Resources: annotate tasks with @ray.remote(num_cpus=2, num_gpus=1)
- Ray Data: streaming reads from S3/GCS — avoid loading full datasets into driver
- Tune: ASHA scheduler for early stopping; never run without a scheduler
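The resource, fault-tolerance, and object-store bullets above combine naturally in one task definition. A minimal sketch, assuming a local cluster; `lookup`, `should_retry`, and the transient-exception list are illustrative, not part of a real codebase:

```python
# Sketch: resource annotations, retries, and the object store together.
TRANSIENT = (ConnectionError, TimeoutError)

def should_retry(exc: Exception) -> bool:
    """Mirror of the retry_exceptions list below: retry only transient errors."""
    return isinstance(exc, TRANSIENT)

if __name__ == "__main__":
    import ray

    ray.init()

    # Per the checklist: put large objects in the object store once,
    # then pass the ref — never the value — to each task.
    big_table = list(range(1_000_000))
    table_ref = ray.put(big_table)

    @ray.remote(num_cpus=2, max_retries=3, retry_exceptions=list(TRANSIENT))
    def lookup(table: list[int], index: int) -> int:
        # table_ref is resolved to the actual list before this body runs
        return table[index]

    print(ray.get([lookup.remote(table_ref, i) for i in range(4)]))
```

Note that `retry_exceptions` accepts `True` (retry on any application exception) or a list of exception types; without it, `max_retries` only covers system failures such as a worker process dying.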

Ray Core: Tasks and Actors

# ray_core/tasks.py — parallel task execution
import ray
import time

ray.init()  # Local cluster; ray.init("ray://head-node:10001") for remote

# @ray.remote makes any function a parallel, distributed task
@ray.remote
def process_document(doc_id: str, content: str) -> dict:
    """Process a single document. Runs in parallel worker."""
    # CPU-intensive work: tokenization, feature extraction, etc.
    words = content.lower().split()
    word_count = len(words)
    unique_words = len(set(words))
    
    return {
        "doc_id": doc_id,
        "word_count": word_count,
        "unique_words": unique_words,
        "lexical_diversity": unique_words / max(word_count, 1),
    }

# Fan-out: submit all tasks, collect results concurrently
def process_all_documents(documents: list[dict]) -> list[dict]:
    # Submit tasks — non-blocking, returns ObjectRefs
    futures = [
        process_document.remote(doc["id"], doc["content"])
        for doc in documents
    ]
    
    # ray.get() blocks until all complete
    results = ray.get(futures)
    return results

# Resource-annotated tasks
@ray.remote(num_gpus=1, num_cpus=4)
def run_inference(model, batch: list[str]) -> list[float]:
    """Runs on a GPU worker. Pass the model as an ObjectRef from ray.put();
    Ray resolves top-level ObjectRef arguments before the task body runs,
    so `model` arrives already deserialized — no ray.get() needed."""
    return model.predict(batch)

# Stateful Actor: maintains state across calls (like a server)
@ray.remote
class OrderProcessor:
    def __init__(self):
        self.processed_count = 0
        self.failed_count = 0
    
    def process(self, order: dict) -> dict:
        try:
            result = validate_and_process(order)
            self.processed_count += 1
            return {"success": True, "result": result}
        except Exception as e:
            self.failed_count += 1
            return {"success": False, "error": str(e)}
    
    def stats(self) -> dict:
        return {
            "processed": self.processed_count,
            "failed": self.failed_count,
        }

# Actor pool: N actors handle work in parallel
processors = [OrderProcessor.remote() for _ in range(4)]

# Round-robin dispatch to actors: collect futures, then block once
futures = [
    processors[i % len(processors)].process.remote(order)
    for i, order in enumerate(orders)
]
results = ray.get(futures)

Ray Data: Distributed ETL

# ray_data/pipeline.py — process large datasets in parallel
import ray

# Stream large S3 dataset — never loads full dataset into memory
ds = ray.data.read_parquet(
    "s3://my-bucket/training-data/",
    columns=["text", "label"],
)

# Parallel map — runs on all rows across cluster
def preprocess(batch: dict) -> dict:
    """Applied to micro-batches in parallel."""
    # batch is a dict of numpy arrays / lists
    texts = batch["text"]
    processed = [text.lower().strip()[:512] for text in texts]
    return {"text": processed, "label": batch["label"]}

processed_ds = (
    ds
    .map_batches(preprocess, batch_size=256)
    .filter(lambda row: len(row["text"]) > 10)
)

# Materialize to Parquet (streaming write — no OOM)
processed_ds.write_parquet("s3://my-bucket/processed/")

# Aggregate statistics across dataset
stats = (
    processed_ds
    .map_batches(lambda b: {"length": [len(t) for t in b["text"]]})
    .mean("length")
)
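Before pointing the pipeline at S3, `ray.data.from_items` makes it easy to smoke-test the same transforms on a few in-memory rows. A minimal sketch reusing the `preprocess` logic from the listing above:

```python
# Smoke-test the pipeline transforms locally, no S3 required.
def preprocess(batch: dict) -> dict:
    """Same transform as the pipeline above: lowercase, strip, truncate."""
    texts = batch["text"]
    return {"text": [t.lower().strip()[:512] for t in texts], "label": batch["label"]}

if __name__ == "__main__":
    import ray

    ray.init()
    ds = ray.data.from_items([
        {"text": "  Hello Ray Data  ", "label": 1},
        {"text": "DISTRIBUTED ETL", "label": 0},
    ])
    rows = ds.map_batches(preprocess, batch_size=2).take_all()
    print(rows)  # lowercased, stripped rows with labels intact
```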

Ray Tune: Hyperparameter Search

# ray_tune/search.py — parallel hyperparameter optimization
import ray
import torch
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch

def train_model(config: dict):
    """Training function — called once per trial."""
    learning_rate = config["learning_rate"]
    batch_size = config["batch_size"]
    hidden_size = config["hidden_size"]
    
    model = build_model(hidden_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    
    for epoch in range(50):
        train_loss = train_epoch(model, optimizer, batch_size)
        val_loss, val_acc = evaluate(model)
        
        # Report metrics to Tune — enables early stopping.
        # Ray 2.x takes a metrics dict (the kwargs form is the legacy 1.x API).
        tune.report({
            "epoch": epoch,
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc,
        })

# ASHA: early stop bad trials, keep promising ones
scheduler = ASHAScheduler(
    max_t=50,           # Max epochs per trial
    grace_period=5,     # Min epochs before stopping
    reduction_factor=3, # Keep top 1/3 at each rung
)

# Optuna for smarter-than-random search; metric/mode come from tune.run
# below — setting them in both places raises an error
search_alg = OptunaSearch()

analysis = tune.run(
    train_model,
    config={
        "learning_rate": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128, 256]),
        "hidden_size": tune.choice([128, 256, 512]),
    },
    num_samples=50,          # 50 trials total; parallelism depends on cluster resources
    scheduler=scheduler,
    search_alg=search_alg,
    resources_per_trial={"cpu": 2, "gpu": 0.5},
    metric="val_accuracy",
    mode="max",
    storage_path="s3://my-bucket/tune-results/",
)

best_config = analysis.best_config
print(f"Best config: {best_config}")
print(f"Best val accuracy: {analysis.best_result['val_accuracy']:.4f}")
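`tune.run` still works, but Ray 2.x also documents the `tune.Tuner` API, which groups metric, mode, scheduler, and sample count into a `TuneConfig`. A minimal sketch with a toy objective (the quadratic and its optimum are illustrative, not from the listing above):

```python
# The same search shape expressed with the Tuner API.
def objective(config: dict) -> float:
    """Toy loss: quadratic with its minimum at learning_rate = 0.01."""
    return (config["learning_rate"] - 0.01) ** 2

if __name__ == "__main__":
    from ray import tune
    from ray.tune.schedulers import ASHAScheduler

    def trainable(config: dict):
        tune.report({"score": -objective(config)})  # maximize, so negate the loss

    tuner = tune.Tuner(
        trainable,
        param_space={"learning_rate": tune.loguniform(1e-4, 1e-1)},
        tune_config=tune.TuneConfig(
            metric="score",
            mode="max",
            num_samples=20,
            scheduler=ASHAScheduler(),
        ),
    )
    results = tuner.fit()
    print(results.get_best_result().config)
```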

Ray Serve: Model Serving

# ray_serve/app.py — scalable model serving with batching
import ray
from ray import serve
from starlette.requests import Request
from starlette.responses import JSONResponse

@serve.deployment(
    num_replicas=2,
    ray_actor_options={"num_gpus": 1},
    max_ongoing_requests=10,  # named max_concurrent_queries before Ray 2.10
)
class TextClassifier:
    def __init__(self, model_path: str):
        # Load model once per replica
        self.model = load_model(model_path)
        self.tokenizer = load_tokenizer(model_path)
    
    async def __call__(self, request: Request) -> JSONResponse:
        body = await request.json()
        texts = body["texts"]
        
        inputs = self.tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
        outputs = self.model(**inputs)
        predictions = outputs.logits.argmax(dim=-1).tolist()
        
        return JSONResponse({"predictions": predictions})

# Deploy the application
app = TextClassifier.bind("/models/classifier-v2")

# Run locally: serve run ray_serve.app:app
# Deploy to cluster: serve build ray_serve.app:app -o config.yaml, then serve deploy config.yaml
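Calling the deployment from a client is a plain HTTP POST. A sketch assuming Serve's default local address (`http://127.0.0.1:8000/`) and the `{"texts": [...]}` shape that `__call__` reads:

```python
# Client-side call to the TextClassifier deployment above.
import json
import urllib.request

def build_request(texts: list[str]) -> bytes:
    """Serialize the payload exactly as __call__ expects via request.json()."""
    return json.dumps({"texts": texts}).encode("utf-8")

if __name__ == "__main__":
    req = urllib.request.Request(
        "http://127.0.0.1:8000/",
        data=build_request(["great product", "terrible service"]),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["predictions"])  # list of class indices
```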

For experiment tracking that records Ray Tune results and model metrics, the MLOps guide covers MLflow integration with hyperparameter logging. For deploying Ray clusters on Kubernetes with KubeRay, the Kubernetes guide covers operator patterns and GPU node configuration. The Claude Skills 360 bundle includes Ray skill sets covering distributed task graphs, Ray Tune hyperparameter search, and Ray Serve deployment patterns. Start with the free tier to try Ray task graph generation.

Put these ideas into practice

Claude Skills 360 gives you production-ready skills for everything in this article — and 2,350+ more. Start free or go all-in.

Back to Blog

Get 360 skills free