Ultralytics YOLO is one of the fastest object detection and segmentation frameworks. Install with pip install ultralytics, then from ultralytics import YOLO and load with model = YOLO("yolo11n.pt") — sizes: n (nano), s (small), m (medium), l (large), x (extra-large). Task-specific weights: yolo11n-seg.pt (segmentation), yolo11n-pose.pt (pose), yolo11n-obb.pt (oriented boxes). Detect: results = model("image.jpg", conf=0.45, iou=0.5); batch: results = model(["img1.jpg", "img2.jpg", "img3.jpg"]); video: results = model("video.mp4", stream=True), then for r in results: r.save(filename="pred.jpg"). Access boxes via results[0].boxes.xyxy, .conf, .cls, .xywh; class names via results[0].names; results[0].plot() returns an annotated NumPy array and results[0].save(filename="out.jpg") writes it. Train: model.train(data="coco.yaml", epochs=100, imgsz=640, batch=16, device=0); resume with model.train(resume=True); model.val() returns metrics including box.map50. Export: model.export(format="onnx") or "tensorrt", "coreml", "tflite". Track: results = model.track("video.mp4", persist=True, tracker="bytetrack.yaml"), with track IDs in results[0].boxes.id. Segmentation polygons: results[0].masks.xy; pose keypoints: results[0].keypoints.xy — shape (N, 17, 2) for COCO keypoints. Custom dataset YAML: path: ./dataset, train: images/train, val: images/val, nc: 2, names: [cat, dog]. from ultralytics import solutions provides speed estimation, queue management, and distance calculation. Claude Code generates YOLO inference scripts, training configs, custom dataset pipelines, video trackers, and export workflows.
CLAUDE.md for Ultralytics YOLO
## Ultralytics Stack
- Version: ultralytics >= 8.3
- Load: YOLO("yolo11n.pt" | "yolo11n-seg.pt" | "yolo11n-pose.pt" | custom.pt)
- Detect: model.predict(source, conf=0.45, iou=0.5, device=0, stream=True)
- Results: .boxes.xyxy | .boxes.conf | .boxes.cls | .masks.xy | .keypoints.xy
- Train: model.train(data="dataset.yaml", epochs, imgsz=640, batch)
- Export: model.export(format="onnx" | "tensorrt" | "coreml" | "tflite")
- Track: model.track(source, persist=True, tracker="bytetrack.yaml")
- Val: model.val() → metrics.box.map50 (mAP@0.5), map (mAP@0.5:0.95)
- Source: path | URL | np.ndarray | torch.Tensor | list | generator
## Ultralytics YOLO Pipeline
# vision/ultralytics_pipeline.py — object detection and tracking with YOLO
from __future__ import annotations
import os
import cv2
import numpy as np
from pathlib import Path
from typing import Generator
from ultralytics import YOLO
# ── 1. Model loading ──────────────────────────────────────────────────────────
def load_detector(model_size: str = "n", pretrained: bool = True) -> YOLO:
"""
Load YOLO detection model.
    Sizes (approx. params): n (2.6M) | s (9.4M) | m (20M) | l (25M) | x (57M)
"""
model_name = f"yolo11{model_size}.pt" if pretrained else f"yolo11{model_size}.yaml"
model = YOLO(model_name)
    print(f"Loaded YOLO11{model_size.upper()} ({model_name})")
return model
def load_segmenter(model_size: str = "n") -> YOLO:
"""Load YOLO instance segmentation model."""
return YOLO(f"yolo11{model_size}-seg.pt")
def load_pose_model(model_size: str = "n") -> YOLO:
"""Load YOLO pose estimation model (17 COCO keypoints)."""
return YOLO(f"yolo11{model_size}-pose.pt")
def load_custom_model(weights_path: str) -> YOLO:
"""Load a custom or fine-tuned YOLO model."""
return YOLO(weights_path)
# ── 2. Image inference ────────────────────────────────────────────────────────
def detect_objects(
model: YOLO,
source, # str path | np.ndarray | list | URL
conf: float = 0.45,
iou: float = 0.5,
imgsz: int = 640,
    classes: list[int] | None = None,  # Filter to specific class IDs
    agnostic: bool = False,  # Class-agnostic NMS
device: str = "cpu",
) -> list[dict] | list[list[dict]]:
    """
    Run object detection and return structured results.
    A single image returns a flat list of dicts (bbox, score, class);
    a batch returns one such list per image.
    """
results = model.predict(
source=source,
conf=conf,
iou=iou,
imgsz=imgsz,
classes=classes,
agnostic_nms=agnostic,
device=device,
verbose=False,
)
detections = []
for r in results:
boxes = r.boxes
img_dets = []
for i in range(len(boxes)):
xyxy = boxes.xyxy[i].tolist()
score = float(boxes.conf[i])
cls = int(boxes.cls[i])
name = r.names[cls]
img_dets.append({
"bbox": [round(c, 1) for c in xyxy], # [x1, y1, x2, y2]
"score": round(score, 3),
"class_id": cls,
"class_name": name,
})
detections.append(img_dets)
    # Flatten for single-image input; keep nested lists for batches
    return detections[0] if len(detections) == 1 else detections
def count_objects(
model: YOLO,
source,
    classes: list[int] | None = None,
) -> dict[str, int]:
    """Count detections per class in an image."""
    dets = detect_objects(model, source, classes=classes)
    # Normalize: a single image yields a flat list of dicts, a batch a nested list
    if dets and isinstance(dets[0], dict):
        dets = [dets]
    counts: dict[str, int] = {}
    for img_dets in dets:
        for d in img_dets:
            counts[d["class_name"]] = counts.get(d["class_name"], 0) + 1
    return counts
# ── 3. Instance segmentation ──────────────────────────────────────────────────
def segment_objects(
model: YOLO,
image: np.ndarray, # BGR image (OpenCV format)
conf: float = 0.45,
device: str = "cpu",
) -> tuple[np.ndarray, list[dict]]:
"""
Run instance segmentation.
Returns annotated image and list of dicts with mask polygons.
"""
results = model.predict(image, conf=conf, device=device, verbose=False)
r = results[0]
segments = []
if r.masks is not None:
for i, (mask, box) in enumerate(zip(r.masks.xy, r.boxes)):
segments.append({
"polygon": mask.tolist(), # List of (x, y) points
"bbox": r.boxes.xyxy[i].tolist(),
"score": float(box.conf),
"class_name": r.names[int(box.cls)],
})
annotated = r.plot(boxes=True, masks=True)
return annotated, segments
def create_segmentation_mask(
image_shape: tuple,
polygon: list,
fill: int = 255,
) -> np.ndarray:
"""Create a binary mask from a polygon."""
mask = np.zeros(image_shape[:2], dtype=np.uint8)
pts = np.array(polygon, dtype=np.int32)
cv2.fillPoly(mask, [pts], fill)
return mask
# ── 4. Pose estimation ────────────────────────────────────────────────────────
COCO_KEYPOINTS = [
"nose", "left_eye", "right_eye", "left_ear", "right_ear",
"left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
"left_wrist", "right_wrist", "left_hip", "right_hip",
"left_knee", "right_knee", "left_ankle", "right_ankle",
]
COCO_SKELETON = [  # 1-indexed keypoint pairs (COCO annotation convention)
(16, 14), (14, 12), (17, 15), (15, 13), (12, 13),
(6, 12), (7, 13), (6, 7), (6, 8), (7, 9), (8, 10),
(9, 11), (2, 3), (1, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7),
]
def estimate_pose(
model: YOLO,
image: np.ndarray,
conf: float = 0.45,
) -> list[dict]:
"""
Estimate human pose keypoints.
Returns list of people with keypoints and visibility.
"""
results = model.predict(image, conf=conf, verbose=False)
r = results[0]
poses = []
if r.keypoints is not None:
kpts = r.keypoints.xy.cpu().numpy() # (N_people, 17, 2)
confs = r.keypoints.conf.cpu().numpy() if r.keypoints.conf is not None else None
for person_idx in range(len(kpts)):
person_kpts = {}
for kpt_idx, kpt_name in enumerate(COCO_KEYPOINTS):
x, y = kpts[person_idx, kpt_idx]
conf_val = float(confs[person_idx, kpt_idx]) if confs is not None else 1.0
person_kpts[kpt_name] = {"x": float(x), "y": float(y), "conf": conf_val}
poses.append({
"bbox": r.boxes.xyxy[person_idx].tolist() if r.boxes else None,
"keypoints": person_kpts,
})
return poses
# ── 5. Video tracking ─────────────────────────────────────────────────────────
def track_objects_video(
model: YOLO,
video_path: str,
output_path: str = "tracked_output.mp4",
conf: float = 0.45,
    tracker: str = "bytetrack.yaml",  # "botsort.yaml" adds Re-ID
    classes: list[int] | None = None,
) -> dict[int, list[dict]]:
"""
Track objects across video frames.
Returns dict mapping track_id → list of per-frame detections.
"""
cap = cv2.VideoCapture(video_path)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # Fall back when FPS metadata is missing
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter(output_path, fourcc, fps, (w, h))
track_history: dict[int, list[dict]] = {}
frame_num = 0
results = model.track(
video_path,
stream=True,
persist=True,
conf=conf,
tracker=tracker,
classes=classes,
verbose=False,
)
for r in results:
annotated = r.plot(line_width=2)
out.write(annotated)
if r.boxes.id is not None:
for i, track_id in enumerate(r.boxes.id.int().tolist()):
if track_id not in track_history:
track_history[track_id] = []
track_history[track_id].append({
"frame": frame_num,
"bbox": r.boxes.xyxy[i].tolist(),
"conf": float(r.boxes.conf[i]),
"class": r.names[int(r.boxes.cls[i])],
})
frame_num += 1
cap.release()
out.release()
print(f"Tracked {len(track_history)} unique objects → {output_path}")
return track_history
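The track_history structure returned above lends itself to simple post-hoc analytics. A minimal sketch (the helper name is ours, not an Ultralytics API) that reports how many frames each track survived:

```python
def track_lifetimes(track_history: dict[int, list[dict]]) -> dict[int, int]:
    """Frames survived per track ID, given track_objects_video-style history."""
    return {tid: len(frames) for tid, frames in track_history.items()}

# Example with a hand-built history: track 1 lasts 2 frames, track 7 lasts 1
history = {
    1: [{"frame": 0, "bbox": [0, 0, 10, 10]}, {"frame": 1, "bbox": [1, 0, 11, 10]}],
    7: [{"frame": 5, "bbox": [50, 50, 60, 60]}],
}
print(track_lifetimes(history))  # {1: 2, 7: 1}
```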
# ── 6. Custom training ────────────────────────────────────────────────────────
def create_dataset_yaml(
dataset_dir: str,
class_names: list[str],
yaml_path: str = "dataset.yaml",
) -> str:
"""
Create YOLO training YAML config from a dataset directory.
Expected structure: dataset_dir/images/{train,val}/ and dataset_dir/labels/{train,val}/
"""
content = f"""# YOLO Dataset Config
path: {os.path.abspath(dataset_dir)}
train: images/train
val: images/val
test: images/test # optional
nc: {len(class_names)}
names: {class_names}
"""
Path(yaml_path).write_text(content)
print(f"Dataset YAML saved: {yaml_path}")
return yaml_path
def fine_tune(
base_model: str = "yolo11n.pt",
data_yaml: str = "dataset.yaml",
epochs: int = 100,
imgsz: int = 640,
batch: int = 16,
device: str = "0", # "0" for GPU, "cpu" for CPU
patience: int = 50,
project: str = "./runs",
name: str = "custom_yolo",
augment: bool = True,
) -> YOLO:
"""
Fine-tune YOLO on a custom dataset.
Returns trained model.
"""
model = YOLO(base_model)
results = model.train(
data=data_yaml,
epochs=epochs,
imgsz=imgsz,
batch=batch,
device=device,
patience=patience,
project=project,
name=name,
# Augmentation
hsv_h=0.015 if augment else 0.0,
hsv_s=0.7 if augment else 0.0,
degrees=0.0,
translate=0.1,
scale=0.5,
fliplr=0.5,
mosaic=1.0 if augment else 0.0,
mixup=0.1 if augment else 0.0,
)
print(f"Training complete. Best mAP50: {results.results_dict.get('metrics/mAP50(B)', 0):.3f}")
return model
# ── 7. Model export ───────────────────────────────────────────────────────────
def export_model(
model: YOLO,
format: str = "onnx", # onnx | tensorrt | coreml | tflite | openvino
imgsz: int = 640,
half: bool = False,
simplify: bool = True,
) -> str:
"""Export YOLO model for deployment."""
exported_path = model.export(
format=format,
imgsz=imgsz,
half=half,
simplify=simplify,
dynamic=False,
)
print(f"Exported to {format}: {exported_path}")
return str(exported_path)
if __name__ == "__main__":
# Load model
model = load_detector("n") # Smallest, fastest
# Inference on a sample image
import urllib.request
img_url = "https://ultralytics.com/images/bus.jpg"
urllib.request.urlretrieve(img_url, "bus.jpg")
# Detect
dets = detect_objects(model, "bus.jpg", conf=0.45)
print(f"\nDetected {len(dets)} objects:")
for d in dets[:5]:
print(f" {d['class_name']}: {d['score']:.2f} @ {d['bbox']}")
# Count
counts = count_objects(model, "bus.jpg")
print(f"\nObject counts: {counts}")
# Annotate and save
results = model.predict("bus.jpg", verbose=False)
results[0].save("bus_annotated.jpg")
print("Saved: bus_annotated.jpg")
For the Detectron2 alternative: choose Detectron2 when you need Faster R-CNN, Mask R-CNN, or Panoptic FPN architectures, reference implementations of Facebook AI Research papers, and fine-grained control over region proposal networks. Detectron2 provides research-grade two-stage detector implementations, while Ultralytics YOLO's single-stage architecture is roughly 10-50x faster at inference, trains in hours instead of days, and ships pose estimation, OBB, and video tracking out of the box with a simpler API.

For the MMDetection alternative: choose MMDetection when you need OpenMMLab's comprehensive zoo of 200+ detection algorithms, including DETR, Sparse R-CNN, and ATSS, with a modular, config-driven architecture. MMDetection offers the widest algorithm coverage, while Ultralytics is better suited to production deployment: ONNX, TensorRT, CoreML, and TFLite export, built-in ByteTrack video tracking, and a single model.export() call replacing custom deployment pipelines.

The Claude Skills 360 bundle includes Ultralytics skill sets covering YOLO11 detection, instance segmentation, pose estimation, video tracking, custom dataset YAML, fine-tuning, multi-format export, and result visualization. Start with the free tier to try object detection code generation.
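Production deployment after model.export(format="onnx") means decoding the raw network output yourself. A hedged NumPy sketch, assuming the (1, 4 + num_classes, num_anchors) layout that YOLO-style detection exports commonly produce — cx, cy, w, h in input pixels followed by per-class scores; verify the layout of your own export, and note that NMS is omitted here:

```python
import numpy as np

def decode_yolo_output(raw: np.ndarray, conf_thres: float = 0.45):
    """Decode a (1, 4 + nc, N) detection tensor into xyxy boxes, scores, class IDs."""
    preds = raw[0].T                   # (N, 4 + nc)
    boxes_cxcywh = preds[:, :4]        # cx, cy, w, h per anchor
    scores_all = preds[:, 4:]          # per-class confidence
    cls_ids = scores_all.argmax(axis=1)
    scores = scores_all.max(axis=1)
    keep = scores >= conf_thres        # confidence filter (NMS not shown)
    cx, cy, w, h = boxes_cxcywh[keep].T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return xyxy, scores[keep], cls_ids[keep]

# Tiny synthetic example: 2 classes, 3 anchors, one confident box
raw = np.zeros((1, 6, 3), dtype=np.float32)
raw[0, :4, 0] = [320, 320, 100, 50]   # cx, cy, w, h
raw[0, 5, 0] = 0.9                    # class-1 score
boxes, scores, cls = decode_yolo_output(raw)
print(boxes, scores, cls)  # one box: [270, 295, 370, 345], score 0.9, class 1
```

Feed this decoder the output of an onnxruntime session running the exported model; the preprocessing (letterbox resize, /255 normalization, CHW layout) must match what Ultralytics used at export time.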