Cloudflare Workers AI runs 50+ AI models at the edge of Cloudflare's global network: env.AI.run("@cf/meta/llama-3.1-8b-instruct", { prompt }) runs inference directly from a Worker binding, with no cold starts because models are cached across Cloudflare's global network. Text generation: env.AI.run("@cf/meta/llama-3.1-8b-instruct", { messages: [{ role: "user", content: prompt }], stream: true }) returns a ReadableStream of server-sent events. Embeddings: env.AI.run("@cf/baai/bge-large-en-v1.5", { text: ["sentence one", "sentence two"] }) returns { shape, data: number[][] }. Image classification: env.AI.run("@cf/microsoft/resnet-50", { image: [...uint8] }) returns label scores. Speech recognition: env.AI.run("@cf/openai/whisper", { audio: [...] }) returns { text }. Image generation: env.AI.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", { prompt }) returns PNG bytes. Bind it in wrangler.toml with an [ai] table and binding = "AI". AI Gateway: create a gateway in the Cloudflare dashboard, then use https://gateway.ai.cloudflare.com/v1/{accountId}/{gatewayId}/openai as the base URL for OpenAI, Anthropic, and other providers to get caching, retries, rate limits, and spend analytics across all of them. The workers-ai-provider package's createWorkersAI gives a Vercel AI SDK-compatible provider for Workers AI models. Claude Code generates Cloudflare Workers AI inference, embeddings, and AI Gateway configurations.
# CLAUDE.md for Cloudflare Workers AI
## Cloudflare Workers AI Stack
- Binding in wrangler.toml: [ai] binding = "AI"
- Worker shape: export default { async fetch(req, env: Env) { const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", { messages }); return Response.json(result) } }
- Text gen: await env.AI.run("@cf/meta/llama-3.1-8b-instruct", { messages: [{role, content}] })
- Embeddings: const { data } = await env.AI.run("@cf/baai/bge-large-en-v1.5", { text: [str] }); data[0] = number[]
- Streaming: const stream = await env.AI.run(model, { ...options, stream: true }); return new Response(stream, { headers: { "Content-Type": "text/event-stream" } })
- AI Gateway: replace base URL in any OpenAI SDK with https://gateway.ai.cloudflare.com/v1/{accountId}/{gatewayId}/openai
## Workers AI Worker
```ts
// src/index.ts — Cloudflare Worker with Workers AI
export interface Env {
  AI: Ai // Added by the wrangler [ai] binding
}

const MODELS = {
  LLAMA_8B: "@cf/meta/llama-3.1-8b-instruct",
  LLAMA_70B: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
  MISTRAL: "@cf/mistral/mistral-7b-instruct-v0.1",
  BGE_LARGE: "@cf/baai/bge-large-en-v1.5",
  BGE_SMALL: "@cf/baai/bge-small-en-v1.5",
  WHISPER: "@cf/openai/whisper",
  SDXL: "@cf/stabilityai/stable-diffusion-xl-base-1.0",
  RESNET: "@cf/microsoft/resnet-50",
} as const

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url)

    // ── Chat completion ──────────────────────────────────────────────────
    if (url.pathname === "/api/chat" && req.method === "POST") {
      const { messages, stream = false } =
        await req.json<{ messages: RoleScopedChatInput[]; stream?: boolean }>()
      if (stream) {
        const result = await env.AI.run(MODELS.LLAMA_8B, { messages, stream: true })
        return new Response(result as ReadableStream, {
          headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
        })
      }
      const result = await env.AI.run(MODELS.LLAMA_8B, { messages }) as { response: string }
      return Response.json({ text: result.response })
    }

    // ── Embeddings ───────────────────────────────────────────────────────
    if (url.pathname === "/api/embeddings" && req.method === "POST") {
      const { texts } = await req.json<{ texts: string[] }>()
      const result = await env.AI.run(MODELS.BGE_LARGE, { text: texts }) as
        { shape: number[]; data: number[][] }
      return Response.json({ embeddings: result.data, dimensions: result.shape[1] })
    }

    // ── Speech-to-text ───────────────────────────────────────────────────
    if (url.pathname === "/api/transcribe" && req.method === "POST") {
      const arrayBuffer = await req.arrayBuffer()
      const audio = [...new Uint8Array(arrayBuffer)]
      const result = await env.AI.run(MODELS.WHISPER, { audio }) as { text: string }
      return Response.json({ text: result.text })
    }

    // ── Image generation ─────────────────────────────────────────────────
    if (url.pathname === "/api/image" && req.method === "POST") {
      const { prompt, steps = 20 } = await req.json<{ prompt: string; steps?: number }>()
      // PNG bytes — a stream or Uint8Array depending on workers-types version
      const png = await env.AI.run(MODELS.SDXL, { prompt, num_steps: steps })
      return new Response(png as BodyInit, { headers: { "Content-Type": "image/png" } })
    }

    return new Response("Not found", { status: 404 })
  },
}
```
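The /api/embeddings route above returns raw vectors; ranking documents against a query on the client side only needs cosine similarity. A minimal sketch — pure math, no Workers APIs involved, and the function names are illustrative:

```typescript
// Cosine similarity between two embedding vectors of equal length.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return document indices sorted from most to least similar to the query.
export function rankBySimilarity(query: number[], docs: number[][]): number[] {
  return docs
    .map((vec, i) => ({ i, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .map((d) => d.i)
}
```

BGE embeddings are not guaranteed unit-length, so divide by both norms rather than taking a raw dot product.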
## Wrangler Configuration
```toml
# wrangler.toml — Workers AI binding
name = "my-workers-ai-app"
compatibility_date = "2024-09-01"

[ai]
binding = "AI"

# For Cloudflare Pages with Workers AI, set the build output directory
# and keep the same [ai] binding:
# pages_build_output_dir = "dist/client"
```
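Newer Wrangler releases also accept JSON configuration. An equivalent sketch of the same binding in that form, assuming your Wrangler version supports wrangler.jsonc:

```jsonc
// wrangler.jsonc — same Workers AI binding in JSON form
{
  "name": "my-workers-ai-app",
  "compatibility_date": "2024-09-01",
  "ai": { "binding": "AI" }
}
```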
## AI Gateway Client
```ts
// lib/cloudflare/ai-gateway.ts — route multiple AI providers through AI Gateway
// AI Gateway adds caching, rate limits, spend tracking, and fallbacks
// in front of any supported provider.
import OpenAI from "openai"
import Anthropic from "@anthropic-ai/sdk"

const ACCOUNT_ID = process.env.CLOUDFLARE_ACCOUNT_ID!
const GATEWAY_ID = process.env.CLOUDFLARE_AI_GATEWAY_ID!
const GATEWAY_URL = `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}`

// Drop-in for the OpenAI SDK — traffic routed through AI Gateway
export const openaiViaGateway = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  baseURL: `${GATEWAY_URL}/openai`,
})

// Drop-in for the Anthropic SDK — traffic routed through AI Gateway
export const anthropicViaGateway = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  baseURL: `${GATEWAY_URL}/anthropic`,
})

// Universal gateway fetch — any supported provider
export async function gatewayFetch(
  provider: "openai" | "anthropic" | "groq" | "workers-ai",
  path: string,
  init: RequestInit,
): Promise<Response> {
  const url = `${GATEWAY_URL}/${provider}${path}`
  return fetch(url, {
    ...init,
    headers: {
      ...((init.headers as Record<string, string>) ?? {}),
      "cf-aig-cache-ttl": "300", // Cache identical requests for 5 minutes
      "cf-aig-skip-cache": "false", // Set to "true" to bypass the cache per request
    },
  })
}
```
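The gateway forwards each provider's native wire format unchanged, so a client that falls back from OpenAI to Anthropic must translate the request body: Anthropic takes system instructions as a top-level system field and requires max_tokens. A sketch of that translation — the helper name and defaults are illustrative, not a library API:

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// Convert OpenAI-style messages into an Anthropic Messages API body:
// system messages are lifted out of the array, max_tokens is mandatory.
export function toAnthropicBody(
  messages: ChatMessage[],
  model: string,
  maxTokens = 1024,
) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n")
  return {
    model,
    max_tokens: maxTokens,
    ...(system ? { system } : {}),
    messages: messages
      .filter((m) => m.role !== "system")
      .map((m) => ({ role: m.role, content: m.content })),
  }
}
```

The resulting object can be posted through gatewayFetch("anthropic", "/v1/messages", ...) with the usual Anthropic headers.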
## Next.js Edge Route with Workers AI
```ts
// app/api/ai/chat/route.ts — Next.js edge route using the Workers AI binding
// When deployed to Cloudflare Pages, env.AI is available via getRequestContext()
import { getRequestContext } from "@cloudflare/next-on-pages"

export const runtime = "edge"

export async function POST(req: Request) {
  const { messages } = await req.json()
  // @cloudflare/next-on-pages exposes the Worker bindings
  const { env } = getRequestContext<{ AI: Ai }>()
  const stream = await env.AI.run(
    "@cf/meta/llama-3.1-8b-instruct",
    { messages, stream: true } as AiTextGenerationInput,
  )
  return new Response(stream as ReadableStream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  })
}
```
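On the consuming side, the route's body is raw SSE. A sketch of a browser-side reader, assuming the Workers AI event shape of data: {"response": "..."} lines ending with data: [DONE]; it buffers across chunk boundaries so a line split between network reads still parses, and onToken is an illustrative callback:

```typescript
// Read an SSE body and invoke onToken for each generated text fragment.
export async function readSSE(
  body: ReadableStream<Uint8Array>,
  onToken: (token: string) => void,
): Promise<void> {
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let buffer = ""
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split("\n")
    buffer = lines.pop() ?? "" // keep the trailing partial line for the next read
    for (const line of lines) {
      const trimmed = line.trim()
      if (!trimmed.startsWith("data:")) continue
      const payload = trimmed.slice(5).trim()
      if (payload === "[DONE]") continue
      try {
        const { response } = JSON.parse(payload) as { response?: string }
        if (response) onToken(response)
      } catch {
        // Malformed or partial JSON — skip this line.
      }
    }
  }
}
```

Usage would be fetch("/api/ai/chat", ...).then((res) => readSSE(res.body!, append)) to stream tokens into the UI as they arrive.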
Choose the Vercel AI SDK instead when deploying on Vercel infrastructure, when you want a unified useChat/useCompletion React interface that works with OpenAI, Anthropic, and Google simultaneously, or when you need the AI SDK's streaming helpers — the Vercel AI SDK is a TypeScript abstraction layer, while Cloudflare Workers AI is the infrastructure for running open models directly at the edge without external API calls; see the Vercel AI SDK guide. Choose Together AI instead when you need a larger catalog of open models (DeepSeek-R1, Llama-3.3-70B, vision models) with enterprise SLAs — Together AI runs models on dedicated GPU clusters, while Cloudflare Workers AI runs a curated model set in the same data centers as your Worker for minimal latency; see the Together AI guide. The Claude Skills 360 bundle includes Cloudflare Workers AI skill sets covering edge inference, AI Gateway, and Workers binding patterns. Start with the free tier to try edge AI generation.