Replicate runs thousands of open-source AI models via API: import Replicate from "replicate" and new Replicate({ auth: apiKey }) initializes the client. await replicate.run("stability-ai/sdxl:...", { input: { prompt, negative_prompt, width, height } }) generates images; await replicate.run("meta/llama-2-70b-chat:...", { input: { prompt, system_prompt } }) runs text generation. Output is typically a URL string (or array of URL strings) for images and an array of text chunks for language models. Streaming: replicate.stream("model/version", { input }) yields tokens as they are generated. replicate.predictions.create({ version, input }) creates a prediction without blocking, and replicate.wait(prediction) polls until it completes. Webhooks: pass webhook: "https://myapp.com/api/webhooks/replicate" for async callbacks. replicate.deployments.predictions.create(...) targets dedicated deployments for faster, more predictable inference. Training: replicate.trainings.create(...) fine-tunes models, and replicate.models.versions.list(owner, name) lists a model's versions. Claude Code generates Replicate image generation APIs, streaming text, video, and fine-tuning pipelines.
# CLAUDE.md for Replicate
## Replicate Stack
- Version: replicate >= 1.0
- Init: const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN! })
- Run (image): const output = await replicate.run("stability-ai/sdxl:VERSION_HASH", { input: { prompt, width: 1024, height: 1024, num_outputs: 1 } }) as string[]
- Run (text): const output = await replicate.run("meta/meta-llama-3-8b-instruct:VERSION_HASH", { input: { prompt } }) as string[]
- Stream: for await (const event of replicate.stream("model/version", { input: {prompt} })) text += event.toString()
- Async: const pred = await replicate.predictions.create({ version, input, webhook: WEBHOOK_URL }); await replicate.wait(pred)
- Cancel: await replicate.predictions.cancel(predictionId)
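The create → wait flow above boils down to a polling loop. A minimal sketch, where `getStatus` is a hypothetical stand-in for a call like `replicate.predictions.get(id)` (the real SDK's `replicate.wait(prediction)` does this for you):

```typescript
// Poll a prediction until it reaches a terminal state. `getStatus` is a
// hypothetical stand-in for replicate.predictions.get(id).
type PredictionStatus =
  | "starting" | "processing" | "succeeded" | "failed" | "canceled"

export async function pollUntilDone(
  getStatus: () => Promise<PredictionStatus>,
  { intervalMs = 1000, maxAttempts = 60 } = {},
): Promise<PredictionStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus()
    // "starting" and "processing" are non-terminal; keep polling
    if (status === "succeeded" || status === "failed" || status === "canceled") {
      return status
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error("prediction did not finish in time")
}
```

In production, prefer replicate.wait(prediction) or webhooks over hand-rolled polling.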
## Replicate Client
// lib/replicate/client.ts — Replicate SDK with model helpers
import Replicate from "replicate"
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN!,
// replicate >= 1.0 wraps file outputs in FileOutput objects by default;
// useFileOutput: false keeps them as plain URL strings, which the
// helpers below assume
useFileOutput: false,
})
// Popular model versions; pin these for production stability
export const MODELS = {
SDXL: "stability-ai/sdxl:7762fd07cf82c948538e41f63f77d685e02b063e37291fae01d7745f20c5bd9d",
// The hashes below are truncated; substitute the full version hash
// from each model's page on replicate.com before use
LLAMA_8B: "meta/meta-llama-3-8b-instruct:f4e1de748818b42f2c8bd1d90",
LLAMA_70B: "meta/meta-llama-3-70b-instruct:b6c353d8c7c2e",
CONTROLNET_CANNY: "jagilley/controlnet-canny:aff48af9c14d2",
BLIP_CAPTION: "salesforce/blip:2e1dddc8a7f1",
} as const
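predictions.create wants the bare version hash rather than the full ref, so a small parser for the owner/name:version strings stored above is handy. An illustration, not part of the SDK:

```typescript
// Split an "owner/name:version" ref into its parts. The version segment
// is optional; official models can be run without a pinned version.
export function parseModelRef(ref: string): {
  owner: string
  name: string
  version?: string
} {
  const [path, version] = ref.split(":")
  const [owner, name] = (path ?? "").split("/")
  if (!owner || !name) throw new Error(`invalid model ref: ${ref}`)
  return version ? { owner, name, version } : { owner, name }
}
```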
// ── Image generation ───────────────────────────────────────────────────────
export type ImageInput = {
prompt: string
negativePrompt?: string
width?: number
height?: number
numOutputs?: number
guidanceScale?: number
numInferenceSteps?: number
seed?: number
}
export async function generateImages(input: ImageInput): Promise<string[]> {
const output = await replicate.run(MODELS.SDXL, {
input: {
prompt: input.prompt,
negative_prompt: input.negativePrompt ?? "blurry, low quality, distorted",
width: input.width ?? 1024,
height: input.height ?? 1024,
num_outputs: input.numOutputs ?? 1,
guidance_scale: input.guidanceScale ?? 7.5,
num_inference_steps: input.numInferenceSteps ?? 30,
...(input.seed !== undefined ? { seed: input.seed } : {}),
},
})
return (output as string[]).filter(Boolean)
}
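Stable Diffusion pipelines generally require dimensions divisible by 8, so it can help to snap arbitrary user-supplied sizes into range before calling generateImages. A sketch; the bounds are assumptions matching the defaults above, not limits read from the model schema:

```typescript
// Clamp a requested dimension into [min, max] and round it to the nearest
// multiple of 8, which diffusion models typically require.
export function snapDimension(n: number, min = 512, max = 1024): number {
  const clamped = Math.min(max, Math.max(min, n))
  return Math.round(clamped / 8) * 8
}
```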
// ── Text generation ───────────────────────────────────────────────────────
export async function generateText(
prompt: string,
systemPrompt?: string,
maxTokens = 512,
): Promise<string> {
const output = await replicate.run(MODELS.LLAMA_8B, {
input: {
prompt,
...(systemPrompt ? { system_prompt: systemPrompt } : {}),
max_tokens: maxTokens,
temperature: 0.7,
},
})
return (output as string[]).join("")
}
export async function* streamText(
prompt: string,
systemPrompt?: string,
): AsyncGenerator<string> {
const events = replicate.stream(MODELS.LLAMA_8B, {
input: {
prompt,
...(systemPrompt ? { system_prompt: systemPrompt } : {}),
temperature: 0.7,
},
})
for await (const event of events) {
yield event.toString()
}
}
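To serve streamText from a route handler, the async generator can be wrapped in a web ReadableStream and returned as a streaming Response. A minimal sketch:

```typescript
// Wrap an async generator of text chunks in a ReadableStream so it can be
// returned as `new Response(toReadableStream(streamText(prompt)))`.
export function toReadableStream(
  gen: AsyncGenerator<string>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder()
  return new ReadableStream({
    async pull(controller) {
      const { value, done } = await gen.next()
      if (done) controller.close()
      else controller.enqueue(encoder.encode(value))
    },
    cancel() {
      // Stop the generator if the client disconnects
      void gen.return(undefined)
    },
  })
}
```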
// ── Image captioning ───────────────────────────────────────────────────────
export async function captionImage(imageUrl: string): Promise<string> {
const output = await replicate.run(MODELS.BLIP_CAPTION, {
input: { image: imageUrl, task: "image_captioning" },
})
return (output as string).trim()
}
// ── Async prediction with webhook ─────────────────────────────────────────
export async function createImagePrediction(
input: ImageInput,
webhookUrl: string,
): Promise<{ id: string; status: string }> {
const prediction = await replicate.predictions.create({
version: MODELS.SDXL.split(":")[1]!,
input: {
prompt: input.prompt,
negative_prompt: input.negativePrompt,
width: input.width ?? 1024,
height: input.height ?? 1024,
num_outputs: 1,
},
webhook: webhookUrl,
webhook_events_filter: ["completed"],
})
return { id: prediction.id, status: prediction.status }
}
export async function getPrediction(id: string) {
return replicate.predictions.get(id)
}
export { replicate }
## Image Generation API Routes
// app/api/generate/image/route.ts — synchronous image generation
import { NextResponse } from "next/server"
import { z } from "zod"
import { generateImages } from "@/lib/replicate/client"
import { auth } from "@/lib/auth"
import { rateLimit } from "@/lib/rate-limit"
const ImageSchema = z.object({
prompt: z.string().min(3).max(1000),
negativePrompt: z.string().max(500).optional(),
width: z.number().int().min(512).max(1024).default(1024),
height: z.number().int().min(512).max(1024).default(1024),
})
export async function POST(req: Request) {
const session = await auth()
if (!session) return NextResponse.json({ error: "Unauthorized" }, { status: 401 })
const { success } = await rateLimit(session.user.id, { requests: 10, window: 60 })
if (!success) return NextResponse.json({ error: "Rate limit exceeded" }, { status: 429 })
const body = await req.json()
const parsed = ImageSchema.safeParse(body)
if (!parsed.success) return NextResponse.json({ error: parsed.error.flatten() }, { status: 400 })
const urls = await generateImages(parsed.data)
return NextResponse.json({ urls })
}
// app/api/webhooks/replicate/route.ts — webhook for async predictions
import { NextResponse } from "next/server"
import { getPrediction } from "@/lib/replicate/client"
export async function POST(req: Request) {
const body = await req.json()
const { id, status, output } = body
if (status === "succeeded" && output) {
// Store result, notify user via websocket/pusher, etc.
console.log(`[Replicate] Prediction ${id} completed:`, output)
// await db.predictions.update({ id }, { status: "done", urls: output })
}
if (status === "failed") {
console.error(`[Replicate] Prediction ${id} failed:`, body.error)
}
return NextResponse.json({ ok: true })
}
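The handler above trusts any caller. Replicate signs webhooks with svix-style headers (webhook-id, webhook-timestamp, webhook-signature), and the SDK ships a validateWebhook helper; the check can also be done by hand. A sketch of the scheme as Replicate documents it; verify the details against their webhook docs before relying on it:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Recompute the signature over `${id}.${timestamp}.${body}` using the
// base64-decoded portion of the signing secret (after the "whsec_" prefix),
// then compare against each space-separated "v1,<base64>" candidate.
export function verifyReplicateWebhook(
  id: string,
  timestamp: string,
  body: string,
  signatureHeader: string, // e.g. "v1,Zm9v... v1,YmFy..."
  secret: string, // e.g. "whsec_..."
): boolean {
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64")
  const expected = createHmac("sha256", key)
    .update(`${id}.${timestamp}.${body}`)
    .digest("base64")
  return signatureHeader
    .split(" ")
    .map((candidate) => candidate.split(",")[1] ?? "")
    .some(
      (sig) =>
        sig.length === expected.length &&
        timingSafeEqual(Buffer.from(sig), Buffer.from(expected)),
    )
}
```

In the route, read the raw body with await req.text() before JSON.parse, since the signature covers the exact bytes; a timestamp freshness check guards against replays.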
Choose the OpenAI DALL-E alternative when you are generating images within a single OpenAI ecosystem, need outpainting/inpainting via the edits API, or require image variations: DALL-E is integrated with ChatGPT and produces reliable results for general imagery, while Replicate hosts Stable Diffusion XL, ControlNet, and hundreds of specialized models for specific styles. See the OpenAI guide. Choose the Stability AI direct API when you want the latest Stability AI models without a third-party wrapper, a lower cost per image, or fine-grained model controls through the Stability AI REST API: Stability AI provides direct access, while Replicate runs 10,000+ models from various providers on unified infrastructure with simple per-second billing. See the Stability AI guide. The Claude Skills 360 bundle includes Replicate skill sets covering image generation, streaming text, webhooks, and fine-tuning. Start with the free tier to try AI model API generation.