ElevenLabs provides state-of-the-art AI text-to-speech and voice cloning. new ElevenLabsClient({ apiKey }) initializes the client. client.textToSpeech.convert(voiceId, { text, modelId: "eleven_multilingual_v2", voiceSettings: { stability, similarityBoost, speed } }) returns a readable stream of audio bytes. client.textToSpeech.stream(voiceId, { text, modelId }) returns a ReadableStream<Uint8Array> for streaming audio. client.voices.getAll() lists available voices. client.voices.add({ name, files: [audioBlob], description }) clones a voice from samples. Models: eleven_flash_v2_5 (fastest, ~75 ms latency), eleven_multilingual_v2 (highest quality, 29 languages), eleven_turbo_v2_5 (balanced). client.speechToSpeech.stream(voiceId, { audio: inputStream, modelId }) converts a voice while preserving emotion. client.soundGeneration.convert({ text: "explosion sound effect" }) generates sound effects. client.textToVoice.createPreviews({ voiceDescription: "warm baritone narrator" }) creates custom voice previews. Claude Code generates ElevenLabs integrations: TTS API routes, streaming audio, voice cloning, and audiobook narration.
CLAUDE.md for ElevenLabs
## ElevenLabs Stack
- Version: elevenlabs >= 1.12
- Init: const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY! })
- TTS buffer: const audio = await client.textToSpeech.convert(voiceId, { text, modelId: "eleven_flash_v2_5" })
- Stream: const stream = client.textToSpeech.stream(voiceId, { text, modelId: "eleven_multilingual_v2" })
- Voice list: const { voices } = await client.voices.getAll()
- Popular voices: "Rachel" (21m00Tcm4TlGiJtqZkEq) — warm female; "Adam" (pNInz6obpgDQGcFmaJgB) — deep male
- Model: eleven_flash_v2_5 (lowest latency ~75ms), eleven_multilingual_v2 (29 languages, best quality)
- Voice settings: stability: 0.5 (flexible) to 1.0 (stable), similarityBoost: 0.75, style: 0 to 1, useSpeakerBoost: true
## ElevenLabs Client
// lib/elevenlabs/client.ts — ElevenLabs SDK helpers
import { ElevenLabsClient } from "elevenlabs"
const eleven = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY!,
})
// Production-grade voice IDs from ElevenLabs voice library
export const VOICES = {
RACHEL: "21m00Tcm4TlGiJtqZkEq", // Warm, approachable female
ADAM: "pNInz6obpgDQGcFmaJgB", // Deep, mature male
BELLA: "EXAVITQu4vr4xnSDxMaL", // Soft female American
ELLI: "MF3mGyEYCl7XYWbV9V6O", // Young female American
JOSH: "TxGEqnHWrfWFTfGW9XjX", // Young male American
ARNOLD: "VR6AewLTigWG4xSOukaG", // Crisp male narrator
} as const
export const MODELS = {
FLASH: "eleven_flash_v2_5", // Fastest: ~75ms TTFT
TURBO: "eleven_turbo_v2_5", // Balanced speed+quality
MULTILINGUAL: "eleven_multilingual_v2", // Best quality, 29 languages
} as const
type VoiceId = string
type ModelId = (typeof MODELS)[keyof typeof MODELS]
export interface TtsOptions {
voiceId?: VoiceId
modelId?: ModelId
stability?: number // 0 (variable) - 1 (stable)
similarityBoost?: number // 0 - 1
style?: number // 0 - 1 (exaggeration)
speed?: number // 0.7 - 1.2
}
/** Convert text to audio buffer */
export async function textToAudio(text: string, options: TtsOptions = {}): Promise<Buffer> {
const {
voiceId = VOICES.RACHEL,
modelId = MODELS.FLASH,
stability = 0.5,
similarityBoost = 0.75,
style = 0,
speed = 1.0,
} = options
const audio = await eleven.textToSpeech.convert(voiceId, {
text,
modelId,
voiceSettings: {
stability,
similarityBoost,
style,
useSpeakerBoost: true,
speed,
},
outputFormat: "mp3_44100_128",
})
// Convert ReadableStream to Buffer
const chunks: Uint8Array[] = []
for await (const chunk of audio as unknown as AsyncIterable<Uint8Array>) {
chunks.push(chunk)
}
return Buffer.concat(chunks)
}
/** Stream audio as ReadableStream for HTTP responses */
export async function streamTextToAudio(
text: string,
options: TtsOptions = {},
): Promise<ReadableStream<Uint8Array>> {
const {
voiceId = VOICES.RACHEL,
modelId = MODELS.MULTILINGUAL,
stability = 0.5,
similarityBoost = 0.75,
} = options
const stream = await eleven.textToSpeech.stream(voiceId, {
text,
modelId,
voiceSettings: { stability, similarityBoost, useSpeakerBoost: true },
outputFormat: "mp3_44100_128",
})
return stream as unknown as ReadableStream<Uint8Array>
}
/** List all available voices */
export async function listVoices() {
const response = await eleven.voices.getAll()
return response.voices.map((v) => ({
id: v.voiceId!,
name: v.name!,
category: v.category,
description: v.description,
labels: v.labels,
previewUrl: v.previewUrl,
}))
}
/** Clone a voice from an audio file */
export async function cloneVoice(
name: string,
audioFiles: File[],
description?: string,
): Promise<string> {
const voice = await eleven.voices.add({
name,
files: audioFiles,
description: description ?? `Cloned voice: ${name}`,
labels: { source: "upload" },
})
return voice.voiceId
}
/** Generate sound effect from description */
export async function generateSoundEffect(description: string): Promise<Buffer> {
const audio = await eleven.soundGeneration.convert({
text: description,
durationSeconds: 3,
promptInfluence: 0.3,
})
const chunks: Uint8Array[] = []
for await (const chunk of audio as unknown as AsyncIterable<Uint8Array>) {
chunks.push(chunk)
}
return Buffer.concat(chunks)
}
export { eleven }
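The TTS route caps a single request at 5,000 characters, so long-form narration (audiobook chapters) should be chunked before synthesis. A minimal sentence-aware sketch; `chunkText` is a hypothetical helper, not part of the ElevenLabs SDK:

```typescript
// Split long text into chunks under a per-request character limit,
// breaking on sentence boundaries so narration stays natural.
// A single sentence longer than maxChars is emitted as its own oversized chunk.
export function chunkText(text: string, maxChars = 5000): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+\s*|[^.!?]+$/g) ?? [text]
  const chunks: string[] = []
  let current = ""
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      chunks.push(current.trim())
      current = ""
    }
    current += sentence
  }
  if (current.trim()) chunks.push(current.trim())
  return chunks
}
```

Each chunk can then be passed to `textToAudio` sequentially; concatenating the returned MP3 buffers plays in most players, but re-muxing with a tool like ffmpeg is safer for distribution.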
## Streaming TTS API Route
// app/api/tts/route.ts — Next.js streaming text-to-speech
import { NextResponse } from "next/server"
import { z } from "zod"
import { streamTextToAudio, VOICES, MODELS } from "@/lib/elevenlabs/client"
import { auth } from "@/lib/auth"
const TtsSchema = z.object({
text: z.string().min(1).max(5000),
voiceId: z.string().optional(),
modelId: z.enum([MODELS.FLASH, MODELS.TURBO, MODELS.MULTILINGUAL]).default(MODELS.FLASH),
stability: z.number().min(0).max(1).default(0.5),
speed: z.number().min(0.7).max(1.2).default(1.0),
})
export async function POST(req: Request) {
const session = await auth()
if (!session) return NextResponse.json({ error: "Unauthorized" }, { status: 401 })
const body = await req.json().catch(() => null)
const parsed = TtsSchema.safeParse(body)
if (!parsed.success) {
return NextResponse.json({ error: parsed.error.flatten() }, { status: 400 })
}
const input = parsed.data
const audioStream = await streamTextToAudio(input.text, {
voiceId: input.voiceId ?? VOICES.RACHEL,
modelId: input.modelId,
stability: input.stability,
speed: input.speed,
})
return new Response(audioStream, {
headers: {
"Content-Type": "audio/mpeg",
"Cache-Control": "no-cache",
},
})
}
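On the client, the streamed response can be buffered into a Blob and played with the Audio element. A minimal browser-side sketch, assuming the route above; `buildTtsPayload` and `speak` are hypothetical helpers that mirror the route's Zod schema defaults:

```typescript
// Build a request body matching the TTS route's schema defaults.
export function buildTtsPayload(text: string, voiceId?: string) {
  return {
    text,
    ...(voiceId ? { voiceId } : {}),
    modelId: "eleven_flash_v2_5",
    stability: 0.5,
    speed: 1.0,
  }
}

// Fetch synthesized speech from /api/tts and play it (browser only).
export async function speak(text: string, voiceId?: string): Promise<void> {
  const res = await fetch("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTtsPayload(text, voiceId)),
  })
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`)
  const blob = await res.blob() // buffers the full MP3 stream
  const audio = new Audio(URL.createObjectURL(blob))
  await audio.play()
}
```

The Blob approach waits for the full response; for lower perceived latency, a MediaSource-based player can start playback before the stream finishes.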
Consider the OpenAI TTS alternative when a simpler setup is sufficient: one API key, the tts-1 and tts-1-hd models, and six built-in voices (alloy, echo, fable, onyx, nova, shimmer). OpenAI TTS is good for straightforward narration, while ElevenLabs offers dramatically more realistic voices, voice cloning, emotional control, and multilingual support; see the OpenAI guide. Consider the Google Cloud TTS alternative when you use existing Google Cloud infrastructure, need WaveNet or Neural2 voices with SSML control, or require HIPAA compliance. Google Cloud TTS is enterprise-grade with strong SSML support, while ElevenLabs has the most natural-sounding voices available from any provider; see the Google Cloud guide. The Claude Skills 360 bundle includes ElevenLabs skill sets covering TTS streaming, voice cloning, and sound effects. Start with the free tier to try AI voice generation.