AssemblyAI transcribes audio and applies AI intelligence layers. new AssemblyAI({ apiKey }) initializes the client. client.transcripts.transcribe({ audio_url, speaker_labels: true, sentiment_analysis: true }) submits a job and polls until it completes. transcript.text is the full transcription; transcript.utterances holds diarized turns as [{ speaker, text, start, end }]; transcript.sentiment_analysis_results gives [{ text, sentiment, confidence }] per sentence; transcript.entities lists detected named entities; transcript.auto_highlights_result.results extracts key phrases; transcript.chapters provides AI-generated chapter summaries. LeMUR (Large Language Model over Audio): client.lemur.task({ transcript_ids: [id], prompt: "Summarize the key decisions" }) runs an LLM over the transcript content, and result.response is the answer; client.lemur.questionAnswer(...) handles structured Q&A. Streaming: client.realtime.transcriber({ sampleRate: 16000 }) creates a WebSocket-based transcriber that emits transcript.partial and transcript.final events. client.files.upload(buffer) returns an upload URL for local files. Claude Code generates AssemblyAI transcription APIs, meeting intelligence, and LeMUR-powered audio Q&A.
# CLAUDE.md for AssemblyAI
## AssemblyAI Stack
- Version: assemblyai >= 4.7
- Init: const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY! })
- Transcribe URL: const transcript = await client.transcripts.transcribe({ audio_url: url, speaker_labels: true, sentiment_analysis: true })
- Upload file: const uploadUrl = await client.files.upload(audioBuffer); then pass uploadUrl as audio_url
- Transcript fields: .text (string), .utterances ([{ speaker, text, start, end }]), .entities, .chapters, .sentiment_analysis_results
- LeMUR: const { response } = await client.lemur.task({ transcript_ids: [transcript.id], prompt: "..." })
- Realtime: const rt = client.realtime.transcriber({ sampleRate: 16000 }); rt.on("transcript.final", handler); await rt.connect()
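The utterance shape listed above ([{ speaker, text, start, end }], timestamps in milliseconds) supports simple meeting analytics without further API calls. A minimal sketch computing talk time per speaker (the helper name and local type are illustrative, not part of the SDK):

```typescript
// Talk time per speaker from diarized utterances; timestamps are in milliseconds.
type Turn = { speaker: string; start: number; end: number }

export function talkTimeMs(utterances: Turn[]): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const u of utterances) {
    // Accumulate each turn's duration under its speaker label ("A", "B", ...).
    totals[u.speaker] = (totals[u.speaker] ?? 0) + (u.end - u.start)
  }
  return totals
}
```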
## AssemblyAI Client
// lib/assemblyai/client.ts — AssemblyAI SDK with full feature set
import { AssemblyAI, type TranscribeParams } from "assemblyai"
const client = new AssemblyAI({
apiKey: process.env.ASSEMBLYAI_API_KEY!,
})
export type Utterance = {
speaker: string
text: string
start: number
end: number
confidence: number
}
export type TranscriptionResult = {
id: string
text: string
utterances: Utterance[]
entities: Array<{ entity_type: string; text: string; start: number; end: number }>
sentimentResults: Array<{ text: string; sentiment: "POSITIVE" | "NEGATIVE" | "NEUTRAL"; confidence: number }>
chapters: Array<{ headline: string; summary: string; start: number; end: number }>
highlights: Array<{ text: string; count: number; rank: number }>
duration: number
language?: string
}
export type TranscribeOptions = {
speakerLabels?: boolean
sentimentAnalysis?: boolean
entityDetection?: boolean
autoChapters?: boolean
autoHighlights?: boolean
contentSafety?: boolean
piiPolicies?: string[]
customSpelling?: Array<{ from: string[]; to: string }>
languageCode?: string
webhookUrl?: string
}
/** Transcribe from a public audio URL */
export async function transcribeUrl(
url: string,
options: TranscribeOptions = {},
): Promise<TranscriptionResult> {
const params: TranscribeParams = {
audio_url: url,
speaker_labels: options.speakerLabels ?? true,
sentiment_analysis: options.sentimentAnalysis ?? false,
entity_detection: options.entityDetection ?? false,
auto_chapters: options.autoChapters ?? false,
auto_highlights: options.autoHighlights ?? false,
content_safety: options.contentSafety ?? false,
...(options.piiPolicies ? {
redact_pii: true,
redact_pii_policies: options.piiPolicies as any,
redact_pii_sub: "hash",
} : {}),
...(options.customSpelling ? { custom_spelling: options.customSpelling } : {}),
...(options.languageCode ? { language_code: options.languageCode as any } : {}),
...(options.webhookUrl ? { webhook_url: options.webhookUrl } : {}),
}
const transcript = await client.transcripts.transcribe(params)
if (transcript.status === "error") {
throw new Error(`AssemblyAI transcription failed: ${transcript.error}`)
}
return mapTranscript(transcript)
}
/** Upload a local buffer and transcribe it */
export async function transcribeBuffer(
buffer: Buffer,
options: TranscribeOptions = {},
): Promise<TranscriptionResult> {
const uploadUrl = await client.files.upload(buffer)
return transcribeUrl(uploadUrl, options)
}
/** Get a previously submitted transcript by ID */
export async function getTranscript(id: string): Promise<TranscriptionResult> {
const transcript = await client.transcripts.get(id)
return mapTranscript(transcript)
}
function mapTranscript(t: any): TranscriptionResult {
return {
id: t.id,
text: t.text ?? "",
utterances: (t.utterances ?? []).map((u: any) => ({
speaker: u.speaker,
text: u.text,
start: u.start,
end: u.end,
confidence: u.confidence,
})) as Utterance[],
entities: t.entities ?? [],
sentimentResults: t.sentiment_analysis_results ?? [],
chapters: t.chapters ?? [],
highlights: t.auto_highlights_result?.results ?? [],
duration: t.audio_duration ?? 0,
language: t.language_code,
}
}
export { client }
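The Utterance array that mapTranscript returns can be rendered as a readable script. A small standalone sketch (the local Utterance type mirrors the shape exported above; the "Speaker A [MM:SS]:" formatting convention is an assumption, not SDK output):

```typescript
// Render diarized utterances as "Speaker A [MM:SS]: text" lines.
// Mirrors the Utterance shape from lib/assemblyai/client.ts; start is in ms.
type Utterance = { speaker: string; text: string; start: number; end: number }

function msToClock(ms: number): string {
  const totalSec = Math.floor(ms / 1000)
  const min = Math.floor(totalSec / 60)
  const sec = totalSec % 60
  return `${String(min).padStart(2, "0")}:${String(sec).padStart(2, "0")}`
}

export function formatUtterances(utterances: Utterance[]): string {
  return utterances
    .map((u) => `Speaker ${u.speaker} [${msToClock(u.start)}]: ${u.text}`)
    .join("\n")
}
```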
## LeMUR Audio Q&A
// lib/assemblyai/lemur.ts — LLM over transcribed audio
import { client } from "./client"
export type LemurOptions = {
model?: "claude3_5_sonnet" | "claude3_sonnet" | "claude3_haiku" | "default"
temperature?: number
maxOutputSize?: number
context?: string
}
/** Ask a free-form question about one or more transcripts */
export async function lemurTask(
transcriptIds: string[],
prompt: string,
options: LemurOptions = {},
): Promise<string> {
const result = await client.lemur.task({
transcript_ids: transcriptIds,
prompt,
final_model: (options.model ?? "default") as any,
temperature: options.temperature,
max_output_size: options.maxOutputSize ?? 2048,
...(options.context ? { context: options.context } : {}),
})
return result.response
}
/** Structured Q&A — returns typed answers for each question */
export async function lemurQA(
transcriptIds: string[],
questions: Array<{ question: string; context?: string; answerFormat?: string }>,
options: LemurOptions = {},
): Promise<Array<{ question: string; answer: string }>> {
const result = await client.lemur.questionAnswer({
transcript_ids: transcriptIds,
questions: questions.map((q) => ({
question: q.question,
...(q.context ? { context: q.context } : {}),
...(q.answerFormat ? { answer_format: q.answerFormat } : {}),
})),
final_model: (options.model ?? "default") as any,
})
return result.response
}
/** Extract action items from a meeting transcript */
export async function extractActionItems(transcriptId: string): Promise<string[]> {
const response = await lemurTask(
[transcriptId],
"List all action items and decisions from this meeting. Format as a bullet list. Be specific about who is responsible for each item if mentioned.",
{ model: "claude3_5_sonnet" },
)
// Parse bullet list into array
return response
.split("\n")
.filter((line) => line.match(/^[-•*]\s+/))
.map((line) => line.replace(/^[-•*]\s+/, "").trim())
.filter(Boolean)
}
/** Generate meeting summary with key decisions */
export async function summarizeMeeting(
transcriptId: string,
): Promise<{ summary: string; keyDecisions: string[]; nextSteps: string[] }> {
const [summary, actionItems] = await Promise.all([
lemurTask(
[transcriptId],
"Provide a 3-5 sentence executive summary of this meeting. Focus on the main topics discussed and outcomes.",
),
extractActionItems(transcriptId),
])
const decisionsResponse = await lemurTask(
[transcriptId],
"List only the key decisions made in this meeting. Format as a bullet list.",
)
const keyDecisions = decisionsResponse
.split("\n")
.filter((l) => l.match(/^[-•*]\s+/))
.map((l) => l.replace(/^[-•*]\s+/, "").trim())
return { summary, keyDecisions, nextSteps: actionItems }
}
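extractActionItems and summarizeMeeting above repeat the same bullet-list parsing; it can be factored into one utility. A sketch that also accepts numbered lists, on the assumption that LeMUR sometimes formats items that way:

```typescript
// Parse an LLM bullet list ("- item", "• item", "* item", "1. item") into strings.
// One utility for the parsing both LeMUR helpers repeat.
export function parseBulletList(response: string): string[] {
  return response
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /^([-•*]|\d+[.)])\s+/.test(line)) // keep only list items
    .map((line) => line.replace(/^([-•*]|\d+[.)])\s+/, "").trim()) // strip the marker
    .filter(Boolean)
}
```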
## Real-Time Transcription WebSocket
// lib/assemblyai/realtime.ts — streaming transcription session
import { client } from "./client"
import type { RealtimeTranscript } from "assemblyai"
export type RealtimeCallbacks = {
onPartial: (text: string) => void
onFinal: (text: string, audioStart: number, audioEnd: number) => void
onError: (error: Error) => void
onClose?: () => void
}
export function createRealtimeSession(
callbacks: RealtimeCallbacks,
sampleRate: 16000 | 8000 = 16000,
) {
const transcriber = client.realtime.transcriber({
sampleRate,
encoding: "pcm_s16le",
disablePartialTranscripts: false,
wordBoost: [],
})
transcriber.on("transcript.partial", (t: RealtimeTranscript) => {
if (t.text) callbacks.onPartial(t.text)
})
transcriber.on("transcript.final", (t: RealtimeTranscript) => {
if (t.text) callbacks.onFinal(t.text, t.audio_start, t.audio_end)
})
transcriber.on("error", (err: Error) => callbacks.onError(err))
transcriber.on("close", (code: number, reason: string) => {
console.log(`[AssemblyAI] Realtime closed: ${code} ${reason}`)
callbacks.onClose?.()
})
return {
connect: () => transcriber.connect(),
send: (audio: ArrayBuffer) => transcriber.sendAudio(audio),
close: () => transcriber.close(),
}
}
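The session above is configured for pcm_s16le, but browser capture (AudioContext, AudioWorklet) yields Float32Array samples in [-1, 1], so a conversion step is needed before calling send(). A minimal sketch of that conversion (not SDK code):

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM,
// the encoding the realtime transcriber above is configured for.
export function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2)
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])) // clamp to valid range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // true = little-endian
  }
  return buffer
}
```

The resulting ArrayBuffer can be passed straight to the session's send().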
## Next.js Transcription API Route
// app/api/transcribe/route.ts — upload and transcribe audio file
import { NextResponse } from "next/server"
import { transcribeBuffer } from "@/lib/assemblyai/client"
import { auth } from "@/lib/auth"
export async function POST(req: Request) {
const session = await auth()
if (!session) return NextResponse.json({ error: "Unauthorized" }, { status: 401 })
const formData = await req.formData()
const file = formData.get("audio") as File | null
if (!file) return NextResponse.json({ error: "No audio file" }, { status: 400 })
const buffer = Buffer.from(await file.arrayBuffer())
const result = await transcribeBuffer(buffer, {
speakerLabels: true,
autoChapters: true,
autoHighlights: true,
sentimentAnalysis: true,
})
return NextResponse.json({
id: result.id,
text: result.text,
duration: result.duration,
speakers: [...new Set(result.utterances.map((u) => u.speaker))].length,
utterances: result.utterances,
chapters: result.chapters,
})
}
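A browser caller for this route only needs a multipart body whose field name matches the route's formData.get("audio"). A hypothetical client-side helper (the function name and default filename are illustrative):

```typescript
// Build the multipart request for POST /api/transcribe above.
// The "audio" field name matches what the route reads from formData.
export function buildTranscribeRequest(audio: Blob, filename = "recording.wav") {
  const body = new FormData()
  body.append("audio", audio, filename)
  return { url: "/api/transcribe", method: "POST" as const, body }
}

// Usage in the browser:
// const { url, method, body } = buildTranscribeRequest(blob)
// const res = await fetch(url, { method, body })
// const { text, utterances } = await res.json()
```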
Consider the Deepgram alternative when sub-300ms streaming latency, built-in Nova-3 diarization, or cost per minute is the primary concern: Deepgram streams faster and costs less, while AssemblyAI offers the richer AI intelligence layer (LeMUR, automated chapters, deeper sentiment); see the Deepgram guide. Consider the OpenAI Whisper alternative when you want a single OpenAI API key and batch transcription without real-time streaming, or when multilingual audio with strong language detection is the main requirement: Whisper excels at offline batch work, while AssemblyAI's LeMUR is unmatched for asking LLM questions over recorded audio; see the OpenAI guide. The Claude Skills 360 bundle includes AssemblyAI skill sets covering transcription, LeMUR Q&A, and real-time streaming. Start with the free tier to try audio intelligence generation.