Deepgram provides fast, accurate speech-to-text with real-time streaming. createClient(apiKey) initializes the v3 SDK client. deepgram.listen.prerecorded.transcribeUrl({ url }, { model: "nova-3", smart_format: true }) transcribes audio from a URL, and deepgram.listen.prerecorded.transcribeFile(buffer, { model: "nova-3", diarize: true }) transcribes from a Buffer; the transcript lives at result.results.channels[0].alternatives[0].transcript. For live transcription, deepgram.listen.live({ model: "nova-3", punctuate: true, interim_results: true }) returns a connection that emits LiveTranscriptionEvents.Transcript events; connection.send(audioBuffer) streams audio chunks and connection.requestClose() ends the session. Diarization: diarize: true adds speaker labels, utterances: true provides utterance-level timing, and paragraphs: true structures the output. Use language: "es" or detect_language: true for multilingual audio, callback: "https://..." for async prerecorded jobs, and summarize: true, topics: true, sentiment: true for intelligence features. Claude Code generates Deepgram transcription APIs, real-time subtitles, and meeting transcription systems.
CLAUDE.md for Deepgram
## Deepgram Stack
- Version: @deepgram/sdk >= 3.9
- Init: const deepgram = createClient(process.env.DEEPGRAM_API_KEY!)
- Prerecorded URL: const { result } = await deepgram.listen.prerecorded.transcribeUrl({ url }, { model: "nova-3", smart_format: true, punctuate: true, diarize: true })
- Transcript: result.results.channels[0].alternatives[0].transcript
- File/Buffer: await deepgram.listen.prerecorded.transcribeFile(audioBuffer, { model: "nova-3" })
- Live: const conn = deepgram.listen.live({ model: "nova-3", smart_format: true }); conn.on(LiveTranscriptionEvents.Transcript, handler); conn.send(audioChunk)
- Words with timestamps: result.results.channels[0].alternatives[0].words — [{ word, start, end, confidence, speaker? }]
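The word objects above carry start/end timestamps, which is all you need to cut subtitle cues for the real-time subtitle use case. A minimal sketch of turning a words array into SRT output (the cue-grouping thresholds here are arbitrary choices for illustration, not Deepgram defaults):

```typescript
// srt.ts — build SRT subtitle cues from Deepgram word timings (sketch)
type Word = { word: string; start: number; end: number }

/** Format seconds as an SRT timestamp: HH:MM:SS,mmm */
function toSrtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000)
  const pad = (n: number, w = 2) => n.toString().padStart(w, "0")
  const h = Math.floor(ms / 3_600_000)
  const m = Math.floor((ms % 3_600_000) / 60_000)
  const s = Math.floor((ms % 60_000) / 1000)
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`
}

/** Group words into cues of at most maxWords or maxSeconds, then render SRT. */
function wordsToSrt(words: Word[], maxWords = 8, maxSeconds = 4): string {
  const cues: Word[][] = []
  let current: Word[] = []
  for (const w of words) {
    if (
      current.length >= maxWords ||
      (current.length > 0 && w.end - current[0].start > maxSeconds)
    ) {
      cues.push(current)
      current = []
    }
    current.push(w)
  }
  if (current.length) cues.push(current)
  return cues
    .map((cue, i) => {
      const text = cue.map((w) => w.word).join(" ")
      const range = `${toSrtTime(cue[0].start)} --> ${toSrtTime(cue[cue.length - 1].end)}`
      return `${i + 1}\n${range}\n${text}`
    })
    .join("\n\n")
}
```

With smart_format enabled you may prefer each word's punctuated_word over word when rendering cue text.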
Deepgram Client
// lib/deepgram/client.ts — Deepgram SDK helpers
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk"
import type { LiveSchema, PrerecordedSchema } from "@deepgram/sdk"
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!)
export const MODELS = {
NOVA3: "nova-3", // Best accuracy (general)
NOVA2: "nova-2", // Great quality, lower cost
ENHANCED: "enhanced", // Phone/voicemail optimized
BASE: "base", // Fastest, lowest cost
WHISPER: "whisper-large", // Deepgram-hosted Whisper
} as const
export type DeepgramModel = (typeof MODELS)[keyof typeof MODELS]
export type TranscriptWord = {
word: string
start: number
end: number
confidence: number
speaker?: number
punctuated_word?: string
}
export type Utterance = {
start: number
end: number
transcript: string
speaker?: number
confidence: number
}
export type TranscriptionResult = {
transcript: string
confidence: number
words: TranscriptWord[]
utterances?: Utterance[]
paragraphs?: string[]
language?: string
duration?: number
summary?: string
}
/** Transcribe audio from a URL */
export async function transcribeUrl(
url: string,
options: Partial<PrerecordedSchema> = {},
): Promise<TranscriptionResult> {
const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
{ url },
{
model: MODELS.NOVA3,
smart_format: true,
punctuate: true,
utterances: true,
paragraphs: true,
language: "en",
...options,
},
)
if (error) throw new Error(`Deepgram error: ${error.message}`)
return parseResult(result)
}
/** Transcribe audio from a Buffer or File */
export async function transcribeFile(
audio: Buffer | Blob,
options: Partial<PrerecordedSchema> = {},
): Promise<TranscriptionResult> {
const buffer = audio instanceof Blob ? Buffer.from(await audio.arrayBuffer()) : audio
const { result, error } = await deepgram.listen.prerecorded.transcribeFile(buffer, {
model: MODELS.NOVA3,
smart_format: true,
punctuate: true,
diarize: true,
utterances: true,
language: "en",
...options,
})
if (error) throw new Error(`Deepgram error: ${error.message}`)
return parseResult(result)
}
/** Create a live transcription connection */
export function createLiveSession(
onTranscript: (text: string, isFinal: boolean, words?: TranscriptWord[]) => void,
options: Partial<LiveSchema> = {},
) {
const connection = deepgram.listen.live({
model: MODELS.NOVA3,
smart_format: true,
interim_results: true,
language: "en-US",
endpointing: 300, // ms of silence to detect end of utterance
...options,
})
connection.on(LiveTranscriptionEvents.Open, () => {
console.log("[Deepgram] Live session opened")
})
connection.on(LiveTranscriptionEvents.Transcript, (data) => {
const alt = data.channel?.alternatives?.[0]
if (!alt) return
const isFinal = data.is_final ?? false
const words = alt.words as TranscriptWord[]
onTranscript(alt.transcript ?? "", isFinal, words)
})
connection.on(LiveTranscriptionEvents.Error, (err) => {
console.error("[Deepgram] Live error:", err)
})
connection.on(LiveTranscriptionEvents.Close, () => {
console.log("[Deepgram] Live session closed")
})
return {
send: (audio: ArrayBuffer | Buffer) => connection.send(audio),
close: () => connection.requestClose(),
}
}
function parseResult(result: any): TranscriptionResult {
const channel = result?.results?.channels?.[0]
const alt = channel?.alternatives?.[0]
if (!alt) return { transcript: "", confidence: 0, words: [] }
const paragraphs = alt.paragraphs?.paragraphs?.map(
(p: any) => p.sentences?.map((s: any) => s.text).join(" ") ?? "",
)
return {
transcript: alt.transcript ?? "",
confidence: alt.confidence ?? 0,
words: (alt.words ?? []) as TranscriptWord[],
utterances: result?.results?.utterances as Utterance[] | undefined,
paragraphs,
language: result?.results?.channels?.[0]?.detected_language,
duration: result?.metadata?.duration,
summary: (result?.results as any)?.summary?.short,
}
}
export { deepgram }
Meeting Transcription with Speaker Labels
// lib/deepgram/meeting.ts — diarized meeting transcription
import { transcribeFile, type TranscriptionResult } from "./client"
export type Speaker = { id: number; name?: string }
export type MeetingTurn = { speaker: Speaker; text: string; start: number; end: number }
export type MeetingTranscript = { turns: MeetingTurn[]; duration: number; summary?: string }
export async function transcribeMeeting(audioFile: Buffer): Promise<MeetingTranscript> {
const result = await transcribeFile(audioFile, {
diarize: true,
utterances: true,
summarize: "v2" as any,
smart_format: true,
})
const turns: MeetingTurn[] = []
if (result.utterances?.length) {
let currentSpeaker = -1
let currentText = ""
let currentStart = 0
let currentEnd = 0
for (const utterance of result.utterances) {
const speakerId = utterance.speaker ?? 0
if (speakerId !== currentSpeaker) {
// Flush the previous speaker's turn before starting a new one
if (currentText.trim()) {
turns.push({
speaker: { id: currentSpeaker },
text: currentText.trim(),
start: currentStart,
end: currentEnd,
})
}
currentSpeaker = speakerId
currentText = ""
currentStart = utterance.start
}
currentText += ` ${utterance.transcript}`
currentEnd = utterance.end
}
if (currentText.trim()) {
turns.push({ speaker: { id: currentSpeaker }, text: currentText.trim(), start: currentStart, end: currentEnd })
}
}
return {
turns,
duration: result.duration ?? 0,
summary: result.summary,
}
}
export function formatTranscriptText(transcript: MeetingTranscript): string {
return transcript.turns
.map((turn) => {
const minutes = Math.floor(turn.start / 60).toString().padStart(2, "0")
const seconds = Math.floor(turn.start % 60).toString().padStart(2, "0")
const name = turn.speaker.name ?? `Speaker ${turn.speaker.id + 1}`
return `[${minutes}:${seconds}] ${name}: ${turn.text}`
})
.join("\n\n")
}
Consider the OpenAI Whisper alternative when you already use an OpenAI API key, need local inference (Whisper is an open-weight model), or process non-English audio where Whisper's built-in multilingual support is strong. The Whisper API is solid for batch transcription, while Deepgram offers sub-300ms real-time streaming latency, diarization, and intelligence features Whisper doesn't have; see the OpenAI guide. Consider the AssemblyAI alternative when you need LeMUR (a large language model over audio) for question answering about recordings, AI highlights, chapter summaries, GDPR-compliant EU data processing, or PII redaction. AssemblyAI has a richer AI intelligence layer, while Deepgram has faster streaming and a lower transcription cost per minute; see the AssemblyAI guide. The Claude Skills 360 bundle includes Deepgram skill sets covering real-time streaming, meeting transcription, and diarization. Start with the free tier to try live transcription.