const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/dubbing";

const data = {
  "source_url": "https://segmind-resources.s3.amazonaws.com/input/dubbing-source-audio.mp3",
  "target_lang": "hi",
  "source_lang": "en",
  "num_speakers": 1,
  "highest_resolution": true,
  "drop_background_audio": false,
  "use_profanity_filter": false,
  "disable_voice_cloning": false,
  "mode": "automatic"
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so guard before reading it
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();

source_url: URL of the source video/audio file to be dubbed. Supports MP3, MP4, and other common formats.
target_lang: The target language to dub the content into. Use 'hi' for Hindi, 'es' for Spanish, 'fr' for French.
source_lang: Language of the source audio. Use 'auto' to detect automatically, or specify it for faster processing.
num_speakers: Number of speakers in the audio (minimum 0). Set to 0 for auto-detection or specify a count for multi-speaker content.
start_time: Start time in seconds for partial dubbing (minimum 0). Useful for dubbing specific segments of longer content.
end_time: End time in seconds for partial dubbing (minimum 0). Combine with start_time to dub a specific clip segment.
highest_resolution: Enable to output the highest available video resolution. Recommended for professional or broadcast content.
drop_background_audio: Remove background audio from the dubbed output. Best for clean speeches, podcasts, or monologues.
use_profanity_filter: Censors profanities in transcripts with '[censored]'. Ideal for family-friendly or corporate content.
disable_voice_cloning: Use ElevenLabs library voices instead of cloning the original speaker's voice. Good for generic dubbing.
CSV URL: URL to a CSV file with custom transcription or translation data. Use with 'manual' mode only.
Foreground audio URL: URL to a custom foreground audio file. Only applicable when using CSV-based manual dubbing mode.
Background audio URL: URL to a custom background audio file. Only applicable when using CSV-based manual dubbing mode.
target_accent: Experimental accent to apply to dubbed voices. Try values like 'american', 'british', or 'australian'.
mode: Dubbing mode selection. Use 'automatic' for standard use or 'manual' when providing a CSV transcript.
FPS: Frames per second for CSV timecode parsing (minimum 0). Leave empty to infer FPS automatically from the file.
To keep track of your credit usage, inspect the response headers of each API call: the x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid any disruptions in your API usage.
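As a sketch, the header can be read straight off the axios response object; the helper name below is illustrative, and it only assumes the x-remaining-credits header described above (axios lower-cases header keys):

```javascript
// Extract remaining credits from an axios-style headers object.
// Returns null when the header is absent (e.g. on some error responses).
function remainingCredits(headers) {
  const value = headers && headers['x-remaining-credits'];
  return value === undefined || value === null ? null : Number(value);
}

// Example usage with an object shaped like axios's response.headers:
const credits = remainingCredits({ 'x-remaining-credits': '412' });
if (credits !== null && credits < 100) {
  console.warn(`Only ${credits} credits left`);
}
```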
ElevenLabs Dubbing is an AI-powered dubbing API that translates audio and video content into 29 languages while preserving each speaker's original voice, tone, emotion, and timing. Unlike traditional localization workflows that require re-recording with human voice actors, ElevenLabs Dubbing automates the entire pipeline — from speaker separation and transcription to translation, speech synthesis, and audio re-sync — in a single API call.
Built on ElevenLabs' Multilingual v2 model, it handles complex real-world content: overlapping dialogue, background music, ambient noise, whispers, and shouted lines. The result is natural-sounding, voice-cloned multilingual audio that maintains your original speaker's identity across languages.
Prefer source_lang: 'auto' unless you know the source language precisely; auto-detection is accurate and simplifies your workflow. For content with a known, fixed language, specifying it directly speeds up processing.
Set num_speakers manually for dense dialogue. The default auto-detection works well for 1–3 speakers, but for panel discussions, interviews, or multi-character audio, providing an explicit count improves speaker separation quality significantly.
Use start_time and end_time for iteration. When testing output quality on long-form video, dub a representative 2–3 minute segment first before committing to full-file processing.
Keep drop_background_audio: false for most content. ElevenLabs Dubbing's ability to retain background music is a core differentiator — disabling it is best reserved for clean voiceover or podcast-only content.
Enable highest_resolution: true when dubbing video destined for broadcast, YouTube, or professional distribution.
Avoid CSV/manual mode in production. The manual mode with custom CSV transcripts is experimental and better suited for testing edge cases, not live pipelines.
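The segment-testing and speaker-count tips above can be sketched as a small request builder. The helper name and sample URL are illustrative, but the parameters are the documented ones:

```javascript
// Build a request body that dubs only a short test segment of a longer file.
// startSec/endSec map to the start_time/end_time parameters (in seconds).
function buildSegmentTest(sourceUrl, targetLang, startSec, endSec, numSpeakers) {
  return {
    source_url: sourceUrl,
    target_lang: targetLang,
    start_time: startSec,
    end_time: endSec,
    num_speakers: numSpeakers,    // explicit count helps dense dialogue
    drop_background_audio: false, // keep music/SFX, per the tip above
    mode: "automatic"
  };
}

// Dub minutes 1:00–4:00 of a two-speaker interview as a quality check:
const body = buildSegmentTest("https://example.com/interview.mp4", "es", 60, 240, 2);
// POST `body` to https://api.segmind.com/v1/dubbing as in the snippet above.
```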
How long does dubbing take for a 30-minute video?
Processing is asynchronous and scales with content length. A 30-minute video can take several minutes to process. Use the status polling endpoint to check job completion; avoid setting fixed timeouts.
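Because the status endpoint and its response shape are not specified here, the sketch below keeps the polling logic generic: checkFn is a placeholder for whatever status request your integration makes, and the status strings are assumed names.

```javascript
// Poll a job until it reaches a terminal state, instead of using a fixed
// timeout. checkFn is any async function returning a status string such as
// 'processing', 'done', or 'failed' (placeholder names, not a documented enum).
async function pollUntilDone(checkFn, { intervalMs = 10000, maxTries = 180 } = {}) {
  for (let i = 0; i < maxTries; i++) {
    const status = await checkFn();
    if (status === 'done' || status === 'failed') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('dubbing job did not reach a terminal state');
}
```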
Which audio and video formats are supported?
The API accepts MP3, MP4, and most common audio/video formats via URL. You can also pass direct URLs from YouTube, TikTok, or cloud storage buckets.
Does voice cloning work for all 29 languages?
Yes — voice cloning is applied by default across all supported languages using the Multilingual v2 model. Set disable_voice_cloning: true if you prefer generic ElevenLabs library voices instead.
What happens to background music during dubbing?
By default, background audio (music, ambient sound, SFX) is separated and re-layered into the dubbed output. Set drop_background_audio: true only if you want a clean speech-only track.
Can I target a specific accent for dubbed voices?
The target_accent parameter (e.g., "american", "british") is available but experimental. It's not recommended for production use and may produce inconsistent results across languages.
Is there a character or length limit?
The API applies a character limit of approximately 3,000 characters per minute of content. Plan your content segmentation accordingly for very long or text-dense files.
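Under the approximately 3,000 characters/minute figure above, a quick budget check can flag files that need splitting before you submit them (the helper is illustrative):

```javascript
// True when a transcript fits the approximate per-minute character budget.
function fitsCharacterBudget(transcriptChars, durationMinutes, charsPerMinute = 3000) {
  return transcriptChars <= durationMinutes * charsPerMinute;
}

// A 30-minute video has a budget of roughly 90,000 characters:
const ok = fitsCharacterBudget(85000, 30);        // fits
const tooDense = fitsCharacterBudget(120000, 30); // needs segmentation
```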