POST https://api.segmind.com/v1/dubbing
```javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/dubbing";

const data = {
  "source_url": "https://segmind-resources.s3.amazonaws.com/input/dubbing-source-audio.mp3",
  "target_lang": "hi",
  "source_lang": "en",
  "num_speakers": 1,
  "highest_resolution": true,
  "drop_background_audio": false,
  "use_profanity_filter": false,
  "disable_voice_cloning": false,
  "mode": "automatic"
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so guard before reading it
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
HTTP Response Codes
200 - OK: Request processed successfully
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing

Attributes


source_url str *

URL of the source video/audio file to be dubbed. Supports MP3, MP4, and other common formats.


target_lang enum:str *

The target language to dub the content into. Use 'hi' for Hindi, 'es' for Spanish, 'fr' for French.


source_lang enum:str ( default: en )

Language of the source audio. Use 'auto' to detect automatically or specify for faster processing.


num_speakers int ( default: 1 )

Number of speakers in the audio. Set to 0 for auto-detection or specify for multi-speaker content.

min: 0, max: 999999999999999


start_time int ( default: 1 )

Start time in seconds for partial dubbing. Useful for dubbing specific segments of longer content.

min: 0, max: 999999999999999


end_time int ( default: 1 )

End time in seconds for partial dubbing. Combine with start_time to dub a specific clip segment.

min: 0, max: 999999999999999


highest_resolution boolean ( default: true )

Enable to output the highest available video resolution. Recommended for professional or broadcast content.


drop_background_audio boolean ( default: false )

Remove background audio from the dubbed output. Best for clean speeches, podcasts, or monologues.


use_profanity_filter boolean ( default: false )

Censors profanities in transcripts with '[censored]'. Ideal for family-friendly or corporate content.


disable_voice_cloning boolean ( default: false )

Use ElevenLabs library voices instead of cloning the original speaker's voice. Good for generic dubbing.


csv_file_url str

URL to a CSV file with custom transcription or translation data. Use with 'manual' mode only.


foreground_audio_url str

URL to a custom foreground audio file. Only applicable when using CSV-based manual dubbing mode.


background_audio_url str

URL to a custom background audio file. Only applicable when using CSV-based manual dubbing mode.


target_accent str

Experimental accent to apply to dubbed voices. Try values like 'american', 'british', or 'australian'.


mode enum:str ( default: automatic )

Dubbing mode selection. Use 'automatic' for standard use or 'manual' when providing a CSV transcript.


csv_fps float

Frames per second for CSV timecode parsing. Leave empty to infer FPS automatically from the file.

min: 0, max: 999999999999999

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
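As a sketch, the header can be read from an axios response object like this; the header name comes from the paragraph above, while the helper function itself is illustrative and not part of any SDK:

```javascript
// Minimal sketch: parse the x-remaining-credits response header into a number.
// Returns null when the header is absent.
function remainingCredits(headers) {
  const raw = headers && headers['x-remaining-credits'];
  return raw === undefined || raw === null ? null : Number(raw);
}

// With axios, call it as: remainingCredits(response.headers)
console.log(remainingCredits({ 'x-remaining-credits': '42' })); // 42
console.log(remainingCredits({}));                              // null
```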

ElevenLabs Dubbing: AI-Powered Audio & Video Translation API

What is ElevenLabs Dubbing?

ElevenLabs Dubbing is an AI-powered dubbing API that translates audio and video content into 29 languages while preserving each speaker's original voice, tone, emotion, and timing. Unlike traditional localization workflows that require re-recording with human voice actors, ElevenLabs Dubbing automates the entire pipeline — from speaker separation and transcription to translation, speech synthesis, and audio re-sync — in a single API call.

Built on ElevenLabs' Multilingual v2 model, it handles complex real-world content: overlapping dialogue, background music, ambient noise, whispers, and shouted lines. The result is natural-sounding, voice-cloned multilingual audio that maintains your original speaker's identity across languages.


Key Features

  • 29-language support — English, Hindi, Spanish, Japanese, Arabic, French, German, Korean, Tamil, and more
  • Automatic speaker separation — detects and isolates multiple overlapping voices with zero manual configuration
  • Voice cloning with emotion retention — preserves accent, tone, and emotional nuance using Multilingual v2
  • Background audio preservation — music, SFX, and ambient noise survive the dubbing process intact
  • Segment-level dubbing — use start_time and end_time parameters to dub specific clips within longer files
  • Auto language detection — set source_lang: auto for hands-free source identification
  • Advanced controls — profanity filtering, highest-resolution output, voice cloning toggle, and CSV-based manual mode
  • Python and TypeScript SDKs — production-ready async API with straightforward status polling

Best Use Cases

  • Content creators and YouTubers localizing videos for Hindi, Spanish, or Arabic-speaking audiences
  • Podcast producers generating multilingual versions of long-form audio without re-recording
  • Media studios dubbing trailers, courses, or documentary content at scale
  • EdTech platforms delivering educational video in regional languages without hiring voice actors
  • App developers building programmatic translation pipelines for UGC platforms or streaming products
  • Corporate teams localizing training videos and product demos for global rollouts

Prompt Tips and Output Quality

Start with source_lang: auto unless you know the source language precisely — auto-detection is accurate and simplifies your workflow. For content with a known fixed language, specifying it directly speeds up processing.

Set num_speakers manually for dense dialogue. The default auto-detection works well for 1–3 speakers, but for panel discussions, interviews, or multi-character audio, providing an explicit count improves speaker separation quality significantly.

Use start_time and end_time for iteration. When testing output quality on long-form video, dub a representative 2–3 minute segment first before committing to full-file processing.
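A hedged payload builder for this segment-first workflow might look as follows; the parameter names come from the attribute table above, but the helper function and example URL are our own:

```javascript
// Illustrative helper: build a request body that dubs only a short test segment.
// Parameter names (source_url, start_time, end_time, mode) come from the
// attribute table; the function itself is not part of any SDK.
function segmentPayload(sourceUrl, targetLang, startSec, endSec) {
  return {
    source_url: sourceUrl,
    target_lang: targetLang,
    start_time: startSec,
    end_time: endSec,
    mode: 'automatic'
  };
}

// Dub a 3-minute slice starting at the 60-second mark:
const body = segmentPayload('https://example.com/talk.mp4', 'hi', 60, 240);
console.log(body.end_time - body.start_time); // 180
```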

Keep drop_background_audio: false for most content. ElevenLabs Dubbing's ability to retain background music is a core differentiator — disabling it is best reserved for clean voiceover or podcast-only content.

Enable highest_resolution: true when dubbing video destined for broadcast, YouTube, or professional distribution.

Avoid CSV/manual mode in production. The manual mode with custom CSV transcripts is experimental and better suited for testing edge cases, not live pipelines.


FAQs

How long does dubbing take for a 30-minute video? Processing is asynchronous and scales with content length. A 30-minute video can take several minutes to process. Use the status polling endpoint to check job completion — avoid setting fixed timeouts.
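A generic polling loop for this might be sketched as below; the status strings and the caller-supplied check function are assumptions, since the exact status endpoint and its response shape are not documented here:

```javascript
// Generic polling sketch. checkStatus is a caller-supplied async function that
// returns a status string; the 'completed'/'failed' values are assumptions.
async function pollUntilDone(checkStatus, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === 'completed' || status === 'failed') return status;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('polling timed out');
}
```

Backing off via a bounded attempt count (rather than a fixed wall-clock timeout) matches the advice above, since processing time scales with content length.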

Which audio and video formats are supported? The API accepts MP3, MP4, and most common audio/video formats via URL. You can also pass direct URLs from YouTube, TikTok, or cloud storage buckets.

Does voice cloning work for all 29 languages? Yes — voice cloning is applied by default across all supported languages using the Multilingual v2 model. Set disable_voice_cloning: true if you prefer generic ElevenLabs library voices instead.

What happens to background music during dubbing? By default, background audio (music, ambient sound, SFX) is separated and re-layered into the dubbed output. Set drop_background_audio: true only if you want a clean speech-only track.

Can I target a specific accent for dubbed voices? The target_accent parameter (e.g., "american", "british") is available but experimental. It's not recommended for production use and may produce inconsistent results across languages.

Is there a character or length limit? The API applies a character limit of approximately 3,000 characters per minute of content. Plan your content segmentation accordingly for very long or text-dense files.
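Taking the stated figure of roughly 3,000 characters per minute at face value, a back-of-the-envelope budget check might look like this (the helper is illustrative, not an API feature):

```javascript
// Rough character-budget check, assuming the ~3,000 chars/minute figure above.
const CHARS_PER_MINUTE = 3000;

function characterBudget(durationMinutes) {
  return durationMinutes * CHARS_PER_MINUTE;
}

console.log(characterBudget(30)); // 90000 characters for a 30-minute video
```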