ElevenLabs Dubbing is an AI-powered dubbing API that translates audio and video content into 29 languages while preserving each speaker's original voice, tone, emotion, and timing. Unlike traditional localization workflows that require re-recording with human voice actors, ElevenLabs Dubbing automates the entire pipeline — from speaker separation and transcription to translation, speech synthesis, and audio re-sync — in a single API call.
Built on ElevenLabs' Multilingual v2 model, it handles complex real-world content: overlapping dialogue, background music, ambient noise, whispers, and shouted lines. The result is natural-sounding, voice-cloned multilingual audio that maintains your original speaker's identity across languages.
Use the start_time and end_time parameters to dub specific clips within longer files, and source_lang: auto for hands-free source identification. Start with source_lang: auto unless you know the source language precisely: auto-detection is accurate and simplifies your workflow. For content with a known, fixed language, specifying it directly speeds up processing.
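A minimal request sketch, assuming the REST endpoint is POST https://api.elevenlabs.io/v1/dubbing with multipart form fields named after the parameters above (source_lang, target_lang, num_speakers, and so on); confirm the exact field names against the current API reference. The third-party requests library is imported only inside the submit helper, so the field-building logic stays stdlib-only:

```python
API_URL = "https://api.elevenlabs.io/v1/dubbing"  # assumed REST endpoint

def build_dub_request(target_lang, source_lang="auto", **options):
    """Assemble the form fields for a dubbing job.

    Defaults to source_lang: auto per the recommendation above; any
    optional parameter left as None is omitted from the request.
    """
    fields = {"source_lang": source_lang, "target_lang": target_lang}
    fields.update({k: v for k, v in options.items() if v is not None})
    return fields

def submit_dub(api_key, file_path, target_lang, **options):
    """POST the media file plus form fields; returns the parsed JSON response."""
    import requests  # third-party: pip install requests

    with open(file_path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"xi-api-key": api_key},
            data=build_dub_request(target_lang, **options),
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()  # expected to contain the dubbing job's id
```

A call like `submit_dub(key, "episode.mp4", "es", num_speakers=2)` would then kick off a Spanish dub with auto-detected source language.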
Set num_speakers manually for dense dialogue. The default auto-detection works well for 1–3 speakers, but for panel discussions, interviews, or multi-character audio, providing an explicit count improves speaker separation quality significantly.
Use start_time and end_time for iteration. When testing output quality on long-form video, dub a representative 2–3 minute segment first before committing to full-file processing.
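A small helper for that iteration workflow, assuming start_time and end_time are given as whole-second offsets (verify the expected format in the API reference):

```python
def to_seconds(timestamp: str) -> int:
    """Convert "HH:MM:SS" or "MM:SS" into whole seconds for
    start_time / end_time. Assumes integer-second offsets."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Dub a representative 2-minute slice starting at 10:00
# before committing to full-file processing.
clip = {"start_time": to_seconds("10:00"), "end_time": to_seconds("12:00")}
```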
Keep drop_background_audio: false for most content. ElevenLabs Dubbing's ability to retain background music is a core differentiator — disabling it is best reserved for clean voiceover or podcast-only content.
Enable highest_resolution: true when dubbing video destined for broadcast, YouTube, or professional distribution.
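The tips above can be expressed as a single set of request fields, for example for a broadcast-bound panel recording. The field names mirror the parameters discussed in this section but should be confirmed against the API reference:

```python
# Recommended settings from the tips above, as request fields.
broadcast_options = {
    "drop_background_audio": False,  # keep music/SFX re-layered (the default)
    "highest_resolution": True,      # broadcast / YouTube-quality video output
    "num_speakers": 4,               # explicit count for dense panel dialogue
}
```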
Avoid CSV/manual mode in production. The manual mode with custom CSV transcripts is experimental and better suited for testing edge cases, not live pipelines.
How long does dubbing take for a 30-minute video? Processing is asynchronous and scales with content length. A 30-minute video can take several minutes to process. Use the status polling endpoint to check job completion — avoid setting fixed timeouts.
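A polling sketch along those lines, using a growing interval instead of a fixed timeout. The status values ("dubbing", "dubbed") and the idea of wrapping the status endpoint in a callable are assumptions to be checked against the API reference:

```python
import time

def wait_for_dub(get_status, poll_interval=5.0, max_interval=60.0):
    """Poll a dubbing job until it leaves the in-progress state.

    `get_status` is any callable returning the job's status string,
    e.g. a wrapper around the status polling endpoint. The wait
    grows geometrically up to max_interval rather than timing out.
    """
    interval = poll_interval
    while True:
        status = get_status()
        if status != "dubbing":
            return status  # e.g. "dubbed" or "failed"
        time.sleep(interval)
        interval = min(interval * 1.5, max_interval)
```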
Which audio and video formats are supported? The API accepts MP3, MP4, and most common audio/video formats, either as direct file uploads or via URL, including links from YouTube, TikTok, or cloud storage buckets.
Does voice cloning work for all 29 languages?
Yes — voice cloning is applied by default across all supported languages using the Multilingual v2 model. Set disable_voice_cloning: true if you prefer generic ElevenLabs library voices instead.
What happens to background music during dubbing?
By default, background audio (music, ambient sound, SFX) is separated and re-layered into the dubbed output. Set drop_background_audio: true only if you want a clean speech-only track.
Can I target a specific accent for dubbed voices?
The target_accent parameter (e.g., "american", "british") is available but experimental. It's not recommended for production use and may produce inconsistent results across languages.
Is there a character or length limit? The API applies a character limit of approximately 3,000 characters per minute of content. Plan your content segmentation accordingly for very long or text-dense files.
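For segmentation planning, the approximate limit above translates into a simple budget calculation (the 3,000 figure comes from this FAQ; treat it as approximate):

```python
def character_budget(duration_seconds: float, chars_per_minute: int = 3000) -> int:
    """Approximate character allowance for a clip, per the
    ~3,000 chars/minute limit noted above."""
    return int(duration_seconds / 60 * chars_per_minute)
```

A 30-minute episode would thus allow roughly `character_budget(1800)` = 90,000 characters; text-dense content approaching that ceiling is a candidate for splitting via start_time / end_time.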