
ElevenLabs Dubbing: AI-Powered Audio & Video Translation API

What is ElevenLabs Dubbing?

ElevenLabs Dubbing is an AI-powered dubbing API that translates audio and video content into 29 languages while preserving each speaker's original voice, tone, emotion, and timing. Unlike traditional localization workflows that require re-recording with human voice actors, ElevenLabs Dubbing automates the entire pipeline — from speaker separation and transcription to translation, speech synthesis, and audio re-sync — in a single API call.

Built on ElevenLabs' Multilingual v2 model, it handles complex real-world content: overlapping dialogue, background music, ambient noise, whispers, and shouted lines. The result is natural-sounding, voice-cloned multilingual audio that maintains your original speaker's identity across languages.
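The single-call pipeline described above can be sketched in Python. This is a sketch, not the authoritative API reference: the base URL, the form-field names (which mirror the parameters named in this article), the xi-api-key header, and the dubbing_id response field are assumptions you should check against the current ElevenLabs docs, and ELEVENLABS_API_KEY is a hypothetical environment variable.

```python
import os

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_dub_form(target_lang: str, source_lang: str = "auto", **options) -> dict:
    """Collect form fields for one dubbing request.

    Field names mirror the parameters described in this article
    (num_speakers, start_time, drop_background_audio, ...); treat
    them as a sketch rather than the official reference.
    """
    form = {"target_lang": target_lang, "source_lang": source_lang}
    # Only send options the caller actually set.
    form.update({k: v for k, v in options.items() if v is not None})
    return form


def start_dub(file_path: str, target_lang: str, **options) -> str:
    """Submit a media file for dubbing and return the job id."""
    import requests  # imported here so the form builder stays dependency-free

    with open(file_path, "rb") as media:
        resp = requests.post(
            f"{API_BASE}/dubbing",
            headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
            data=build_dub_form(target_lang, **options),
            files={"file": media},
        )
    resp.raise_for_status()
    return resp.json()["dubbing_id"]
```

The returned job id is then used with the status-polling endpoint discussed in the FAQs below.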


Key Features

  • 29-language support — English, Hindi, Spanish, Japanese, Arabic, French, German, Korean, Tamil, and more
  • Automatic speaker separation — detects and isolates multiple overlapping voices with zero manual configuration
  • Voice cloning with emotion retention — preserves accent, tone, and emotional nuance using Multilingual v2
  • Background audio preservation — music, SFX, and ambient noise survive the dubbing process intact
  • Segment-level dubbing — use the start_time and end_time parameters to dub specific clips within longer files
  • Auto language detection — set source_lang: auto for hands-free source identification
  • Advanced controls — profanity filtering, highest-resolution output, voice cloning toggle, and CSV-based manual mode
  • Python and TypeScript SDKs — production-ready async API with straightforward status polling

Best Use Cases

  • Content creators and YouTubers localizing videos for Hindi, Spanish, or Arabic-speaking audiences
  • Podcast producers generating multilingual versions of long-form audio without re-recording
  • Media studios dubbing trailers, courses, or documentary content at scale
  • EdTech platforms delivering educational video in regional languages without hiring voice actors
  • App developers building programmatic translation pipelines for UGC platforms or streaming products
  • Corporate teams localizing training videos and product demos for global rollouts

Prompt Tips and Output Quality

Start with source_lang: auto unless you know the source language precisely — auto-detection is accurate and simplifies your workflow. For content with a known fixed language, specifying it directly speeds up processing.

Set num_speakers manually for dense dialogue. The default auto-detection works well for 1–3 speakers, but for panel discussions, interviews, or multi-character audio, providing an explicit count improves speaker separation quality significantly.

Use start_time and end_time for iteration. When testing output quality on long-form video, dub a representative 2–3 minute segment first before committing to full-file processing.

Keep drop_background_audio: false for most content. ElevenLabs Dubbing's ability to retain background music is a core differentiator — disabling it is best reserved for clean voiceover or podcast-only content.

Enable highest_resolution: true when dubbing video destined for broadcast, YouTube, or professional distribution.
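The two flag recommendations above can be captured as request defaults. A minimal sketch; recommended_flags is a hypothetical helper and the field names are the ones quoted in these tips:

```python
def recommended_flags(for_broadcast: bool = False) -> dict:
    """Defaults reflecting the tips above: keep background audio, and
    request highest resolution only for professional distribution."""
    return {
        "drop_background_audio": False,        # retain music and SFX
        "highest_resolution": bool(for_broadcast),
    }
```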

Avoid CSV/manual mode in production. The manual mode with custom CSV transcripts is experimental and better suited for testing edge cases, not live pipelines.


FAQs

How long does dubbing take for a 30-minute video? Processing is asynchronous and scales with content length. A 30-minute video can take several minutes to process. Use the status polling endpoint to check job completion — avoid setting fixed timeouts.
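A polling loop along these lines avoids fixed timeouts. This is a sketch: fetch_status stands in for whatever call your SDK or HTTP client makes to the job-status endpoint, and the terminal status strings ("dubbed", "failed") are assumptions to verify against the API reference:

```python
import time
from typing import Callable


def wait_for_dub(fetch_status: Callable[[], str],
                 poll_interval_s: float = 5.0,
                 max_polls: int = 720) -> str:
    """Poll a dubbing job until it reaches a terminal state.

    `fetch_status` wraps the status endpoint for one job id and returns
    its current status string; the terminal values used here are
    assumptions for this sketch.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("dubbed", "failed"):
            return status
        time.sleep(poll_interval_s)
    raise TimeoutError("dubbing job still in progress after max_polls")
```

Injecting the status fetcher as a callable also makes the loop easy to unit-test without network access.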

Which audio and video formats are supported? The API accepts MP3, MP4, and most common audio/video formats via URL. You can also pass direct URLs from YouTube, TikTok, or cloud storage buckets.

Does voice cloning work for all 29 languages? Yes — voice cloning is applied by default across all supported languages using the Multilingual v2 model. Set disable_voice_cloning: true if you prefer generic ElevenLabs library voices instead.

What happens to background music during dubbing? By default, background audio (music, ambient sound, SFX) is separated and re-layered into the dubbed output. Set drop_background_audio: true only if you want a clean speech-only track.

Can I target a specific accent for dubbed voices? The target_accent parameter (e.g., "american", "british") is available but experimental. It's not recommended for production use and may produce inconsistent results across languages.

Is there a character or length limit? The API applies a character limit of approximately 3,000 characters per minute of content. Plan your content segmentation accordingly for very long or text-dense files.
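Using the approximate 3,000 characters-per-minute figure above, you can budget segmentation before submitting. A quick sketch with hypothetical helper names:

```python
import math

CHARS_PER_MINUTE = 3000  # approximate limit quoted above


def minutes_needed(char_count: int) -> int:
    """Minimum whole minutes of content allowed for this many characters."""
    return math.ceil(char_count / CHARS_PER_MINUTE)


def fits(char_count: int, duration_minutes: float) -> bool:
    """True if the script stays within the per-minute character budget."""
    return char_count <= duration_minutes * CHARS_PER_MINUTE
```

For example, a 9,000-character script needs at least three minutes of content to stay within the budget.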