POST
javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/elevenlabs-audio-isolation";

const data = {
  "audio_url": "https://segmind-resources.s3.amazonaws.com/output/0e73c912-e046-4b2e-8b6e-62a26e945c60-denoise-before.wav"
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so fall back to the message
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
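
The codes listed above can be turned into readable messages on the client side. The following is a minimal sketch, not part of the official API: `describeStatus` and `isRetryable` are hypothetical helper names, and treating only 500 as worth retrying is an assumption rather than documented behavior.

```javascript
// Status codes and meanings taken from the table above.
const STATUS_MEANINGS = {
  200: 'OK',
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: server had some issue with processing'
};

// Hypothetical helper: map an HTTP status code to a readable message.
function describeStatus(status) {
  return STATUS_MEANINGS[status] || `Unexpected status ${status}`;
}

// Hypothetical helper. Assumption: only a server-side failure (500) is
// worth retrying; 401, 404, 405 and 406 need a change on the caller's side.
function isRetryable(status) {
  return status === 500;
}
```

In an axios `catch` block, `error.response.status` (when `error.response` is present) could be passed to `describeStatus` to log something more useful than the raw body.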

Attributes


audio_url (str) *

Specify the direct URL of the audio to process. Use clear audio for better model performance.

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid interruptions in your API usage.
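
Reading that header might look like the sketch below. `getRemainingCredits` and `isRunningLow` are hypothetical helpers, and the code assumes the header value parses as a number, which the documentation does not state explicitly.

```javascript
// Hypothetical helper: read x-remaining-credits from a response's headers.
// Accepts either a plain object with lowercased keys or an object exposing
// a get(name) method (as fetch-style Headers do).
function getRemainingCredits(headers) {
  const raw = typeof headers.get === 'function'
    ? headers.get('x-remaining-credits')
    : headers['x-remaining-credits'];
  if (raw === undefined || raw === null || raw === '') {
    return null; // header missing
  }
  // Assumption: the value is numeric; anything else is reported as null.
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: true when the balance is known and below a threshold
// chosen by the caller.
function isRunningLow(headers, threshold) {
  const credits = getRemainingCredits(headers);
  return credits !== null && credits < threshold;
}
```

With the axios example at the top of this page, `response.headers` would be the argument.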

ElevenLabs Voice Isolator: AI-Powered Speech Extraction API

What is ElevenLabs Voice Isolator?

ElevenLabs Voice Isolator is an AI-powered audio processing model that extracts clear speech from audio and video files by removing background noise, music, and environmental interference. Built on machine learning, the model specializes in dialogue separation, turning noisy recordings into professional-quality voice audio. Whether you're processing podcast episodes with music beds, cleaning up interview recordings, or enhancing telephony audio, Voice Isolator keeps speech intelligible. The model is available via API, making it accessible for developers building audio enhancement workflows, content management systems, or voice processing applications.

Key Features

  • Intelligent Speech Separation: Uses deep learning to distinguish human voice from complex background audio layers
  • Multi-Format Support: Processes both audio and video files through direct URL input
  • Professional Audio Quality: Delivers broadcast-standard voice clarity suitable for production environments
  • Simple API Integration: Single-parameter API design (audio_url) for straightforward implementation
  • Fast Processing: Isolation is quick enough for production pipelines and user-facing applications; input is a pre-recorded file supplied by URL
  • Universal Accessibility: Works with various audio qualities and recording environments

Best Use Cases

Content Creation: Podcasters and YouTubers can salvage recordings with unwanted background music or environmental noise, eliminating expensive re-recording sessions.

Transcription Services: Improve speech-to-text accuracy by feeding clean, isolated voice audio to transcription models, reducing error rates in noisy recordings.

Telephony and Call Centers: Enhance voice call quality for better customer experience and improved sentiment analysis accuracy.

Video Production: Isolate dialogue tracks during post-production, enabling separate processing of voice and background elements for professional mixing.

Journalism and Interviews: Clean up field recordings and interviews conducted in challenging acoustic environments like busy streets or crowded venues.

Input Tips and Output Quality

Input Optimization: Use high-quality source audio when possible. The model handles low-quality inputs, but clearer source material yields better isolation results. Direct audio file URLs work best; make sure the files are accessible without authentication.

URL Requirements: Provide direct-access URLs (e.g., S3, CDN, or public hosting). The model cannot process password-protected or session-based URLs.
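
A quick client-side check can catch some unusable URLs before a request is sent. The sketch below is an assumption-laden illustration: `checkAudioUrl` is a hypothetical helper, and it only inspects the URL's syntax. It cannot confirm that the file is actually reachable without a login or session.

```javascript
// Hypothetical helper: flag URLs that are clearly not direct-access links.
// This is a syntax-only check; it does not fetch the URL.
function checkAudioUrl(value) {
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return { ok: false, reason: 'not a valid absolute URL' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, reason: 'URL must use http or https' };
  }
  // Credentials embedded in the URL suggest password-protected access.
  if (parsed.username || parsed.password) {
    return { ok: false, reason: 'URL contains embedded credentials' };
  }
  return { ok: true, reason: null };
}
```

Calling `checkAudioUrl(data.audio_url)` before posting would reject relative paths, non-HTTP schemes, and `user:password@host` style links.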

Format Considerations: The model accepts standard audio/video formats. For best results, use uncompressed or lightly compressed audio (WAV, FLAC, high-bitrate MP3).

Expected Behavior: The model prioritizes human speech, removing instrumental music, ambient noise, and non-vocal sounds while preserving natural voice characteristics and tonal quality.

FAQs

Is ElevenLabs Voice Isolator available as an open-source model?
No, Voice Isolator is proprietary technology developed by ElevenLabs, accessible exclusively through their API.

How does it differ from traditional noise reduction tools?
Unlike threshold-based noise gates or fixed frequency filters, Voice Isolator uses AI to model speech patterns and context, which lets it separate voice from complex audio environments, including music and overlapping sounds.

What audio formats are supported?
The model accepts any standard audio or video format accessible via URL, including MP3, WAV, M4A, MP4, and other common formats.
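
As an illustration, the file extension of an input URL can be compared against the formats this page names (MP3, WAV, M4A, MP4, and FLAC). `extensionOf` and `isNamedFormat` are hypothetical helpers; since the page also mentions "other common formats", a `false` result here does not mean the file is unsupported.

```javascript
// Formats explicitly named on this page; not an exhaustive support list.
const NAMED_FORMATS = new Set(['mp3', 'wav', 'm4a', 'mp4', 'flac']);

// Hypothetical helper: lowercased extension of the URL's last path segment,
// ignoring any query string. Returns '' when there is no extension.
function extensionOf(audioUrl) {
  const pathname = new URL(audioUrl).pathname;
  const lastSegment = pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
}

// Hypothetical helper: true when the extension is one of the named formats.
function isNamedFormat(audioUrl) {
  return NAMED_FORMATS.has(extensionOf(audioUrl));
}
```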

Can it separate multiple speakers?
Voice Isolator focuses on extracting all speech from background noise rather than separating individual speakers. It outputs a single isolated voice track containing all dialogue.

What happens if the source audio has very low quality?
The model performs best with clear source audio but can still improve heavily degraded recordings. Results depend on the signal-to-noise ratio of the original file.

Does it work with real-time streaming audio?
The current API processes pre-recorded files via URL. For near-real-time applications, buffer the incoming audio into short chunks and process each chunk as its own file.
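
The buffering step could be sketched as follows. `ChunkBuffer` is a hypothetical class, not part of any SDK, and the chunk size is an arbitrary choice left to the caller. Each emitted chunk would still need to be encoded as an audio file and hosted at a URL the API can reach; that part is not shown.

```javascript
// Hypothetical helper: accumulate incoming audio samples and emit
// fixed-size chunks once enough samples have arrived.
class ChunkBuffer {
  constructor(chunkSize) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new RangeError('chunkSize must be a positive integer');
    }
    this.chunkSize = chunkSize;
    this.pending = [];
  }

  // Add a frame of samples; returns an array of complete chunks (possibly empty).
  push(samples) {
    for (const sample of samples) {
      this.pending.push(sample);
    }
    const chunks = [];
    while (this.pending.length >= this.chunkSize) {
      chunks.push(this.pending.splice(0, this.chunkSize));
    }
    return chunks;
  }

  // Return whatever is left (a final, shorter chunk) and reset the buffer.
  flush() {
    const rest = this.pending;
    this.pending = [];
    return rest;
  }
}
```

For example, at a 16 kHz sample rate a `chunkSize` of 80000 would correspond to five-second chunks; shorter chunks lower latency but give the model less context per request.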