POST
javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/tts-eleven-labs";

// Request body: the text to speak plus voice and delivery settings.
const data = {
  "prompt": "Unlock endless possibilities with voice synthesis, bringing text to life with realism and emotion.",
  "voice": "Rachel",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "model_id": "eleven_multilingual_v2",
  "stability": 0.5,
  "use_speaker_boost": true,
  "similarity_boost": 0.75,
  "style": 0,
  "speed": 1,
  "seed": 0,
  "apply_text_normalization": "auto",
  "apply_language_text_normalization": false
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined when no HTTP response arrived (e.g. a network failure).
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
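
Since the endpoint converts text to audio, you will usually want to write the response bytes to a file rather than log them. The sketch below is one way to do that; `saveAudio` is a hypothetical helper (not part of any SDK), and it assumes the response body is the raw audio and that you have obtained it as a Buffer, ArrayBuffer, or typed array.

```javascript
const fs = require('fs');

// Hypothetical helper: writes raw audio bytes to disk and returns the byte count.
// Assumes `audioBytes` is a Buffer, an ArrayBuffer, or a typed array.
function saveAudio(audioBytes, filePath) {
  let buffer;
  if (Buffer.isBuffer(audioBytes)) {
    buffer = audioBytes;
  } else if (ArrayBuffer.isView(audioBytes)) {
    // Wrap the typed array's underlying bytes without reinterpreting its elements.
    buffer = Buffer.from(audioBytes.buffer, audioBytes.byteOffset, audioBytes.byteLength);
  } else {
    buffer = Buffer.from(audioBytes); // plain ArrayBuffer
  }
  fs.writeFileSync(filePath, buffer);
  return buffer.length;
}
```

With axios, passing `responseType: 'arraybuffer'` in the request config makes `response.data` a Buffer in Node.js, which can then be handed to `saveAudio(response.data, 'speech.mp3')`. The `.mp3` extension is an assumption; use whatever matches the audio format the API actually returns.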
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Audio generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: The server had an issue processing the request
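
The error codes above can be mapped to readable messages in a `catch` block. This is a minimal sketch, assuming an axios-style error object where `error.response.status` holds the HTTP status; `describeError` and `STATUS_MEANINGS` are hypothetical names introduced for illustration.

```javascript
// Error status codes and meanings, taken from the list above.
const STATUS_MEANINGS = {
  401: 'User authentication failed',
  404: 'The requested URL does not exist',
  405: 'The requested HTTP method is not allowed',
  406: 'Not enough credits',
  500: 'Server had some issue with processing',
};

// Hypothetical helper: turns an axios-style error into a one-line message.
function describeError(error) {
  // No `response` means the request never got an HTTP reply (e.g. a network failure).
  if (!error.response) {
    return `No response received: ${error.message}`;
  }
  const status = error.response.status;
  const meaning = STATUS_MEANINGS[status] || 'Unexpected status';
  return `${status}: ${meaning}`;
}
```

Calling `console.error(describeError(error))` inside the `catch` block keeps the 406 "not enough credits" case easy to spot in logs.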

Attributes


prompt str (required)

The text to be converted to audio. Craft engaging content to leverage audio features.


voice enum:str (default: Rachel)

Select the voice persona for audio. Try different voices for varied audience impact.



voice_id str (default: 21m00Tcm4TlvDq8ikWAM)

Identifier for specific audio voice. Adjust when using custom voice models.


model_id enum:str (default: eleven_multilingual_v2)

Model selection for TTS conversion. Use multilingual for diverse language support.

Allowed values include: eleven_v3, eleven_multilingual_v2, eleven_flash_v2_5, eleven_turbo_v2_5


language_code enum:str (default: 1)

Enforce language in ISO 639-1 format. Choose based on target audience linguistic needs.



stability float (default: 0.5)

Control voice emotional stability. Use lower values for more variance in emotion.

min: 0, max: 1


use_speaker_boost boolean (default: true)

Boost speaker likeness for more realistic voices. Enable to match original speaker.


similarity_boost float (default: 0.75)

Adjust voice fidelity to the original. Higher values ensure a closer match to the voice.

min: 0, max: 1


style float (default: 1), affects pricing

Define the stylistic tone of the voice. Increase for dramatic delivery.

min: 0, max: 1


speed float (default: 1)

Adjust the voice reading speed. Use 1.0 for normal pace; modify for speed variation.

min: 0.25, max: 4


seed int (default: 1)

Set a seed to ensure reproducibility. Use 0 for non-deterministic output.

min: 0, max: 4294967295


apply_text_normalization enum:str (default: auto)

Decide on text normalization. Auto is preferred for balanced results.

Allowed values: auto, on, off


apply_language_text_normalization boolean (default: 1)

Specific to Japanese text normalization. Enable for correct text processing.
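
The numeric limits listed for the attributes above can be checked on the client before sending a request. The following is a rough sketch rather than official validation logic: `validatePayload` and `NUMERIC_RANGES` are hypothetical names, the ranges are copied from the min/max values in this section, and treating an empty prompt as invalid is an assumption.

```javascript
// Min/max limits copied from the attribute list above.
const NUMERIC_RANGES = {
  stability: [0, 1],
  similarity_boost: [0, 1],
  style: [0, 1],
  speed: [0.25, 4],
  seed: [0, 4294967295],
};

// Hypothetical helper: returns a list of problems found in a request payload.
// An empty array means no problems were detected by these checks.
function validatePayload(payload) {
  const errors = [];
  // `prompt` is the only required attribute; rejecting empty text is an assumption.
  if (typeof payload.prompt !== 'string' || payload.prompt.trim() === '') {
    errors.push('prompt is required and must be a non-empty string');
  }
  for (const [name, [min, max]] of Object.entries(NUMERIC_RANGES)) {
    const value = payload[name];
    if (value === undefined) continue; // optional attributes may be omitted
    if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
      errors.push(`${name} must be a number between ${min} and ${max}`);
    }
  }
  // `seed` is documented as an int, so fractional values are rejected too.
  if (typeof payload.seed === 'number' && !Number.isInteger(payload.seed)) {
    errors.push('seed must be an integer');
  }
  return errors;
}
```

For example, `validatePayload({ prompt: 'Hello', speed: 5 })` returns a single error, because 5 exceeds the documented maximum speed of 4.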

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
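
Reading the credit header might look like the sketch below. `getRemainingCredits` is a hypothetical helper; it assumes the headers object uses lowercase keys (as axios exposes them in Node.js) and that the header value parses as a number.

```javascript
// Hypothetical helper: extracts the remaining credit count from response headers.
// Returns null when the header is missing or is not a number.
function getRemainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined || raw === null) return null;
  const credits = Number(raw);
  return Number.isNaN(credits) ? null : credits;
}
```

After a successful call, `getRemainingCredits(response.headers)` gives a value you can log or compare against a threshold of your choosing.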

ElevenLabs TTS: Text-to-Speech Model

What is ElevenLabs TTS?

ElevenLabs TTS is a family of AI text-to-speech models that converts written text into natural, expressive, human-like speech. It’s designed for developers building anything from long-form narration to real-time conversational experiences, with strong prosody, contextual delivery, and multilingual output.

On Segmind, you can select from multiple ElevenLabs model IDs depending on your latency and quality needs: eleven_v3 for rich emotional delivery and dialogue, eleven_multilingual_v2 for consistent long-form narration across many languages, and eleven_flash_v2_5 / eleven_turbo_v2_5 for low-latency TTS in interactive apps. You can also choose a preset voice (like “Rachel”) or provide a specific voice_id for tighter control.

Key Features

  • High-fidelity speech synthesis with expressive intonation and natural pacing
  • Multiple model options for quality vs latency (v3, multilingual v2, flash/turbo)
  • Voice selection via preset voices or explicit voice_id
  • Multilingual TTS with optional language_code (ISO 639-1)
  • Controllable prosody using stability, similarity_boost, style, and speed
  • Reproducible outputs with seed for consistent generations

Best Use Cases

  • Audiobooks and long-form narration (consistent tone across chapters)
  • Podcast intros, ads, and video voiceovers (clean, studio-like delivery)
  • Real-time conversational AI and IVR (low latency with flash/turbo)
  • Games and character dialogue (distinct voices + dramatic style control)
  • Accessibility and assistive reading tools (clear pacing and pronunciation)
  • Social content localization (multilingual voiceovers at scale)

Prompt Tips and Output Quality

  • Write as you want it spoken: short sentences, intentional punctuation, and paragraph breaks.
  • For more emotional variance, lower stability (e.g., 0.2–0.4). For steady narration, raise it (0.6–0.9).
  • Increase style for more dramatic delivery; keep near 0 for neutral reads.
  • Use similarity_boost (e.g., 0.7–0.9) to keep the voice closer to the target persona.
  • Adjust speed (1.0 is natural; 0.85 for gravitas; 1.1–1.3 for energetic explainers).
  • If you need consistent results in testing, set a fixed seed (use 0 for non-deterministic).
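
The tuning tips above can be captured as reusable presets. This is an illustrative sketch only: the preset names and the exact numbers are assumptions chosen from within the ranges suggested in the tips, and `buildRequest` is a hypothetical helper.

```javascript
// Illustrative presets; values are picked from the ranges suggested in the tips above.
const DELIVERY_PRESETS = {
  // Steady narration: higher stability, neutral style.
  narration: { stability: 0.75, similarity_boost: 0.8, style: 0, speed: 1.0 },
  // Emotional delivery: lower stability, raised style.
  expressive: { stability: 0.3, similarity_boost: 0.8, style: 0.6, speed: 1.0 },
  // Energetic explainer: slightly faster pace.
  explainer: { stability: 0.5, similarity_boost: 0.8, style: 0.2, speed: 1.2 },
};

// Hypothetical helper: merges a preset with the prompt and any per-call overrides.
function buildRequest(prompt, presetName, overrides = {}) {
  const preset = DELIVERY_PRESETS[presetName];
  if (!preset) {
    throw new Error(`Unknown preset: ${presetName}`);
  }
  // Later spreads win, so overrides take precedence over preset values.
  return { prompt, ...preset, ...overrides };
}
```

For repeatable test runs, a fixed seed can be passed as an override, for example `buildRequest('Chapter one.', 'narration', { seed: 1234 })`.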

FAQs

Is ElevenLabs TTS open-source?
No. These are proprietary text-to-speech models, accessed on Segmind through its API.

Which model_id should I choose?
Use eleven_multilingual_v2 for long-form multilingual narration, eleven_v3 for expressive performances, and eleven_flash_v2_5 / eleven_turbo_v2_5 for low-latency, real-time TTS.
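
That guidance can be encoded as a small lookup. The sketch below is one possible mapping, not an official one: the use-case labels and `pickModelId` are hypothetical, and choosing eleven_flash_v2_5 over eleven_turbo_v2_5 for real-time use is an arbitrary pick between the two low-latency options.

```javascript
// Use-case labels are hypothetical; model IDs are the ones named in this document.
const MODEL_BY_USE_CASE = {
  'long-form': 'eleven_multilingual_v2',
  'expressive': 'eleven_v3',
  'real-time': 'eleven_flash_v2_5', // eleven_turbo_v2_5 is the other low-latency option
};

// Hypothetical helper: returns a model_id for a use case,
// falling back to the documented default model.
function pickModelId(useCase) {
  if (Object.prototype.hasOwnProperty.call(MODEL_BY_USE_CASE, useCase)) {
    return MODEL_BY_USE_CASE[useCase];
  }
  return 'eleven_multilingual_v2';
}
```

The result can be placed directly in the request body, for example `{ prompt: text, model_id: pickModelId('real-time') }`.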

How do I pick between voice and voice_id?
Use voice for quick preset selection. Use voice_id when you need a specific exact voice identity (including custom voices).

What parameters most affect realism?
Start with stability, similarity_boost, and use_speaker_boost. Then tune style and speed for delivery.

Should I set language_code?
Set language_code to force a target language (e.g., en, es, ja) when your text or app supports multiple locales.

What does text normalization do?
apply_text_normalization (auto/on/off) controls how numbers, dates, and abbreviations are expanded for speech; leave auto for most apps.