POST

javascript

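const axios = require('axios');
const fs = require('fs');

const api_key = "YOUR_API_KEY";
const url = "https://api.segmind.com/v1/tts-eleven-labs";

const data = {
  "prompt": "In today's fast-paced world, many of us find ourselves racing against time. We're always planning, worrying, or reminiscing.",
  "voice": "Rachel",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "model_id": "eleven_multilingual_v2",
  "stability": 0.5,
  "use_speaker_boost": true,
  "similarity_boost": 0.75,
  "style": 0,
  "speed": 1,
  "seed": 0,
  "apply_text_normalization": "auto",
  "apply_language_text_normalization": false
};

(async function() {
    try {
        // The endpoint returns binary audio, so ask axios for raw bytes instead of text/JSON.
        const response = await axios.post(url, data, {
            headers: { 'x-api-key': api_key },
            responseType: 'arraybuffer'
        });
        // Save the audio to disk. The .mp3 extension is an assumption; check the Content-Type header.
        fs.writeFileSync('output.mp3', Buffer.from(response.data));
        console.log('Saved output.mp3, Content-Type:', response.headers['content-type']);
        console.log('Remaining credits:', response.headers['x-remaining-credits']);
    } catch (error) {
        if (error.response) {
            // Error bodies also arrive as bytes because of responseType 'arraybuffer'.
            console.error('Error', error.response.status + ':', Buffer.from(error.response.data).toString('utf8'));
        } else {
            // Network failure or request setup error: there is no HTTP response to inspect.
            console.error('Error:', error.message);
        }
    }
})();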

RESPONSE

Audio file (binary). The exact format is reported in the Content-Type response header.
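Because the response body is raw audio rather than JSON, a client that saves it to disk has to choose a file extension. The sketch below is a hypothetical helper (`extensionFor` is not part of the Segmind API) that derives one from the Content-Type header. The list of MIME types is an assumption covering common audio formats, so check it against what the API actually returns.

```javascript
// Hypothetical helper: picks a file extension from a Content-Type header value.
// The mapping covers common audio MIME types; which one this API returns is
// an assumption to verify against a real response.
const AUDIO_EXTENSIONS = {
  'audio/mpeg': 'mp3',
  'audio/mp3': 'mp3',
  'audio/wav': 'wav',
  'audio/x-wav': 'wav',
  'audio/ogg': 'ogg',
  'audio/flac': 'flac',
};

function extensionFor(contentType) {
  // Drop parameters such as "; rate=44100" and normalise case before lookup.
  const mime = String(contentType || '').split(';')[0].trim().toLowerCase();
  // Fall back to a generic binary extension for unknown types.
  return AUDIO_EXTENSIONS[mime] || 'bin';
}
```

For example, `extensionFor('audio/mpeg')` returns `'mp3'`, and any unrecognised or missing type falls back to `'bin'`.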

HTTP Response Codes

200 - OK: Audio generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: The server had an issue processing the request
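When handling a failed request, it can help to turn these status codes into readable messages. The sketch below is a hypothetical helper (`describeError` is not part of any Segmind SDK) that covers only the error codes listed above; anything else is reported as unexpected.

```javascript
// Error descriptions adapted from the HTTP Response Codes list above.
const ERROR_MESSAGES = {
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: the server had an issue processing the request',
};

// Hypothetical helper: returns a readable description for an error status code.
function describeError(status) {
  return ERROR_MESSAGES[status] || `Unexpected status ${status}`;
}
```

In a catch block, `describeError(error.response.status)` turns a failed call into a readable message; for example, `describeError(406)` returns `'Not Acceptable: not enough credits'`.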

Attributes

prompt (str, required)

The text to convert to speech.

voice (enum: str, default: Rachel)

Voice name.

Allowed values:

voice_id (str, default: 21m00Tcm4TlvDq8ikWAM)

ElevenLabs voice ID (e.g., '21m00Tcm4TlvDq8ikWAM'). If it is not provided, the voice parameter is used instead.
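Reading the description above, `voice_id` appears to take priority, and `voice` is used only when no ID is sent. Under that assumption, selecting a voice by name means leaving `voice_id` out of the request body rather than sending both fields (the request example at the top sends both, so there the ID would win). The helper below is a hypothetical sketch of that rule; `voiceFields` is not a Segmind API.

```javascript
// Hypothetical helper: builds the voice-related fields of the request body.
// Assumption (from the parameter description): when voice_id is present the
// API uses it, and it falls back to `voice` only when voice_id is omitted.
function voiceFields({ voice, voiceId } = {}) {
  if (voiceId) {
    return { voice_id: voiceId };
  }
  // Omit voice_id entirely so the API falls back to the named voice
  // (Rachel is the documented default name).
  return { voice: voice || 'Rachel' };
}
```

The result can be spread into the request body, e.g. `{ prompt, ...voiceFields({ voice: 'Rachel' }) }`.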

model_id (enum: str, default: eleven_multilingual_v2)

Model identifier.

Allowed values:

language_code (enum: str, default: 1)

Language code (ISO 639-1) that forces the model to use a specific language.

Allowed values:

stability (float, default: 0.5)

Voice stability (0-1). Lower values allow a broader emotional range.

min: 0, max: 1

use_speaker_boost (boolean, default: true)

Boosts similarity to the original speaker.

similarity_boost (float, default: 0.75)

How closely the AI should adhere to the original voice (0-1).

min: 0, max: 1

style (float, default: 1; affects pricing)

Style exaggeration of the voice (0-1).

min: 0, max: 1

speed (float, default: 1)

Adjusts the speaking speed; 1.0 is the normal speed.

min: 0.25, max: 4

seed (int, default: 1)

Seed for deterministic sampling (0-4294967295).

min: 0, max: 4294967295

apply_text_normalization (enum: str, default: auto)

Controls text normalization: auto, on, or off.

Allowed values:

apply_language_text_normalization (boolean, default: 1)

Controls language-specific text normalization. Currently supported only for Japanese.
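Several of the attributes above carry documented ranges, so a client-side check before sending a request can catch mistakes early instead of spending a call on an invalid body. The function below is a hypothetical sketch (`validateTtsPayload` is not a Segmind API). It checks only the constraints listed on this page, assumes `"ja"` is the ISO 639-1 code to pair with Japanese normalization, and leaves the server as the final authority on what it accepts.

```javascript
// Documented numeric ranges from the Attributes list above.
const RANGES = {
  stability: [0, 1],
  similarity_boost: [0, 1],
  style: [0, 1],
  speed: [0.25, 4],
  seed: [0, 4294967295],
};

// Hypothetical client-side check: returns a list of problems found in a
// request body before it is sent. An empty array means none were found.
function validateTtsPayload(body) {
  const problems = [];
  if (typeof body.prompt !== 'string' || body.prompt.trim() === '') {
    problems.push('prompt is required and must be a non-empty string');
  }
  for (const [key, [min, max]] of Object.entries(RANGES)) {
    if (body[key] === undefined) continue; // omitted: the server default applies
    const value = body[key];
    if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
      problems.push(`${key} must be a number between ${min} and ${max}`);
    }
  }
  if (body.seed !== undefined && !Number.isInteger(body.seed)) {
    problems.push('seed must be an integer');
  }
  // Language normalization is documented as Japanese-only.
  if (body.apply_language_text_normalization === true &&
      body.language_code !== undefined && body.language_code !== 'ja') {
    problems.push('apply_language_text_normalization is currently only supported for Japanese');
  }
  return problems;
}
```

Calling `validateTtsPayload(data)` on the request body from the example at the top and refusing to send when the returned array is non-empty keeps obviously invalid requests off the network.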

To track your credit usage, inspect the response headers of each API call: the x-remaining-credits header reports the number of credits left in your account. Monitor this value to avoid interruptions when your balance runs low.
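A minimal sketch of reading that header follows. `remainingCredits` and `checkCredits` are hypothetical helpers, and the warning threshold of 10 credits is an arbitrary assumption. The helper accepts either an object with a `.get()` method (fetch `Headers`, or axios 1.x response headers) or a plain object keyed by lower-cased header names.

```javascript
// Hypothetical helper: reads the x-remaining-credits header from a response.
// Returns the value as a number, or null if it is missing or not numeric.
function remainingCredits(headers) {
  const raw = typeof headers.get === 'function'
    ? headers.get('x-remaining-credits')
    : headers['x-remaining-credits'];
  if (raw === undefined || raw === null) return null; // header not present
  const credits = Number(raw);
  return Number.isFinite(credits) ? credits : null;
}

// Example policy (the default threshold is an assumption): warn when credits run low.
function checkCredits(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  if (credits !== null && credits < threshold) {
    console.warn(`Low balance: ${credits} credits remaining`);
  }
  return credits;
}
```

With axios, `checkCredits(response.headers)` can be called right after a successful request.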

Eleven Labs Text-to-Speech

Eleven Labs Text-to-Speech (TTS) uses deep learning to generate realistic, natural-sounding speech from written text. It suits a broad range of applications, including content creation, eLearning development, and marketing materials.

Key Features of Eleven Labs Text-to-Speech

Natural-sounding Speech Synthesis: Produce high-quality audio that closely resembles human speech patterns, enhancing listener engagement.
Customizable Voice Selection: Choose from a library of diverse voices with varying accents, genders, and speaking styles for tailored audio experiences.
Advanced Emotional Control: Inflect the synthetic speech with desired emotions for impactful storytelling, presentations, or educational content.
Seamless Integration: Integrate Eleven Labs TTS with existing workflows through their API for efficient text-to-speech conversion.
Speaker Diarization: Automatically identify and differentiate between multiple speakers within a text script, ideal for generating audio dialogues or audiobooks.

Benefits of Using Eleven Labs Text-to-Speech

Enhanced Content Creation: Generate high-quality voiceovers or audio narration for videos, presentations, and eLearning modules.
Improved Accessibility: Create audio descriptions or convert text-based content into spoken format for visually impaired audiences.
Streamlined Marketing Efforts: Produce engaging audio ads or product demonstrations for increased reach and brand awareness.
Multilingual Content Development: Generate multilingual audio content with natural-sounding voices to expand your global audience.
Realistic Voice Prototyping: Experiment with different voice styles and emotions to test the impact of your text content before final production.
