POST

javascript

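const axios = require('axios');

const api_key = "YOUR_API_KEY"; // replace with your Segmind API key
const url = "https://api.segmind.com/v1/elevenlabs-voice-design";

const data = {
  "voice_name": "Calm Narrator", // required; example name, choose your own
  "voice_description": "A calm young adult female with a warm and smooth voice tone, speaking with a neutral American accent. The pacing is relaxed and conversational, with perfect audio quality. She sounds thoughtful and gentle, conveying kindness and clarity",
  "model_id": "eleven_ttv_v3",
  "auto_generate_text": true,
  "loudness": 0.5,
  "seed": 123456,
  "guidance_scale": 5
};

(async function() {
    try {
        const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
        // Remaining account credits are reported in a response header.
        console.log('Remaining credits:', response.headers['x-remaining-credits']);
        console.log(response.data);
    } catch (error) {
        // error.response is undefined when the server never replied (e.g. network errors).
        console.error('Error:', error.response ? error.response.data : error.message);
    }
})();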

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Voice generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: The server had an issue processing the request
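The status codes listed above can be folded into a small lookup for error handling. This is a minimal sketch, assuming an axios-style error object in which `error.response.status` is present only when the server replied; `describeStatus` and `explainAxiosError` are hypothetical helper names, not part of any Segmind SDK.

```javascript
// Documented HTTP status codes for this endpoint, mapped to their meanings.
const STATUS_MEANINGS = {
  200: "OK: request succeeded",
  401: "Unauthorized: user authentication failed",
  404: "Not Found: the requested URL does not exist",
  405: "Method Not Allowed: the requested HTTP method is not allowed",
  406: "Not Acceptable: not enough credits",
  500: "Server Error: the server had an issue processing the request",
};

// Returns a readable explanation, with a fallback for codes not listed above.
function describeStatus(status) {
  return STATUS_MEANINGS[status] ?? `Unexpected status ${status}`;
}

// axios sets `error.response` only when the server replied; network
// failures carry just a message.
function explainAxiosError(error) {
  if (error.response) {
    return describeStatus(error.response.status);
  }
  return `No response received: ${error.message}`;
}

console.log(describeStatus(406)); // Not Acceptable: not enough credits
```

In the request sample's `catch` block, `explainAxiosError(error)` could replace the raw `error.response.data` log when a short message is enough.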

Attributes

voice_name (str, required)

The name for the newly created voice. Fun or memorable names suit personal use; formal names suit business use.

voice_description (str, required)

Describes the voice attributes. For example, ask for warm tones for audiobooks or an energetic delivery for commercials.

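model_id (enum: str, default: eleven_multilingual_ttv_v2)

Choose the model for voice generation: eleven_multilingual_ttv_v2 for multilingual output, or eleven_ttv_v3 for advanced features such as reference audio.

Allowed values: eleven_multilingual_ttv_v2, eleven_ttv_v3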

text (str, default: 1)

Input text for voice generation. Use descriptive, engaging text for storytelling.

auto_generate_text (boolean, default: true)

Generates text that matches the voice description. Useful for quick demonstrations or testing.

loudness (float, default: 0.5)

Sets the voice volume level. Use -0.5 for quiet settings and 0.5 for moderate volume.

min: -1, max: 1

seed (int, default: 123456)

Controls randomness in voice generation. Keep the seed fixed for reproducible results, or vary it for diversity.

min: 0, max: 2147483647

guidance_scale (float, default: 5)

Adjusts how closely the model adheres to the prompt. Use 3 for flexibility and 8 for strict adherence.

min: 0, max: 100

quality (float, default: 1)

Determines voice output quality: 0.5 for standard quality, 1 for high quality.

min: -1, max: 1

reference_audio_url (str, default: 1)

URL of a reference audio file, used with the eleven_ttv_v3 model. Use it for precise tone matching.

prompt_strength (float, default: 1)

Balances the influence of the prompt against the reference audio. Use 0.3 to favor the reference audio and 0.7 to favor the prompt.

min: 0, max: 1
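The attribute list above can be turned into a client-side check that catches out-of-range values before a request spends credits. This is a minimal sketch: `buildPayload` is a hypothetical helper, the server's own validation may differ, and only the defaults that are unambiguous in the list are filled in (text, reference_audio_url and prompt_strength are left to the caller).

```javascript
// Numeric ranges copied from the attribute list.
const RANGES = {
  loudness: [-1, 1],
  seed: [0, 2147483647],
  guidance_scale: [0, 100],
  quality: [-1, 1],
  prompt_strength: [0, 1],
};

const MODELS = ["eleven_multilingual_ttv_v2", "eleven_ttv_v3"];

// Merges caller input over the documented defaults and collects problems.
// An empty `errors` array means the payload passed these local checks.
function buildPayload(input) {
  const payload = {
    model_id: "eleven_multilingual_ttv_v2",
    auto_generate_text: true,
    loudness: 0.5,
    seed: 123456,
    guidance_scale: 5,
    quality: 1,
    ...input,
  };
  const errors = [];
  for (const field of ["voice_name", "voice_description"]) {
    if (typeof payload[field] !== "string" || payload[field].trim() === "") {
      errors.push(`${field} is required`);
    }
  }
  if (!MODELS.includes(payload.model_id)) {
    errors.push(`model_id must be one of ${MODELS.join(", ")}`);
  }
  for (const [field, [min, max]] of Object.entries(RANGES)) {
    const value = payload[field];
    // Optional fields that were never set are skipped.
    if (value !== undefined && (!Number.isFinite(value) || value < min || value > max)) {
      errors.push(`${field} must be a number between ${min} and ${max}`);
    }
  }
  if (!Number.isInteger(payload.seed)) {
    errors.push("seed must be an integer");
  }
  return { payload, errors };
}

const { errors } = buildPayload({
  voice_name: "Calm Narrator",
  voice_description: "A calm, warm female voice",
  loudness: 2,
});
console.log(errors); // [ 'loudness must be a number between -1 and 1' ]
```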

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
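A minimal sketch of reading that header, assuming its value is a numeric string and that headers arrive with lowercase names (as they do from axios in Node.js). `remainingCredits` and `isLowOnCredits` are hypothetical helpers, and the threshold is whatever value suits your workload.

```javascript
// Reads the remaining-credit count from a response's headers.
// Returns null when the header is missing or not a number.
function remainingCredits(headers) {
  const raw = headers["x-remaining-credits"];
  if (raw === undefined) return null;
  const credits = Number(raw);
  return Number.isFinite(credits) ? credits : null;
}

// True when the balance is known and below a caller-chosen threshold.
function isLowOnCredits(headers, threshold) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}

// With axios: remainingCredits(response.headers)
console.log(remainingCredits({ "x-remaining-credits": "250" })); // 250
```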

Voice Design by ElevenLabs: AI Voice Generation Model

What is Voice Design?

Voice Design is a generative AI model from ElevenLabs that creates fully synthetic voices from scratch, with no voice samples required. Instead of browsing voice libraries, you describe what you want: gender, age, accent, tone, and mood. The model then generates a unique, lifelike voice tailored to your specifications. Because generation includes some randomness, each output is distinct unless you reuse the same seed. This makes Voice Design well suited to creators, game developers, publishers, and brands seeking custom audio identities without licensing constraints or recording sessions.

Key Features

Text-to-voice generation: Generate synthetic voices by describing attributes like age, accent, and tone
No voice samples needed: Create entirely artificial voices without real recordings
Multilingual support: Choose between eleven_multilingual_ttv_v2 for broad language coverage or eleven_ttv_v3 for advanced features
Reproducible outputs: Use seed values for consistent voice generation or vary them for diversity
Reference audio support: With the eleven_ttv_v3 model, supply a reference audio URL to guide tone and style
Customizable parameters: Fine-tune loudness, quality, guidance scale, and prompt strength for precise control
Metadata tagging: Add labels like genre and mood for easy voice organization and reuse

Best Use Cases

Audiobook narration: Create distinct character voices or a consistent narrator persona across series
Gaming NPCs: Design diverse NPC voices without hiring multiple voice actors
Branded audio: Develop unique brand voice identities for ads, IVR systems, or podcasts
Content localization: Generate native-sounding voices for different languages and regional accents
Rapid prototyping: Test narration styles before committing to professional voice recording
Educational content: Produce engaging, varied instructional audio without budget constraints

Prompt Tips and Output Quality

Writing Effective Voice Descriptions: Be specific about age range, gender, accent, tone (warm, energetic, authoritative), and intended use case. For example: "A warm, middle-aged female voice with a British accent, ideal for cozy audiobook narration" yields better results than "female voice."
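The recommended ingredients (age, gender, accent, tone, use case) can be assembled into a description string programmatically, which helps keep descriptions consistent across many voices. This is a minimal sketch; `describeVoice` is a hypothetical helper, and its a/an choice is a rough first-letter heuristic.

```javascript
// Builds a voice_description from the recommended attributes.
// Every field is optional; missing ones are simply left out.
function describeVoice({ tones = [], age, gender, accent, useCase }) {
  const adjectives = [...tones, age].filter(Boolean).join(", ");
  const head = [adjectives, gender, "voice"].filter(Boolean).join(" ");
  // Rough article choice by first letter (ignores cases like "an hour").
  const article = /^[aeiou]/i.test(head) ? "An" : "A";
  let text = `${article} ${head}`;
  if (accent) text += ` with ${/^[aeiou]/i.test(accent) ? "an" : "a"} ${accent} accent`;
  if (useCase) text += `, ideal for ${useCase}`;
  return text;
}

console.log(
  describeVoice({
    tones: ["warm"],
    age: "middle-aged",
    gender: "female",
    accent: "British",
    useCase: "cozy audiobook narration",
  })
);
// A warm, middle-aged female voice with a British accent, ideal for cozy audiobook narration
```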

Parameter Impact:

Guidance Scale (3-8): Lower values (3-4) give the AI creative freedom; higher values (7-8) enforce strict adherence to your description
Quality (0.5-1.0): Use 0.8+ for production work; 0.5 for quick drafts and testing
Seed: Keep consistent for reproducible voices; change for variations on the same description
Prompt Strength (0-1): When using reference audio (v3 model), lower values (0.3) prioritize the reference; higher values (0.7) emphasize your text description
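The guidance above can be encoded as a few named settings plus a helper that varies only the seed. This is a sketch under stated assumptions: `tuneParams` and `seedVariations` are hypothetical, and the specific numbers (for example 0.9 for production quality) are illustrative picks inside the suggested ranges, not official presets.

```javascript
// Illustrative quality values inside the suggested ranges
// (0.5 for drafts, 0.8+ for production).
const QUALITY_BY_STAGE = { draft: 0.5, production: 0.9 };

// Lower guidance_scale leaves more creative freedom; higher values
// enforce closer adherence to the description.
const ADHERENCE = { loose: 3, balanced: 5, strict: 8 };

const MAX_SEED = 2147483647;
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function tuneParams(stage, adherence = "balanced", seed = 123456) {
  if (!has(QUALITY_BY_STAGE, stage)) throw new Error(`Unknown stage: ${stage}`);
  if (!has(ADHERENCE, adherence)) throw new Error(`Unknown adherence: ${adherence}`);
  return { quality: QUALITY_BY_STAGE[stage], guidance_scale: ADHERENCE[adherence], seed };
}

// Returns `count` copies that differ only in seed, wrapping inside the
// documented 0..2147483647 range. Keep one seed fixed to reproduce a voice.
function seedVariations(params, count) {
  return Array.from({ length: count }, (_, i) => ({
    ...params,
    seed: (params.seed + i) % (MAX_SEED + 1),
  }));
}

const base = tuneParams("draft", "loose");
console.log(seedVariations(base, 3).map((p) => p.seed)); // [ 123456, 123457, 123458 ]
```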

Auto-generated text is useful for quick voice previews, but custom text (100-1000 characters) better showcases the voice's nuance.
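A minimal sketch of switching between auto-generated and custom preview text while enforcing the 100-1000 character range. `withPreviewText` is a hypothetical helper, and it assumes (not confirmed here) that the API uses `text` only when `auto_generate_text` is false. Length is measured with JavaScript's `String.length`, which counts UTF-16 code units; the API may count characters differently.

```javascript
const MIN_TEXT = 100;
const MAX_TEXT = 1000;

// Returns a new payload. Without text, falls back to auto-generated text;
// with text, checks its length and disables auto-generation (assumption).
function withPreviewText(payload, text) {
  if (text === undefined) {
    return { ...payload, auto_generate_text: true };
  }
  const length = text.length; // UTF-16 code units
  if (length < MIN_TEXT || length > MAX_TEXT) {
    throw new RangeError(`text must be ${MIN_TEXT}-${MAX_TEXT} characters, got ${length}`);
  }
  return { ...payload, text, auto_generate_text: false };
}
```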

FAQs

Is Voice Design open-source?
No, Voice Design is a proprietary model by ElevenLabs, accessible via API integration.

What's the difference between the v2 and v3 models?
eleven_multilingual_ttv_v2 offers broad multilingual support. eleven_ttv_v3 adds reference audio capabilities, letting you guide voice generation with sample audio files.
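Because reference audio applies to eleven_ttv_v3, a helper can attach the URL and switch the model in one step. This is a minimal sketch: `withReferenceAudio` is hypothetical, the default prompt strength of 0.5 is an arbitrary midpoint (the docs suggest 0.3 to favor the reference and 0.7 to favor the description), and `https://example.com/ref.mp3` is a placeholder.

```javascript
// Attaches a reference audio URL and selects eleven_ttv_v3, the model
// described as supporting reference audio.
function withReferenceAudio(payload, url, promptStrength = 0.5) {
  const parsed = new URL(url); // throws TypeError on malformed URLs
  if (!["http:", "https:"].includes(parsed.protocol)) {
    throw new Error("reference_audio_url must be an http(s) URL");
  }
  if (!Number.isFinite(promptStrength) || promptStrength < 0 || promptStrength > 1) {
    throw new RangeError("prompt_strength must be between 0 and 1");
  }
  return {
    ...payload,
    model_id: "eleven_ttv_v3",
    reference_audio_url: url,
    prompt_strength: promptStrength,
  };
}

console.log(withReferenceAudio({ voice_name: "Narrator" }, "https://example.com/ref.mp3", 0.3).model_id);
// eleven_ttv_v3
```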

Can I reuse a generated voice?
Yes. Save the voice name, voice description, and seed value (plus any other parameters you set) to recreate the same voice. Use labels (metadata tags) to organize voices by project or use case.
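One way to keep those settings together is to store every generation parameter as a small JSON "recipe". This is a minimal sketch; `saveRecipe` and `loadRecipe` are hypothetical, and it assumes that resending identical parameters with the same seed reproduces the voice, which the seed description implies but is worth verifying for your model.

```javascript
// Request fields that influence the generated voice.
const RECIPE_FIELDS = [
  "voice_name", "voice_description", "model_id", "seed",
  "guidance_scale", "quality", "loudness", "auto_generate_text",
  "text", "reference_audio_url", "prompt_strength",
];

// Keeps only the recipe fields that were actually set, as JSON.
function saveRecipe(payload) {
  const recipe = {};
  for (const field of RECIPE_FIELDS) {
    if (payload[field] !== undefined) recipe[field] = payload[field];
  }
  return JSON.stringify(recipe);
}

// Restores a payload that can be sent as the request body again.
function loadRecipe(json) {
  return JSON.parse(json);
}

const saved = saveRecipe({ voice_name: "Narrator", voice_description: "Calm", seed: 42 });
console.log(saved); // {"voice_name":"Narrator","voice_description":"Calm","seed":42}
```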

How do I match a specific tone without reference audio?
Use detailed descriptions in the voice_description parameter. Combine adjectives like "warm," "energetic," "authoritative," or "playful" with use-case context (e.g., "ideal for commercials").

What text length works best for voice generation?
Use between 100 and 1000 characters. Longer, varied text (100+ words) reveals the voice's full expressive range better than short phrases.

Can I adjust volume after generation?
Yes, the audio can be post-processed like any other file, but it is usually simpler to set volume at generation time with the loudness parameter (-1 to 1): use -0.5 for quieter scenes and 0.5 for standard audio levels.
