POST

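```javascript
const axios = require('axios');
const fs = require('fs');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/veena-max-tts";

// Hinglish sample: "Use Segmind, run the model; it's so fast that even the tea won't get cold before the result arrives."
const data = {
  "text": "Segmind lagao, model chalao, itna tez ki result aane se pehle chai bhi tthandi na ho.",
  "speaker_id": "vinaya_assist",
  "normalize": true
};

(async function() {
    try {
        // Request raw bytes: the endpoint returns binary audio, not JSON.
        const response = await axios.post(url, data, {
            headers: { 'x-api-key': api_key },
            responseType: 'arraybuffer'
        });
        console.log('Content-Type:', response.headers['content-type']);
        console.log('Remaining credits:', response.headers['x-remaining-credits']);
        // The .wav extension is an assumption; match it to the Content-Type logged above.
        fs.writeFileSync('output.wav', Buffer.from(response.data));
    } catch (error) {
        // error.response is undefined for network failures, so fall back to the message.
        const details = error.response ? Buffer.from(error.response.data).toString() : error.message;
        console.error('Error:', details);
    }
})();
```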

RESPONSE

Binary audio data

HTTP Response Codes

200 - OK: Audio generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: The server had an issue processing the request
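The status codes above can be handled in one place. The sketch below is a minimal example; `describeStatus` and `STATUS_INFO` are hypothetical names, and treating only 500 as worth retrying is an assumption rather than documented behavior.

```javascript
// Hypothetical lookup table: documented status codes mapped to a short
// explanation and whether retrying the same request might help.
// Retrying only on 500 is an assumption, not documented behavior.
const STATUS_INFO = new Map([
  [200, { message: 'OK: audio generated', retry: false }],
  [401, { message: 'Unauthorized: check your x-api-key header', retry: false }],
  [404, { message: 'Not Found: check the endpoint URL', retry: false }],
  [405, { message: 'Method Not Allowed: use POST', retry: false }],
  [406, { message: 'Not Acceptable: not enough credits', retry: false }],
  [500, { message: 'Server Error: processing failed', retry: true }],
]);

function describeStatus(status) {
  // Codes the documentation does not list get a generic, non-retryable entry.
  return STATUS_INFO.get(status) ?? { message: `Unexpected status ${status}`, retry: false };
}

console.log(describeStatus(406).message); // Not Acceptable: not enough credits
```

In the axios example, `error.response.status` is the value you would pass to `describeStatus`.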

Attributes

text (str, required)

The text to convert into speech, for example a greeting or an instruction such as 'Welcome to VeenaMAX, your TTS solution.'

speaker_id (enum: str, required)

Choose a voice for your text. For a calm tone, select 'soumya_calm'; for impact, select 'agastya_impact'.

Allowed values:

normalize (bool, default: true)

Enable text normalization for better pronunciation. Recommended for complex text or mixed-language input.
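The three attributes map directly onto the JSON request body. This minimal sketch uses a hypothetical `buildPayload` helper. It falls back to `vinaya_assist`, which the voice-selection tips describe as the default voice. It checks only that `speaker_id` is a non-empty string, because the full list of allowed values is not reproduced here.

```javascript
// Hypothetical helper that assembles the request body from the documented attributes.
function buildPayload(text, speakerId = 'vinaya_assist', normalize = true) {
  // text is required: reject empty or whitespace-only input before spending credits.
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('text must be a non-empty string');
  }
  // speaker_id is an enum; only its type is checked because the allowed values are not listed here.
  if (typeof speakerId !== 'string' || speakerId === '') {
    throw new Error('speaker_id must be a non-empty string');
  }
  return { text: text.trim(), speaker_id: speakerId, normalize: Boolean(normalize) };
}

console.log(buildPayload('Welcome to VeenaMAX, your TTS solution.', 'soumya_calm'));
```

The returned object can be passed as the `data` argument to `axios.post`.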

To track your credit usage, inspect the response headers of each API call: the x-remaining-credits header reports the number of credits left in your account. Monitor this value to avoid interruptions to your API usage.
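Here is a minimal sketch of reading the credit header. It assumes a plain object keyed by lowercase header names, such as the `response.headers` that axios returns in Node.js. `checkCredits` and the default threshold of 10 are hypothetical choices.

```javascript
// Hypothetical helper: read x-remaining-credits and flag a low balance.
// The threshold of 10 is an arbitrary example value.
function checkCredits(headers, threshold = 10) {
  // Node.js delivers header names in lowercase.
  const raw = headers['x-remaining-credits'];
  const remaining = Number(raw);
  // A missing or non-numeric header yields remaining: null.
  if (raw === undefined || Number.isNaN(remaining)) {
    return { remaining: null, low: false };
  }
  return { remaining, low: remaining < threshold };
}

const status = checkCredits({ 'x-remaining-credits': '4.5' });
if (status.low) {
  console.warn(`Only ${status.remaining} credits left`);
}
```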

VeenaMAX: Text-to-Speech Model for Indian Languages

Edited by Segmind Team on September 20, 2025.

What is VeenaMAX?

VeenaMAX by Maya Research is a high-performance text-to-speech (TTS) model designed for Indian languages and multi-script content. It accurately converts written text into natural, human-like speech with expressive tonality. It supports Hindi (in both Devanagari and Roman script), English, and conversational Hinglish. It offers eight distinct voice personalities with built-in emotional expressiveness, and its processing is fast enough for real-time streaming. Together, these make it well suited to interactive applications and audio production across many industries.

Key Features of VeenaMAX

Multi-script support: It supports Hindi in both Devanagari and Roman scripts, along with English and Hinglish text.
8 distinct voice personalities: It offers eight unique voices, each with its own emotional tone, to suit different applications.
Automatic script detection: It identifies the input script automatically and applies the matching language for correct pronunciation.
Real-time and non-streaming output: It supports real-time streaming as well as non-streaming generation, which creates the full audio file so it can be saved before playback.
Studio-quality audio: It produces clear, professional, noise-free audio.
Context-aware pronunciation: It adjusts pronunciation to the context and switches smoothly between languages.
Domain-specific terminology: It accurately pronounces sector-specific terminology in industries such as banking, finance, and healthcare.

Best Use Cases

IVR Systems: It can voice Interactive Voice Response (IVR) prompts in any industry that uses them.
Customer Support: It can respond during real-time calls, which makes it well suited to call-center customer support.
Live Language Translation Services: It can instantly voice translated text and provide the speech output for live translation services.
Educational: It can provide voice narration for e-learning platforms and educational content.
Banking and Financial: It suits banking and finance service applications such as voice-based banking and spoken alerts.
Healthcare: It can provide spoken output for healthcare information systems.
Multi-language Content Creation: It can help independent content creators produce audio in several languages.
Accessibility Solutions: It helps visually impaired users by reading text aloud.

Prompt Tips and Output Quality

Voice Selection

Use soumya_calm for informational content and educational material
Select agastya_impact for marketing and announcement content
Choose vinaya_assist (default) for customer service applications
Opt for charu_soft or mohini_whispers for gentle, natural conversations
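The voice recommendations above can be encoded as a lookup table. The content categories and the `pickVoice` helper are hypothetical. This sketch uses `charu_soft` for conversational content, although `mohini_whispers` would be an equally valid choice.

```javascript
// Hypothetical content categories mapped to the recommended voices.
const VOICE_BY_CONTENT = new Map([
  ['informational', 'soumya_calm'],
  ['educational', 'soumya_calm'],
  ['marketing', 'agastya_impact'],
  ['announcement', 'agastya_impact'],
  ['customer_service', 'vinaya_assist'],
  ['conversational', 'charu_soft'], // mohini_whispers also fits
]);

function pickVoice(contentType) {
  // Unknown categories fall back to vinaya_assist, the default voice.
  return VOICE_BY_CONTENT.get(contentType) ?? 'vinaya_assist';
}

console.log(pickVoice('marketing')); // agastya_impact
```

A `Map` is used instead of a plain object so that inputs like `'toString'` cannot match inherited properties.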

Optimization Tips

Enable text normalization for mixed-language content
Structure complex sentences with proper punctuation
Use phonetic spelling for uncommon terms
Include pauses (commas, periods) for natural speech flow
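Two of these tips (tidy punctuation and clear pauses) can be applied before the text is sent. The hypothetical `tidyText` helper below collapses stray whitespace and line breaks. It also appends a full stop when the text lacks terminal punctuation. Whether that final full stop changes VeenaMAX's output is an assumption, not documented behavior.

```javascript
// Hypothetical pre-processing step based on the punctuation tips.
function tidyText(text) {
  // Collapse runs of spaces, tabs and newlines into single spaces.
  let cleaned = text.replace(/\s+/g, ' ').trim();
  // Assumption: a closing '.', '!', '?' or Devanagari danda '।' helps the final pause.
  if (cleaned && !/[.!?।]$/.test(cleaned)) {
    cleaned += '.';
  }
  return cleaned;
}

console.log(tidyText('  Aapka balance\n  update ho gaya hai  ')); // Aapka balance update ho gaya hai.
```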

FAQs

How does VeenaMAX handle mixed language (Hinglish) content?

VeenaMAX automatically detects the script and performs seamless code-switching (moving smoothly between languages), so it pronounces mixed-language content naturally without manual intervention.
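This toy sketch illustrates script detection by counting Devanagari letters (U+0900 to U+097F) and Latin letters. It is not how VeenaMAX works internally. It also shows why the real task is harder: Romanized Hindi and English both use Latin letters, so character ranges alone cannot tell them apart.

```javascript
// Toy illustration only: classify text by the Unicode scripts it contains.
// This is NOT VeenaMAX's detection method.
function detectScript(text) {
  const devanagari = (text.match(/[\u0900-\u097F]/g) || []).length;
  const latin = (text.match(/[A-Za-z]/g) || []).length;
  if (devanagari && latin) return 'mixed';
  if (devanagari) return 'devanagari';
  // Romanized Hindi and English both land here.
  if (latin) return 'latin';
  return 'unknown';
}

console.log(detectScript('Mera order कब आएगा?')); // mixed
```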

What's the difference between streaming and non-streaming modes?

Streaming mode enables real-time audio output useful for interactive applications, while non-streaming mode generates complete audio files for download or storage.

How can I optimize the voice output for my specific industry?

VeenaMAX handles domain-specific terminology for industries such as banking and healthcare. To tailor the output further, select a voice personality suited to your content and enable text normalization.

Which voice personality should I choose for my application?

You can select the voice personality based on your use case: soumya_calm for professional content, agastya_impact for engaging announcements, vinaya_assist for customer service, and other options for specific emotional tones.

How does the text normalization feature work?

Text normalization automatically converts numbers and special characters into their spoken forms and adjusts pronunciation for natural-sounding speech. It is most effective for multi-language content and complex terminology.
