const axios = require('axios');

// Replace with your Segmind API key.
const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/veena-tts";

// Request payload: input text, speaker persona, and sampling parameters.
const data = {
  "text": "Kya tumne kabhi socha hai... ki hum sab sirf waqt ke musafir hain?",
  "speaker": "kavya",
  "temperature": 0.4,
  "top_p": 0.9,
  "repetition_penalty": 1.05
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined on network failures, so fall back to the message.
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
text: Input text for speech synthesis. Short, simple phrases give the clearest output; longer, more complex sentences allow more detailed expression.
speaker: Voice persona. Kavya for warmth, Agastya for depth.
temperature: Speech variation. Use 0.2 for monotone, 0.7 for lively expression.
Range: 0 to 2
top_p: Output randomness. Use 0.5 for focused speech, 0.95 for diverse speech.
Range: 0 to 1
repetition_penalty: Reduces word repetition. Use 1.2 for minimal repeats.
Range: 1 to 2
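The ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Segmind API: the helper name validateVeenaParams and the RANGES table are hypothetical, and the sketch assumes that omitted sampling parameters are acceptable (the page does not say whether the API applies defaults).

```javascript
// Ranges copied from the parameter descriptions on this page.
const RANGES = {
  temperature: { min: 0, max: 2 },
  top_p: { min: 0, max: 1 },
  repetition_penalty: { min: 1, max: 2 },
};

// Hypothetical helper: returns a list of problems, empty when the payload looks valid.
function validateVeenaParams(params) {
  const problems = [];
  if (typeof params.text !== 'string' || params.text.trim() === '') {
    problems.push('text must be a non-empty string');
  }
  for (const [name, { min, max }] of Object.entries(RANGES)) {
    const value = params[name];
    // Assumption: unset parameters are allowed, so they are skipped here.
    if (value === undefined) continue;
    if (typeof value !== 'number' || Number.isNaN(value) || value < min || value > max) {
      problems.push(`${name} must be a number between ${min} and ${max}`);
    }
  }
  return problems;
}

// Usage: inspect the problems before calling the endpoint.
const problems = validateVeenaParams({ text: 'Namaste', temperature: 0.4, top_p: 0.9 });
if (problems.length > 0) {
  console.error('Invalid request:', problems.join('; '));
}
```

Returning a list instead of throwing lets the caller report every out-of-range value at once.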
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
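As a rough sketch of the header check described above: the helpers remainingCredits and warnIfLow and the threshold of 10 are hypothetical, and the sketch assumes the x-remaining-credits header carries a plain numeric string. With axios, response header names are lowercased, so the header can be read from response.headers by its lowercase name.

```javascript
// Hypothetical helper: parse x-remaining-credits from a headers object.
// Returns a number, or null when the header is missing or not numeric.
function remainingCredits(headers) {
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  if (raw === undefined || raw === null || raw === '') return null;
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper: log a warning when credits fall below a chosen threshold.
// The default threshold of 10 is an arbitrary illustration.
function warnIfLow(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  if (credits !== null && credits < threshold) {
    console.warn(`Only ${credits} credits remaining`);
    return true;
  }
  return false;
}

// Usage inside the request handler from the sample code:
//   const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
//   warnIfLow(response.headers);
```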
Veena, developed by Maya Research, is a state-of-the-art text-to-speech (TTS) model built on a 3 billion-parameter Llama-based autoregressive transformer. It delivers natural, expressive speech in Hindi and English—handling mixed-language inputs seamlessly. Leveraging the SNAC neural codec at 24 kHz, Veena generates studio-quality audio with four distinct speaker personas (Kavya, Agastya, Maitri, Vinaya). Optimized for ultra-low latency (sub-80 ms on high-end GPUs) and production-ready deployment via 4-bit quantization, Veena is engineered for real-time applications in accessibility, customer service, content creation, and voice-enabled devices.
speaker: Kavya, Agastya, Maitri, or Vinaya
temperature (0–2): 0.2 for monotone, 0.7 for lively expressiveness
top_p (0–1): 0.5 for focused delivery, 0.95 for varied intonation
repetition_penalty (1–2): 1.05 default; increase to 1.2 to minimize repeats
Can Veena handle Hindi-English code-switching?
Yes. Veena’s transformer backbone is trained on mixed-language corpora for seamless transitions.
What latency should I expect in production?
On high-end GPUs, Veena delivers sub-80 ms end-to-end latency—perfect for real-time use.
How do I pick the best speaker voice?
Choose based on your brand or application tone: Kavya for warmth, Agastya for depth, Maitri for neutrality, Vinaya for energy.
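The tone-to-speaker guidance above could be encoded as a small lookup. This is only a sketch: the helper pickSpeaker is hypothetical, and the lowercase identifiers for Agastya, Maitri, and Vinaya are an assumption extrapolated from the "kavya" value used in the sample request.

```javascript
// Tone-to-speaker mapping taken from the answer above.
// Assumption: speaker identifiers are lowercase, as "kavya" is in the sample request.
const SPEAKER_BY_TONE = {
  warmth: 'kavya',
  depth: 'agastya',
  neutrality: 'maitri',
  energy: 'vinaya',
};

// Hypothetical helper: returns the speaker for a tone, or a fallback for unknown tones.
function pickSpeaker(tone, fallback = 'kavya') {
  const key = String(tone).trim().toLowerCase();
  return Object.prototype.hasOwnProperty.call(SPEAKER_BY_TONE, key)
    ? SPEAKER_BY_TONE[key]
    : fallback;
}

// Usage: build the request payload with the chosen speaker.
const payload = { text: 'Welcome back!', speaker: pickSpeaker('energy') };
console.log(payload.speaker);
```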
Is a quantized version available?
Absolutely. Veena supports 4-bit quantization for reduced memory usage and faster inference.
What sample rate does Veena output?
Audio is synthesized at 24 kHz using the SNAC neural codec for smooth, high-quality playback.
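The 24 kHz sample rate makes it straightforward to relate sample counts to playback time. The sketch below is illustrative arithmetic only: the helpers durationSeconds and pcmBytes are hypothetical, and the 16-bit mono default is an assumption about uncompressed PCM, not a statement about the format the API returns.

```javascript
// Sample rate stated on this page.
const SAMPLE_RATE_HZ = 24000;

// Playback duration in seconds for a given number of audio samples.
function durationSeconds(sampleCount, sampleRate = SAMPLE_RATE_HZ) {
  return sampleCount / sampleRate;
}

// Size in bytes of uncompressed PCM audio of the given length.
// Assumption: 16-bit (2-byte) mono samples; adjust for other formats.
function pcmBytes(seconds, bytesPerSample = 2, channels = 1, sampleRate = SAMPLE_RATE_HZ) {
  return seconds * sampleRate * bytesPerSample * channels;
}

// Usage: 48,000 samples at 24 kHz last 2 seconds.
console.log(durationSeconds(48000));
```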