Veena, developed by Maya Research, is a state-of-the-art text-to-speech (TTS) model built on a 3 billion-parameter Llama-based autoregressive transformer. It delivers natural, expressive speech in Hindi and English—handling mixed-language inputs seamlessly. Leveraging the SNAC neural codec at 24 kHz, Veena generates studio-quality audio with four distinct speaker personas (Kavya, Agastya, Maitri, Vinaya). Optimized for ultra-low latency (sub-80 ms on high-end GPUs) and production-ready deployment via 4-bit quantization, Veena is engineered for real-time applications in accessibility, customer service, content creation, and voice-enabled devices.
Key generation parameters (see the example call after this list):
speaker: one of the four voice personas (Kavya, Agastya, Maitri, Vinaya)
temperature (0–2): 0.2 for monotone, 0.7 for lively expressiveness
top_p (0–1): 0.5 for focused delivery, 0.95 for varied intonation
repetition_penalty (1–2): 1.05 by default; increase toward 1.2 to minimize repeats
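Below is a minimal usage sketch assuming a Hugging Face transformers-style interface. The model id ("maya-research/veena"), the speaker-token format ("<spk_kavya>"), and the token budget are assumptions, so check the official model card for the exact conventions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo name and prompt convention -- consult the official
# model card for the exact speaker-token format.
MODEL_ID = "maya-research/veena"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Pick a persona and phrase the request; "<spk_kavya>" is an assumed token.
prompt = "<spk_kavya> Namaste! Welcome to our customer support line."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The sampling knobs described above: lively but focused delivery.
output_tokens = model.generate(
    **inputs,
    max_new_tokens=1024,     # illustrative budget for a short utterance
    do_sample=True,
    temperature=0.7,         # 0.2 = monotone, 0.7 = lively
    top_p=0.95,              # 0.5 = focused, 0.95 = varied intonation
    repetition_penalty=1.05, # raise toward 1.2 if you hear repeats
)
# The generated tokens are SNAC audio codes, which are decoded to 24 kHz
# waveform audio (see the final question in this FAQ).
```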
Can Veena handle Hindi-English code-switching?
Yes. Veena’s transformer backbone is trained on mixed-language corpora for seamless transitions.
What latency should I expect in production?
On high-end GPUs, Veena delivers sub-80 ms end-to-end latency—perfect for real-time use.
How do I pick the best speaker voice?
Choose based on your brand or application tone: Kavya for warmth, Agastya for depth, Maitri for neutrality, Vinaya for energy.
Is a quantized version available?
Absolutely. Veena supports 4-bit quantization for reduced memory usage and faster inference.
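As a rough sketch, 4-bit loading can be done with the transformers BitsAndBytesConfig; the repo name below is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization cuts weight memory to roughly a quarter of bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "maya-research/veena",          # assumed repo name
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```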
What sample rate does Veena output?
Audio is synthesized at 24 kHz using the SNAC neural codec for smooth, high-quality playback.
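For reference, a decoding sketch using the open-source snac package and soundfile is shown below; the code tensor shapes and the exact way Veena packs its SNAC tokens are assumptions, so treat this as illustrative only:

```python
import torch
import soundfile as sf
from snac import SNAC

# Load the 24 kHz SNAC decoder (open-source checkpoint, assumed to match Veena's codec).
codec = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# `codes` would be the hierarchical SNAC token tensors extracted from
# Veena's generated output; placeholder zero codes are used here.
codes = [torch.zeros(1, n, dtype=torch.long) for n in (32, 64, 128)]

with torch.inference_mode():
    audio = codec.decode(codes)  # float tensor of shape (1, 1, samples)

# Write a 24 kHz WAV file for playback.
sf.write("veena_output.wav", audio.squeeze().cpu().numpy(), 24000)
```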