const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/chatterbox-tts";

// Request body: every parameter is described in the reference below.
const data = {
  "text": "Welcome to Chatterbox TTS, where your text turns into captivating audio effortlessly.",
  "reference_audio": "https://segmind-resources.s3.amazonaws.com/input/ef2a2b5c-3e3a-4051-a437-20a72bf175de-sample_audio.mp3",
  "exaggeration": 0.5,
  "temperature": 0.8,
  "seed": 42,
  "cfg_weight": 0.5,
  "min_p": 0.05,
  "top_p": 1,
  "repetition_penalty": 1.2
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network failures, so fall back to the message.
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
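The example above logs response.data directly. If the endpoint returns raw audio bytes (an assumption: this page does not state the response format), it is more useful to write them to a file. The following is a minimal sketch using the built-in fetch of Node.js 18+ instead of axios. ENDPOINT reuses the URL above, while extensionFor, synthesizeToFile and the SEGMIND_API_KEY environment variable are hypothetical names chosen for illustration.

```javascript
// Sketch only. Assumptions (not confirmed by this page): a successful response
// body is raw audio, and its Content-Type header names the audio format.
// Requires Node.js 18+ for the global fetch.
const fs = require("node:fs/promises");

const ENDPOINT = "https://api.segmind.com/v1/chatterbox-tts";

// Hypothetical helper: map a Content-Type value to a file extension.
function extensionFor(contentType) {
  const type = (contentType || "").split(";")[0].trim().toLowerCase();
  const known = {
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/ogg": ".ogg",
    "audio/flac": ".flac",
  };
  return known[type] || ".bin"; // unknown type: save the bytes untouched
}

// Hypothetical helper: POST the payload and write the returned bytes to disk.
async function synthesizeToFile(apiKey, payload, basename = "output") {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    // Error bodies are assumed to be text or JSON rather than audio.
    throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  }
  const outPath = basename + extensionFor(res.headers.get("content-type"));
  await fs.writeFile(outPath, Buffer.from(await res.arrayBuffer()));
  return outPath;
}

// Only touch the network when a key is supplied via the environment.
if (process.env.SEGMIND_API_KEY) {
  synthesizeToFile(process.env.SEGMIND_API_KEY, {
    text: "Welcome to Chatterbox TTS.",
    reference_audio: "https://segmind-resources.s3.amazonaws.com/input/ef2a2b5c-3e3a-4051-a437-20a72bf175de-sample_audio.mp3",
    // Other fields are assumed to have server-side defaults; copy them from
    // the example above if the API rejects the request without them.
  })
    .then((outPath) => console.log("Saved", outPath))
    .catch((err) => console.error(err.message));
}
```

Reading the extension from Content-Type avoids hard-coding a format the page does not specify; if the header is missing, the file is saved as .bin and can be inspected manually.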
text: The input text to synthesize into speech. Use longer text for detailed narration and shorter text for concise messages.
reference_audio: URL of a sample audio clip whose voice style the output should match.
exaggeration: Adjusts speech expressiveness. Use lower values for a neutral delivery and higher values for dramatic effect. Min 0, max 2.
temperature: Controls variation in the generated speech. Use lower values for a consistent tone and higher values for more varied expression. Min 0, max 2.
seed: Random seed. The same seed with the same input reproduces the same output; change it to get different generations.
cfg_weight: Balances creativity against adherence to the text. Use lower values for a strict interpretation and higher values for more flexibility. Min 0, max 2.
min_p: Minimum-probability cutoff for sampling. Candidates below the threshold are discarded, which helps remove unlikely output. Min 0, max 1.
top_p: Nucleus-sampling threshold that controls output randomness. Use lower values for focused output and higher values for more diversity. Min 0, max 1.
repetition_penalty: Penalizes repeated content in the generated speech; higher values reduce redundancy. Min 1, max 2.
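The documented ranges can be checked on the client before a request is sent, so that an out-of-range value becomes a local error instead of a failed call. This is a sketch under that idea: validatePayload is a hypothetical helper, the RANGES table is copied from the limits above, and omitted fields are assumed to fall back to server-side defaults.

```javascript
// Numeric ranges copied from the parameter reference above: [min, max], inclusive.
const RANGES = {
  exaggeration: [0, 2],
  temperature: [0, 2],
  cfg_weight: [0, 2],
  min_p: [0, 1],
  top_p: [0, 1],
  repetition_penalty: [1, 2],
};

// Hypothetical helper: returns a list of problems (empty when the payload looks valid).
function validatePayload(payload) {
  const errors = [];
  if (typeof payload.text !== "string" || payload.text.trim() === "") {
    errors.push("text must be a non-empty string");
  }
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = payload[name];
    if (value === undefined) continue; // omitted: assume the server applies a default
    if (typeof value !== "number" || Number.isNaN(value) || value < min || value > max) {
      errors.push(`${name} must be a number in [${min}, ${max}], got ${value}`);
    }
  }
  return errors;
}

const problems = validatePayload({ text: "Hi", exaggeration: 2.5, top_p: 1 });
console.log(problems); // one message, for exaggeration
```

Returning a list rather than throwing on the first problem lets a caller report every invalid field at once.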
To track your credit usage, inspect the response headers of each API call: the x-remaining-credits header reports the number of credits left in your account. Monitor this value to avoid interruptions to your API usage.
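A small helper can read that header and flag when the balance runs low. This sketch accepts either a fetch Headers object or the headers of an axios response; the helper names and the default threshold of 10 credits are illustrative assumptions, not Segmind recommendations.

```javascript
// Hypothetical helper: parse x-remaining-credits, or return null if absent or non-numeric.
function remainingCredits(headers) {
  const raw = typeof headers.get === "function"
    ? headers.get("x-remaining-credits") // fetch Headers (and axios v1 AxiosHeaders)
    : headers["x-remaining-credits"];    // plain object of lower-cased header names
  if (raw === null || raw === undefined || String(raw).trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: true when the header is present and below the threshold.
function creditsAreLow(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}

const sample = new Headers({ "x-remaining-credits": "7" }); // Node.js 18+ global
if (creditsAreLow(sample)) {
  console.warn(`Only ${remainingCredits(sample)} credits left`);
}
```

In the axios example, the same check could run right after a successful request as creditsAreLow(response.headers).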
Chatterbox is an open-source, high-fidelity text-to-speech (TTS) model developed by Resemble AI. Built on a 0.5-billion-parameter Llama backbone and trained on 0.5 million hours of cleaned audio, it turns plain text into natural, expressive speech. Chatterbox uses alignment-informed synthesis to keep lip-sync and timing precise. Its distinctive feature is emotion exaggeration control, which lets developers dial expressiveness up or down for dramatic narration, character voices, and dynamic AI agents. Every output carries a subtle watermark to promote ethical use and traceability.
Q: How do I control emotion intensity?
Use the exaggeration parameter: values below 0.7 tone down expression, and values above 1.0 heighten the drama.
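To hear the difference, the same sentence can be rendered at several exaggeration levels. A minimal sketch that builds one request body per level: exaggerationSweep and its label format are hypothetical, levels are clamped to the documented 0 to 2 range, and the seed is held fixed so that, per the seed description above, differences should come from exaggeration rather than sampling.

```javascript
// Hypothetical helper: one payload per exaggeration level, clamped to [0, 2].
function exaggerationSweep(basePayload, levels) {
  return levels.map((level) => {
    const exaggeration = Math.min(2, Math.max(0, level));
    return {
      label: `exaggeration-${exaggeration.toFixed(2)}`, // e.g. a file name for the result
      payload: { ...basePayload, exaggeration },
    };
  });
}

const base = {
  text: "Welcome to Chatterbox TTS, where your text turns into captivating audio effortlessly.",
  reference_audio: "https://segmind-resources.s3.amazonaws.com/input/ef2a2b5c-3e3a-4051-a437-20a72bf175de-sample_audio.mp3",
  seed: 42, // fixed seed for a like-for-like comparison
};

// 2.5 is outside the documented range and is clamped to 2.
for (const { label, payload } of exaggerationSweep(base, [0.3, 0.7, 1.2, 2.5])) {
  console.log(label, payload.exaggeration);
}
```

Each payload can then be sent with the request code shown earlier.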
Q: Can I match a custom voice?
Yes. Provide a reference_audio URL to steer Chatterbox toward the same voice style and pitch.
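Because reference_audio is a URL, the voice sample has to be reachable by the API, so a local file path will presumably not work. A minimal sketch that attaches a custom sample to a request body and rejects anything that is not an http(s) URL; withReferenceVoice is a hypothetical helper and the protocol restriction is an assumption, not a documented rule.

```javascript
// Hypothetical helper: return a copy of the payload with reference_audio set.
function withReferenceVoice(payload, audioUrl) {
  let parsed;
  try {
    parsed = new URL(audioUrl); // throws on strings that are not absolute URLs
  } catch {
    throw new Error(`reference_audio is not a valid URL: ${audioUrl}`);
  }
  // Assumption: the API downloads the clip, so only web URLs make sense.
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`reference_audio must be an http(s) URL, got ${parsed.protocol}`);
  }
  return { ...payload, reference_audio: parsed.href };
}

const request = withReferenceVoice(
  { text: "This line should sound like my own voice sample." },
  "https://example.com/my-voice.mp3" // placeholder: replace with your hosted clip
);
console.log(request.reference_audio);
```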
Q: Is Chatterbox multilingual?
Chatterbox is optimized for English. Community contributions are welcome to extend language support.
Q: How does the watermark work?
An inaudible digital watermark is embedded in each output to ensure traceability and discourage misuse.
Q: Is Chatterbox open source?
Absolutely. Chatterbox’s code and model checkpoints are available under an open-source license on Resemble AI’s GitHub.