const axios = require('axios');

// Replace with your Segmind API key.
const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/bytedance-humo";

// Request body with the generation parameters.
const data = {
  "frames": 30,
  "scale_a": 5,
  "scale_t": 5,
  "mode": "TA",
  "height": 720,
  "width": 1280,
  "steps": 30
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined when no HTTP response arrived (e.g. a network failure).
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
frames: Number of frames for the generated video.
min: 10, max: 100
scale_a: Strength of audio guidance. Higher values give better audio-motion sync.
min: 1, max: 10
scale_t: Strength of text guidance. Higher values give closer adherence to the text prompt.
min: 1, max: 10
mode: Input mode. TA for text+audio; TIA for text+image+audio.
Allowed values:
height: Video height (e.g., 720 or 480).
Allowed values:
width: Video width (e.g., 1280 or 832).
Allowed values:
steps:
min: 1, max: 100
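The documented ranges can be checked on the client before a request is sent. The sketch below is one way to do that, not part of any Segmind SDK: validatePayload, RANGES, and MODES are hypothetical names, the ranges are matched to parameter names by their order in the request example, and the final 1 to 100 range is assumed to belong to steps. Height and width are not checked because their allowed values are not listed here.

```javascript
// Ranges taken from the parameter reference; names matched by order (assumption).
const RANGES = {
  frames: [10, 100],
  scale_a: [1, 10],
  scale_t: [1, 10],
  steps: [1, 100], // assumption: this range appears without a label in the source
};

// The two input modes named in the mode description.
const MODES = ['TA', 'TIA'];

// Hypothetical helper: returns a list of problems, empty if none were found.
function validatePayload(data) {
  const errors = [];
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = data[name];
    if (value === undefined) continue; // omitted fields are left to the API
    if (!Number.isFinite(value) || value < min || value > max) {
      errors.push(`${name} must be a number between ${min} and ${max}`);
    }
  }
  if (data.mode !== undefined && !MODES.includes(data.mode)) {
    errors.push(`mode must be one of ${MODES.join(', ')}`);
  }
  return errors;
}

// Example: the payload from the request example passes, an out-of-range one does not.
console.log(validatePayload({ frames: 30, scale_a: 5, scale_t: 5, mode: 'TA', steps: 30 }));
console.log(validatePayload({ frames: 5, mode: 'TV' }));
```

Returning a list of messages instead of throwing lets the caller report every problem at once before spending credits on a request that would be rejected.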
To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid interruptions in your API usage.
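As a minimal sketch of reading that header: axios exposes response headers with lowercase keys, so the value is available as response.headers['x-remaining-credits']. The helpers remainingCredits and isRunningLow are hypothetical, the header value is assumed to be numeric, and the threshold of 10 is an arbitrary example. A mocked headers object stands in for a real response so the snippet runs without an API key.

```javascript
// Hypothetical helper: parse x-remaining-credits from a headers object.
// With axios, pass response.headers (keys are lowercase).
function remainingCredits(headers) {
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  if (raw === undefined || raw === null || raw === '') return null;
  const value = Number(raw); // assumption: the header carries a numeric value
  return Number.isNaN(value) ? null : value;
}

// Hypothetical check; the default threshold of 10 is an arbitrary example value.
function isRunningLow(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}

// Mocked headers for illustration; real values come from the API response.
const mockHeaders = { 'x-remaining-credits': '42.5' };
console.log(remainingCredits(mockHeaders));
console.log(isRunningLow(mockHeaders));
```

In the request example, the same call would be remainingCredits(response.headers) placed right after the axios.post call resolves.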
Edited by Segmind Team on September 14, 2025.
HuMo, created by ByteDance Research, is an AI model for generating human-centric videos. It accepts text prompts, reference images, and audio as inputs, and gives users control over the workflow. HuMo produces videos at up to 1080p resolution and can render detailed custom character animations, audio-synchronized performances, and scenario-specific human interactions, which makes it suitable for a range of real-world video generation applications.
Q: How long can HuMo-generated videos be? A: The 'duration' parameter lets you create videos ranging from 10-second clips to 60-second sequences.
Q: Can I control the background environment? A: Yes, using the 'scene' parameter, you can specify environments like 'city' or 'beach' for a contextual setting.
Q: What frame rates are supported? A: HuMo supports 30fps (standard) and 60fps (smooth motion) frame rates.
Q: How does emotion control work? A: The 'emotion' parameter lets you specify expressions like 'happy' or 'neutral' to control the subject's facial expressions and body language.
Q: Can I combine multiple input types? A: Yes, HuMo's multimodal architecture lets you combine text, image, and audio inputs for maximum creative control.