POST

javascript

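const axios = require('axios');

// Replace with your Segmind API key.
const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/bytedance-humo";

// Request parameters; see Attributes below for ranges and defaults.
const data = {
  "frames": 30,
  "scale_a": 5,
  "scale_t": 5,
  "mode": "TA",
  "height": 720,
  "width": 1280,
  "steps": 30
};

(async function() {
    try {
        const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
        console.log(response.data);
    } catch (error) {
        // error.response is undefined when no HTTP response arrived (e.g. a network failure).
        console.error('Error:', error.response ? error.response.data : error.message);
    }
})();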

RESPONSE

image/jpeg
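
Because the response body is binary rather than JSON, logging `response.data` directly is rarely useful. The sketch below shows one way to write the bytes to disk. The helper names `extensionFor` and `saveBinaryResponse` are hypothetical, and the extension map (including the `video/mp4` entry) is an assumption for illustration, not something this page specifies.

```javascript
const fs = require('fs');

// Hypothetical helper: choose a file extension from the Content-Type header.
// The mapping is an illustrative assumption; unknown types fall back to ".bin".
function extensionFor(contentType) {
  const type = String(contentType || '').split(';')[0].trim().toLowerCase();
  const known = { 'image/jpeg': '.jpg', 'image/png': '.png', 'video/mp4': '.mp4' };
  return known[type] || '.bin';
}

// Hypothetical helper: write the raw response bytes to disk and return the path.
function saveBinaryResponse(body, contentType, basePath) {
  const outPath = basePath + extensionFor(contentType);
  fs.writeFileSync(outPath, Buffer.from(body));
  return outPath;
}
```

With axios, this would typically be paired with the `responseType: 'arraybuffer'` request option so that `response.data` holds raw bytes, for example `saveBinaryResponse(response.data, response.headers['content-type'], 'humo-output')`.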

HTTP Response Codes

200 - OK: Image Generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
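
The codes above can be turned into readable error messages on the client side. The following is a minimal sketch: the lookup table copies the meanings listed above, while `describeStatus` and `describeError` are hypothetical helper names. It assumes an axios-style error object, where `error.response` is present only when the server actually replied.

```javascript
// Status codes and meanings as listed in the response code table.
const STATUS_MEANINGS = {
  200: 'OK: Image Generated',
  401: 'Unauthorized: User authentication failed',
  404: 'Not Found: The requested URL does not exist',
  405: 'Method Not Allowed: The requested HTTP method is not allowed',
  406: 'Not Acceptable: Not enough credits',
  500: 'Server Error: Server had some issue with processing',
};

// Hypothetical helper: look up a documented meaning, with a fallback for other codes.
function describeStatus(status) {
  return STATUS_MEANINGS[status] || `Unexpected status ${status}`;
}

// Hypothetical helper: summarise an axios-style error without assuming
// that a response was received.
function describeError(error) {
  if (error && error.response) {
    const status = error.response.status;
    return `${status} - ${describeStatus(status)}`;
  }
  const reason = error && error.message ? error.message : 'unknown error';
  return `No response received: ${reason}`;
}
```

Inside a `catch` block, `console.error(describeError(error))` would then report a 406 as a credits problem rather than printing a raw response body.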

Attributes

frames: int (default: 30)

Number of frames for the generated video.

min: 10, max: 100

scale_a: float (default: 5)

Strength of audio guidance. Higher values give better audio-motion sync.

min: 1, max: 10

scale_t: float (default: 5)

Strength of text guidance. Higher values give better adherence to the text prompt.

min: 1, max: 10

mode: enum:str (default: TA)

Input mode: TA for text+audio; TIA for text+image+audio.

Allowed values: TA, TIA

height: enum:int (default: 720)

Video height (e.g., 720 or 480).

width: enum:int (default: 1280)

Video width (e.g., 1280 or 832).

steps: int (default: 30)

min: 1, max: 100
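
Checking a payload against these ranges before sending it can avoid a wasted request. The sketch below covers only the constraints stated in this section (numeric ranges, and the TA and TIA modes named in the mode description). The function name `validatePayload` is hypothetical, and `height` and `width` are left unchecked because their full lists of allowed values are not given here.

```javascript
// Numeric ranges as documented in the Attributes section.
const RANGES = {
  frames: { min: 10, max: 100, integer: true },
  scale_a: { min: 1, max: 10 },
  scale_t: { min: 1, max: 10 },
  steps: { min: 1, max: 100, integer: true },
};

// Modes named in the description of the mode attribute.
const MODES = ['TA', 'TIA'];

// Hypothetical helper: returns a list of problems; an empty list means
// the payload satisfies every documented constraint that is checked here.
function validatePayload(payload) {
  const errors = [];
  for (const [name, rule] of Object.entries(RANGES)) {
    if (!(name in payload)) continue; // omitted fields are left to the API defaults
    const value = payload[name];
    if (typeof value !== 'number' || Number.isNaN(value)) {
      errors.push(`${name} must be a number`);
      continue;
    }
    if (rule.integer && !Number.isInteger(value)) {
      errors.push(`${name} must be an integer`);
    }
    if (value < rule.min || value > rule.max) {
      errors.push(`${name} must be between ${rule.min} and ${rule.max}`);
    }
  }
  if ('mode' in payload && !MODES.includes(payload.mode)) {
    errors.push(`mode must be one of ${MODES.join(', ')}`);
  }
  return errors;
}
```

Calling `validatePayload(data)` on the request body before the POST and aborting when the returned list is non-empty keeps range mistakes on the client side.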

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
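
As a rough illustration of reading that header, the sketch below parses `x-remaining-credits` from a response headers object. The helper names and the default threshold of 10 are hypothetical. It assumes header names are lowercased, as axios does in Node.js, and that the value is numeric.

```javascript
// Hypothetical helper: read x-remaining-credits from a headers object.
// Returns null when the header is missing or not numeric.
function remainingCredits(headers) {
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  if (raw === undefined || raw === null || raw === '') return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: true when the balance is known and below the threshold.
// The default threshold of 10 is an arbitrary example value.
function isLowOnCredits(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}
```

After a successful call, `remainingCredits(response.headers)` gives the current balance, and `isLowOnCredits(response.headers)` can trigger a warning before requests start failing with 406.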

HuMo: Multimodal Human Video Generation Model

Edited by Segmind Team on September 14, 2025.

What is HuMo?

HuMo, created by ByteDance Research, is an AI model for generating human-centric videos. It accepts text prompts, reference images, and audio as inputs, and gives users control over how those inputs are combined. HuMo produces videos at up to 1080p resolution and can capture fine detail in customized character animations, audio-synchronized performances, and scenario-specific human interactions. This makes HuMo suited to video generation in a range of real-world applications.

Key Features of HuMo

Three Flexible Generation Modes:
- Text + image mode for appearance and scene customization
- Text + audio mode for audio-driven video generation
- Full multimodal mode for combining text, image, and audio inputs
High-Resolution Output: It supports up to 1080p video quality
Adjustable Frame Rates: It supports 30fps or 60fps for smoother motion
Fine-grained Control: It allows control over human poses, emotions, and scene settings
Audio Synchronization: It keeps movement in sync with the input audio
Consistent Subject Preservation: It keeps the subject consistent throughout generated sequences

Best Use Cases

Content Creation: It can generate custom character videos for digital content
Entertainment: It can create dynamic human performances synchronized with music
Education: It can produce instructional videos from prompts that describe specific poses and movements
Marketing: It can produce personalized video content in controlled environments
Virtual Production: It can generate preliminary human animations for pre-visualization

Prompt Tips and Output Quality

Scene Setting: Provide prompts with clear environment descriptions (e.g., "sunny beach morning" or "busy urban street")
Pose Guidance: Specify human poses explicitly using the 'human_pose' parameter
Emotional Context: Use the 'emotion' parameter to control subject expression
Audio Integration: Match background music to your scene's mood using provided options
Resolution Balance: Choose 720p for faster generation or 1080p for highest quality

FAQs

Q: How long can HuMo-generated videos be? A: Using the 'duration' parameter, it is possible to create videos ranging from short 10-second clips to 60-second sequences.

Q: Can I control the background environment? A: Yes, using the 'scene' parameter, you can specify environments like 'city' or 'beach' for a contextual setting.

Q: What frame rates are supported? A: HuMo supports 30fps (standard) and 60fps (smooth motion) frame rates.

Q: How does emotion control work? A: The 'emotion' parameter allows you to specify expressions like 'happy' or 'neutral', to control the subject's facial expressions and body language.

Q: Can I combine multiple input types? A: Yes, HuMo's multimodal architecture allows a combination of text, image, and audio inputs for maximum creative control.
