POST
javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Convert a local image file to a Base64 string (use this if you want to
// send the image inline instead of passing a public URL).
async function toB64(imgPath) {
  const data = fs.readFileSync(path.resolve(imgPath));
  return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/infinite-talk";

const data = {
  "prompt": "A woman sits quietly, gazing at the sunset.",
  // A public URL works directly; for a local file, pass `await toB64('./photo.png')` instead.
  "image": "https://segmind-resources.s3.amazonaws.com/input/601140c8-73e5-4490-8911-e6c7d3dc0e70-infinite_talk_ip.png",
  "audio": "https://segmind-resources.s3.amazonaws.com/input/aa5166b3-a78d-460f-a23c-9d3c5a4deb11-ce0922b2-dd13-4946-bb70-9512f023a18b.mp3",
  "seed": 42424242,
  "resolution": "480p",
  "fps": 25,
  "base64": false
};

(async function () {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so guard before reading it
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing

Attributes


prompt (str, required)

The scene or action description guides the model. Try describing vivid emotions or actions for dynamic results.


image (image, required)

URL to an image input that will be used by the model. Choose detailed images for complex scenes.


audio (str, required)

URL to an audio file to synchronize with the model. Short clips work well for testing.


seed (int, default: 42424242)

Random seed ensures reproducibility of output. Use different seeds for varied results.


resolution (enum: str, default: 480p)

Defines video output quality. Higher resolution for detailed visuals, use 480p for quick previews.

Allowed values: 480p, 720p


fps (int, default: 25)

Frames per second of the output. Select higher FPS for smoother animations.

min: 16, max: 30


base64 (bool, default: 1)

Determines if output should be encoded in Base64. Use false for direct file outputs.

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
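A small sketch of reading that header from an axios response's `headers` object; the helper name and the warning threshold are arbitrary choices for illustration:

```javascript
// Read the x-remaining-credits header (documented above) and warn when
// the balance drops below a chosen threshold. Returns null if the header
// is absent, e.g. on a failed request.
function remainingCredits(headers, warnBelow = 100) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined) return null;
  const credits = Number(raw);
  if (credits < warnBelow) {
    console.warn(`Low credits: ${credits} remaining`);
  }
  return credits;
}
```

For example, after a successful call you might run `remainingCredits(response.headers)` before queuing the next job.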

InfiniteTalk: Audio-Driven Video Generation Model

Edited by Segmind Team on October 22, 2025.

What is InfiniteTalk?

InfiniteTalk is a sophisticated AI model that significantly improves on video dubbing by creating full-body movements that sync with the audio. Unlike common dubbing tools that only alter mouth movements, InfiniteTalk produces natural, holistic animation that preserves the original video's identity while precisely matching the audio. This next-gen model can render video-to-video and image-to-video outputs, making it excellent for creative projects.

Key Features of InfiniteTalk

  • It supports full-body motion synthesis synchronized with audio input
  • It ensures seamless preservation of video identity and background elements
  • It can render video-to-video and image-to-video outputs
  • It has a streaming generator architecture for smooth, continuous sequences
  • It includes fine-grained reference frame sampling for precise motion control
  • It has adjustable output quality with resolution options (480p to 720p)
  • It provides customizable frame rates (16-30 FPS) for optimal animation smoothness

Best Use Cases

  • Content localization and video dubbing
  • Virtual presenter creation from still images
  • Educational content adaptation across languages
  • Corporate training video personalization
  • Social media content creation and modification
  • Virtual influencer animation
  • Live streaming avatar animation

Prompt Tips and Output Quality

  • Provide a detailed and clear description of emotions and actions in your prompts; for example, "A woman speaks enthusiastically, gesturing with confidence."
  • Use high-quality source images for better detail retention in the output
  • During the initial phase of learning to use the model, start with shorter audio clips: 5-15 seconds
  • If you need smoother animations, go with a higher FPS (25-30); it may take more time to process the video
  • Use 480p resolution for testing before final outputs and for quick iterations
  • Maintain consistent lighting and composition in source materials to ensure better results

FAQs

How is InfiniteTalk different from traditional dubbing models? InfiniteTalk seamlessly generates full-body movements with perfectly synchronized audio, while traditional models only modify mouth movements. Additionally, it creates natural and comprehensive physical motion while preserving video identity.

What input formats does InfiniteTalk support? InfiniteTalk accepts image and video inputs, along with audio files for synchronization. It works with common image formats and standard audio files.

How can I achieve the best animation quality? To generate high-quality results, use high-resolution source materials, clear prompts describing desired emotions/actions, and higher FPS settings (25-30). You can start with 480p for testing before moving to higher resolutions.

Can I control the randomness of the animations? Yes, using the seed parameter will give reproducible results. If you change the seed value, you can explore different animation variations while preserving other parameters.

What's the recommended workflow for testing and production? Start with short audio clips and 480p resolution for quick iterations during the testing phase. Once you can precisely control the results, increase resolution and FPS for the final output. Additionally, use detailed prompts to guide the animation style.