POST
javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Convert a local image file to a Base64 string (use this if you want to
// send the image inline instead of passing a public URL).
async function toB64(imgPath) {
  const data = fs.readFileSync(path.resolve(imgPath));
  return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/infinite-talk";

const data = {
  "prompt": "A woman sits quietly, gazing at the sunset.",
  // A public URL works directly; for a local file, pass `await toB64('./photo.png')` instead.
  "image": "https://segmind-resources.s3.amazonaws.com/input/601140c8-73e5-4490-8911-e6c7d3dc0e70-infinite_talk_ip.png",
  "audio": "https://segmind-resources.s3.amazonaws.com/input/aa5166b3-a78d-460f-a23c-9d3c5a4deb11-ce0922b2-dd13-4946-bb70-9512f023a18b.mp3",
  "seed": 42424242,
  "resolution": "480p",
  "fps": 25,
  "base64": false
};

(async function () {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so guard before reading it
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing

Attributes


prompt (str, required)

The scene or action description guides the model. Try describing vivid emotions or actions for dynamic results.


image (image, required)

URL to an image input that will be used by the model. Choose detailed images for complex scenes.


audio (str, required)

URL to an audio file to synchronize with the model. Short clips work well for testing.


seed (int, default: 42424242)

Random seed ensures reproducibility of output. Use different seeds for varied results.


resolution (enum: str, default: 480p)

Defines video output quality. Higher resolution for detailed visuals, use 480p for quick previews.

Allowed values: 480p, 720p


fps (int, default: 25)

Frames per second of the output. Select higher FPS for smoother animations.

min: 16, max: 30


base64 (bool, default: 1)

Determines if output should be encoded in Base64. Use false for direct file outputs.

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
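A small sketch of reading that header from an axios response's `headers` object; the helper name and the warning threshold are arbitrary choices for illustration:

```javascript
// Read the x-remaining-credits header (documented above) and warn when
// the balance drops below a chosen threshold. Returns null if the header
// is absent, e.g. on a failed request.
function remainingCredits(headers, warnBelow = 100) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined) return null;
  const credits = Number(raw);
  if (credits < warnBelow) {
    console.warn(`Low credits: ${credits} remaining`);
  }
  return credits;
}
```

For example, after a successful call you might run `remainingCredits(response.headers)` before queuing the next job.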

InfiniteTalk: Audio-Driven Video Generation Model

Edited by Segmind Team on October 22, 2025.

What is InfiniteTalk?

InfiniteTalk is a sophisticated AI model that significantly improves on video dubbing by creating full-body movements that sync with the audio. Unlike common dubbing tools that only alter mouth movements, InfiniteTalk produces natural, holistic animation that preserves the original video's identity while precisely matching the audio. This next-gen model can render video-to-video and image-to-video outputs, making it excellent for creative projects.

Key Features of InfiniteTalk

  • It supports full-body motion synthesis synchronized with audio input
  • It ensures seamless preservation of video identity and background elements
  • It can render video-to-video and image-to-video outputs
  • It has a streaming generator architecture for smooth, continuous sequences
  • It includes fine-grained reference frame sampling for precise motion control
  • It has adjustable output quality with resolution options (480p to 720p)
  • It provides customizable frame rates (16-30 FPS) for optimal animation smoothness

Best Use Cases

  • Content localization and video dubbing
  • Virtual presenter creation from still images
  • Educational content adaptation across languages
  • Corporate training video personalization
  • Social media content creation and modification
  • Virtual influencer animation
  • Live streaming avatar animation

Prompt Tips and Output Quality

  • Provide a detailed and clear description of emotions and actions in your prompts; for example, "A woman speaks enthusiastically, gesturing with confidence."
  • Use high-quality source images for better detail retention in the output
  • During the initial phase of learning to use the model, start with shorter audio clips: 5-15 seconds
  • If you need smoother animations, go with a higher FPS (25-30); it may take more time to process the video
  • Use 480p resolution for testing before final outputs and for quick iterations
  • Maintain consistent lighting and composition in source materials to ensure better results

FAQs

How is InfiniteTalk different from traditional dubbing models? InfiniteTalk seamlessly generates full-body movements with perfectly synchronized audio, while traditional models only modify mouth movements. Additionally, it creates natural and comprehensive physical motion while preserving video identity.

What input formats does InfiniteTalk support? InfiniteTalk accepts image and video inputs, along with audio files for synchronization. It works with common image formats and standard audio files.

How can I achieve the best animation quality? To generate high-quality results, use high-resolution source materials, clear prompts describing desired emotions/actions, and higher FPS settings (25-30). You can start with 480p for testing before moving to higher resolutions.

Can I control the randomness of the animations? Yes, using the seed parameter will give reproducible results. If you change the seed value, you can explore different animation variations while preserving other parameters.

What's the recommended workflow for testing and production? Start with short audio clips and 480p resolution for quick iterations during the testing phase. Once you can precisely control the results, increase resolution and FPS for the final output. Additionally, use detailed prompts to guide the animation style.