const axios = require('axios');

// Fetch a remote image and return it as a base64-encoded string.
async function toB64(imgUrl) {
  const response = await axios.get(imgUrl, { responseType: 'arraybuffer' });
  return Buffer.from(response.data).toString('base64');
}

const api_key = "YOUR_API_KEY";
const url = "https://api.segmind.com/v1/ltx-2-19b-i2v";

(async function() {
  try {
    const data = {
      "prompt": "A bright sunny afternoon in a backyard with an old stone wall and green bushes in the background, full‑body shot of a young woman in a blue denim jacket, graphic T‑shirt, dark skinny jeans and black sneakers, standing on the grass and starting to walk toward the camera with energetic steps, cheerful and excited, soft daylight and gentle shadows, she smiles and raises one hand as she excitedly shouts in clear English: 'L T X 2 video model is superfast, check it out its live on segmind'",
      "negative_prompt": "blurry, low quality, still frame, frames, watermark, overlay, titles, has blurbox, has subtitles",
      "image": await toB64('https://segmind-resources.s3.amazonaws.com/output/9f26da93-4913-49ea-8d46-65fe36844f02-Screenshot_2025-11-24_120343.png'),
      "width": 720,
      "height": 1280,
      "num_frames": 181,
      "fps": 24,
      "seed": 42,
      "guidance_scale": 4
    };
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
Request parameters:

prompt : Prompt describing the video scene
negative_prompt : Negative prompt to avoid certain elements
image : Input image; the example above fetches it from a URL and base64-encodes it
width : Width of the output video (min: 256, max: 1280)
height : Height of the output video (min: 256, max: 1280)
num_frames : Number of frames to generate (min: 1, max: 400)
fps : Frames per second (min: 1, max: 30)
seed : Random seed for reproducibility
guidance_scale : Guidance scale for prompt adherence (min: 1, max: 20)
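The min/max values above are hard bounds. As a convenience, a small sketch like the following can clamp request values to the documented ranges before sending; the clampParams helper is illustrative, not part of the API:

// Illustrative helper: clamp numeric parameters to the documented ranges.
function clampParams(params) {
  const clamp = (value, min, max) => Math.min(Math.max(value, min), max);
  return {
    ...params,
    width: clamp(params.width, 256, 1280),
    height: clamp(params.height, 256, 1280),
    num_frames: clamp(params.num_frames, 1, 400),
    fps: clamp(params.fps, 1, 30),
    guidance_scale: clamp(params.guidance_scale, 1, 20)
  };
}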
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account; monitor this value to avoid disruptions in your API usage.
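Building on the axios example above, a minimal sketch of reading that header inside an async function might look like this (axios lower-cases header names; url, data, and api_key are assumed from the earlier snippet):

// Read the remaining-credit balance from the response headers.
// Assumes url, data, and api_key from the example above.
const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
console.log('Remaining credits:', response.headers['x-remaining-credits']);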
Edited by Segmind Team on January 12, 2026.
LTX-2 is a transformative AI model designed to create highly realistic, synchronized audio-video content. Fully open source, it is also remarkably efficient: it can generate up to 20 seconds of native 4K video at 50 frames per second, with precise lip-sync, fluid natural movement, and seamless audio, all in a single streamlined process. Unlike most closed-source competitors, it runs comfortably on modern consumer GPUs, letting developers, researchers, and creators everywhere produce professional-quality multimodal content. Its open architecture includes the full model weights, a lightweight distilled variant for faster inference, flexible training modules, and LoRA adapters for detailed fine-tuning and workflow integration.
LTX-2 excels at generating cinematic-quality synchronized audio-visual content.
Effective LTX-2 prompts balance scene description, motion details, and audio context:
Prompt Structure: Begin by describing the visual scene, then mention any audio or speech elements. Example: "A busy street market with colorful stalls, people walking. A young woman talks about the lively atmosphere, her voice energetic and clear."
Detail Level: Include specific visual elements (lighting, camera angles, colors) and audio characteristics (tone, background sounds) to convey your vision and improve coherence.
Frame Control: Use higher num_frames (180-400) for smoother, longer sequences; lower values (60-120) work well for quick clips or rapid iteration.
FPS Settings: Set to 24 FPS for cinematic quality, or 30 FPS for broadcast-style content; lower FPS can create stylized effects.
Guidance Scale: The default of 4.0 balances creativity and prompt adherence; increase to 6-8 for stricter adherence, or decrease to 2-3 for more interpretive results.
Resolution Tuning: Start with 720×1280 (portrait) or 1280×720 (landscape) for faster testing. Scale to higher resolutions once your prompt is refined.
Negative Prompts: Use this option judiciously; exclude artifacts like "blurry, low quality, watermark, subtitles, still frame" to maintain professional output. A payload combining several of these tips is sketched below.
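As an illustration of the tips above, here is a minimal sketch of a payload tuned for a longer landscape clip with stricter prompt adherence; the specific values are examples, not recommendations from the API itself:

// Illustrative payload applying the tuning tips above: landscape
// resolution, cinematic 24 FPS, a longer sequence, and a higher
// guidance_scale for stricter prompt adherence. Combine with an
// "image" field (as in the main example) for image-to-video.
const tunedData = {
  "prompt": "A busy street market with colorful stalls, people walking. A young woman talks about the lively atmosphere, her voice energetic and clear.",
  "negative_prompt": "blurry, low quality, watermark, subtitles, still frame",
  "width": 1280,
  "height": 720,
  "num_frames": 300,
  "fps": 24,
  "seed": 42,
  "guidance_scale": 7
};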
Is LTX-2 open source? Yes, LTX-2 is fully open source. The release includes complete model weights, a distilled variant, training code, and LoRA adapters for customization and commercial use.
How does LTX-2 differ from other video generation models? LTX-2 generates accurately synchronized audio and video in a single pass, runs efficiently on consumer GPUs, and supports native 4K at high frame rates, all while remaining fully open source.
What GPU do I need to run LTX-2? LTX-2 is optimized for modern consumer GPUs: a mid-range GPU with 12GB+ VRAM handles standard resolutions well; the distilled version further reduces hardware requirements for faster inference.
Can I start video generation from an image? Yes. LTX-2 supports image-to-video workflows: provide an initial image URL, then describe the desired motion in your prompt to animate the static scene.
What parameters should I adjust for best results?
To achieve the best results, focus on num_frames for sequence length, guidance_scale for prompt adherence, and fps for motion smoothness. Start with the defaults (180 frames, 24 FPS, guidance 4.0) and iterate based on your creative goals.
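In code, one convenient (purely illustrative) pattern for that iteration is to keep the defaults in one object and spread-override per run:

// Documented defaults as a base; override per run while iterating.
const defaults = { num_frames: 180, fps: 24, guidance_scale: 4.0 };
const strictRun = { ...defaults, guidance_scale: 7 };  // stricter adherence
const quickRun  = { ...defaults, num_frames: 90 };     // faster iteration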
How long can my output video be?
LTX-2 generates up to 20 seconds of video in a single pass. Adjust num_frames and fps to control duration: at 24 FPS, 180 frames yield approximately 7.5 seconds and 400 frames yield roughly 16.7 seconds.
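The duration arithmetic is simply num_frames divided by fps:

// Clip duration in seconds = num_frames / fps.
const durationSeconds = (numFrames, fps) => numFrames / fps;
console.log(durationSeconds(180, 24)); // 7.5
console.log(durationSeconds(400, 24)); // ~16.67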