const axios = require('axios');
const FormData = require('form-data');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/wan-2.6-i2v";

// Request parameters; the input image is passed as a URL
const reqBody = {
  "image": "https://segmind-resources.s3.amazonaws.com/output/58c03b83-811d-4c4d-8837-1c01b8c8cdea-wan2.6-i2v-ip.webp",
  "prompt": "A dramatic action POV chase scene. The camera shows the protagonist falling back on the wet ground, then quickly standing up and sprinting at high speed through heavy rain. The POV camera shakes violently with each step, with motion blur on the trees and bushes rushing past. Every few seconds, the character quickly turns their head, and the camera swings backward to reveal a massive T-Rex charging straight toward them. Mud explodes under its heavy footsteps, rain drips from its jaws, and it snaps its enormous teeth just behind the camera. Raindrops hit the lens, dirt and leaves splash upward, and flashes of lightning illuminate the dinosaur’s wet scales. The scene feels frantic, high-speed, hyper-realistic, and intensely cinematic",
  "duration": 5,
  "resolution": "720p",
  "multi_shots": false,
  "negative_prompt": "low resolution, error, worst quality, low quality, defects",
  "enable_prompt_expansion": true
};

(async function() {
  try {
    const formData = new FormData();
    // Append regular fields; multipart fields are sent as text, so stringify each value
    for (const key in reqBody) {
      if (Object.prototype.hasOwnProperty.call(reqBody, key)) {
        formData.append(key, String(reqBody[key]));
      }
    }
    // Convert and append images as Base64 if necessary
    const response = await axios.post(url, formData, {
      headers: {
        'x-api-key': api_key,
        ...formData.getHeaders()
      }
    });
    console.log(response.data);
  } catch (error) {
    // Prefer the API's error payload when the server responded
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();

seed: Random seed for reproducible generation
audio: Audio file (wav/mp3, 3-30s, ≤15MB) for voice/music synchronization
image: Input image for video generation
prompt: Text prompt for video generation
duration: An enumeration (the example request above uses 5).
resolution: An enumeration (the example request above uses "720p").
multi_shots: Enable intelligent multi-shot segmentation (only active when enable_prompt_expansion is enabled). True enables multi-shot segmentation, false generates single-shot content.
negative_prompt: Negative prompt to avoid certain elements
enable_prompt_expansion: If set to true, the prompt optimizer will be enabled
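
The constraints in the parameter descriptions can be checked locally before a request is sent. The following is a minimal sketch, not part of any Segmind SDK: `checkParams` and the `audioInfo` shape (`extension`, `durationSeconds`, `sizeBytes`) are hypothetical, and it assumes the caller has already measured the audio file and that "15MB" means 15 × 1024 × 1024 bytes.

```javascript
// Hypothetical pre-flight check built only from the parameter descriptions above.
const MAX_AUDIO_BYTES = 15 * 1024 * 1024; // assumption: "15MB" read as 15 MiB

function checkParams(body, audioInfo) {
  const problems = [];

  // multi_shots is only active when enable_prompt_expansion is enabled.
  if (body.multi_shots && !body.enable_prompt_expansion) {
    problems.push('multi_shots has no effect unless enable_prompt_expansion is true');
  }

  // audioInfo is an assumed shape: { extension, durationSeconds, sizeBytes }.
  if (audioInfo) {
    if (!['wav', 'mp3'].includes(String(audioInfo.extension).toLowerCase())) {
      problems.push('audio must be a wav or mp3 file');
    }
    if (audioInfo.durationSeconds < 3 || audioInfo.durationSeconds > 30) {
      problems.push('audio must be between 3 and 30 seconds long');
    }
    if (audioInfo.sizeBytes > MAX_AUDIO_BYTES) {
      problems.push('audio must be at most 15MB');
    }
  }

  return problems;
}
```

An empty array means none of the listed constraints were violated; it does not guarantee that the API will accept the request.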
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
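
As an illustration of reading that header, here is a small sketch. `remainingCredits`, `isRunningLow`, and `LOW_CREDIT_THRESHOLD` are hypothetical names, and the code assumes the header value is a plain number and that header names are exposed in lowercase, as axios does for `response.headers`.

```javascript
// Hypothetical helper: extract x-remaining-credits from a headers object.
// Returns a number, or null when the header is missing or not numeric.
function remainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined || raw === null || String(raw).trim() === '') {
    return null;
  }
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}

// Arbitrary example threshold, not a Segmind default.
const LOW_CREDIT_THRESHOLD = 10;

// True only when the header is present and below the threshold.
function isRunningLow(headers, threshold = LOW_CREDIT_THRESHOLD) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}
```

With an axios response object, this would be called as `remainingCredits(response.headers)` once the request has resolved.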
Edited by Segmind Team on December 18, 2025.
Wan 2.6 is Alibaba’s AI video generation model, designed to convert text prompts, still images, and reference clips into cinematic 1080p videos. It removes the need for filming or manual editing, delivering 5- to 15-second clips at 24 frames per second with consistent, clear output. Wan 2.6 maintains character continuity throughout multi-shot sequences and syncs audio with realistic lip movements, which makes it useful for creators, marketers, and developers. It is well suited to marketing campaigns, educational content, and product demonstrations that need polished, professional-quality video quickly.
Write Effective Prompts: Be vivid and use cinematic language. Instead of "a person walking," try "a young woman in a red coat walking through a foggy autumn forest at dawn, golden light filtering through bare trees." The model responds well to detailed descriptions of setting, lighting, mood, and action.
Image Quality Matters: In image-to-video mode, use detailed, high-resolution input images. Expressive faces and clear compositions yield better results.
Use Negative Prompts: To improve output consistency, explicitly exclude unwanted elements, for example "blur, static, color washout, incorrect perspective".
Leverage Prompt Expansion: Keep enable_prompt_expansion set to true for richer, more creative scenes; disable it for strict adherence to a simple prompt.
Duration and Resolution Trade-offs: Use 5-second videos for quick demos and social posts; go for 15 seconds when you need fuller storytelling. Choose 1080p for final deliverables, 720p for faster iteration.
Multi-Shot Storytelling: Enable multi_shots: true for dynamic, varied scenes; disable it for single-focus, simpler compositions.
Audio Sync: Supply an external audio file (music or narration) for lip-synced videos, and match the audio length to your desired video duration for best results.
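
The duration, resolution, and multi-shot tips above can be captured in a small request builder. This is a sketch under stated assumptions: `buildRequest`, the preset names "draft" and "final", and the camelCase option names are all hypothetical, and the preset values are simply the ones recommended in the tips rather than a complete list of what the API accepts.

```javascript
// Hypothetical presets taken from the tips: fast iteration vs. final deliverable.
const PRESETS = {
  draft: { duration: 5, resolution: '720p' },
  final: { duration: 15, resolution: '1080p' },
};

function buildRequest({
  image,
  prompt,
  preset = 'draft',
  multiShots = false,
  expandPrompt = true,
  negativePrompt,
  seed,
}) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, preset)) {
    throw new Error(`Unknown preset: ${preset}`);
  }
  const body = {
    image,
    prompt,
    ...PRESETS[preset],
    multi_shots: multiShots,
    // multi_shots is only active with prompt expansion, so requesting
    // multi-shot output forces expansion on.
    enable_prompt_expansion: expandPrompt || multiShots,
  };
  // Optional fields are included only when the caller supplies them.
  if (negativePrompt !== undefined) body.negative_prompt = negativePrompt;
  if (seed !== undefined) body.seed = seed;
  return body;
}
```

Passing the same `seed` across runs keeps iterations comparable while the prompt or preset is adjusted.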
Is Wan 2.6 open-source?
Wan 2.6 is Alibaba's proprietary model, accessible via API. It is not open-source, but you can integrate it into your applications through Segmind's platform.
How is Wan 2.6 different from other AI video models?
Wan 2.6 offers precise audio-to-lip sync and multi-shot storytelling. Unlike models that produce only single-scene clips, Wan 2.6 maintains character consistency across multiple shots, which suits narrative-driven content.
What parameters should I tweak for the best results?
Focus on the prompt and a high-quality image.
Set resolution: 1080p and duration: 10 or 15 for polished outputs.
Enable multi_shots for dynamic storytelling.
Set seed for reproducibility, and experiment with negative_prompt to exclude unwanted artifacts.
Can I use Wan 2.6 for commercial projects?
Yes, videos generated with Wan 2.6 can be used commercially. Check Segmind's terms of service for specific licensing details.
What's the maximum video length?
Wan 2.6 supports videos up to 15 seconds; for longer content, generate multiple clips and stitch them together in post-production.
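
One possible way to join clips is ffmpeg's concat demuxer, which reads a text file listing the inputs. The page does not name a tool, so ffmpeg is an assumption here, and `buildConcatList` is a hypothetical helper that only produces the list file's contents.

```javascript
// Hypothetical helper: build the text of an ffmpeg concat-demuxer list file.
// Each line has the form: file '<path>'. A single quote inside a path is
// written as '\'' (close quote, escaped quote, reopen quote).
function buildConcatList(clipPaths) {
  return clipPaths
    .map((clipPath) => `file '${clipPath.replace(/'/g, "'\\''")}'`)
    .join('\n') + '\n';
}
```

After writing the returned text to a file such as clips.txt, the clips could be joined with `ffmpeg -f concat -safe 0 -i clips.txt -c copy combined.mp4`. Stream copy only works when all clips share the same codec and encoding settings, which is likely but not guaranteed for clips generated with identical parameters.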
Does Wan 2.6 support audio generation?
The model syncs external audio files with video. You must provide your own audio (narration, music, or dialogue) via the audio parameter; the model does not generate audio on its own.