POST

javascript

const axios = require('axios');
const FormData = require('form-data');


const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/wan-2.6-i2v";

const reqBody = {
  "image": "https://segmind-resources.s3.amazonaws.com/output/58c03b83-811d-4c4d-8837-1c01b8c8cdea-wan2.6-i2v-ip.webp",
  "prompt": "A dramatic action POV chase scene. The camera shows the protagonist falling back on the wet ground, then quickly standing up and sprinting at high speed through heavy rain. The POV camera shakes violently with each step, with motion blur on the trees and bushes rushing past. Every few seconds, the character quickly turns their head, and the camera swings backward to reveal a massive T-Rex charging straight toward them. Mud explodes under its heavy footsteps, rain drips from its jaws, and it snaps its enormous teeth just behind the camera. Raindrops hit the lens, dirt and leaves splash upward, and flashes of lightning illuminate the dinosaur’s wet scales. The scene feels frantic, high-speed, hyper-realistic, and intensely cinematic",
  "duration": 5,
  "resolution": "720p",
  "multi_shots": false,
  "negative_prompt": "low resolution, error, worst quality, low quality, defects",
  "enable_prompt_expansion": true
};

(async function() {
    try {
        const formData = new FormData();
        
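        // Append regular fields. form-data handles strings, Buffers, and streams,
        // so booleans such as multi_shots are stringified explicitly.
        for (const [key, value] of Object.entries(reqBody)) {
            formData.append(key, String(value));
        }

        // The image is passed as a URL here, so no Base64 conversion is needed.
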
        const response = await axios.post(url, formData, {
            headers: {
                'x-api-key': api_key,
                ...formData.getHeaders()
            }
        });
        console.log(response.data);
    } catch (error) {
        console.error('Error:', error.response ? error.response.data : error.message);
    }
})();

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Image Generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
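
The status codes above can be turned into readable error messages on the client side. The following is a minimal sketch, assuming an axios-style error object (with an optional `response.status`); the helper names `describeError` and `isRetryable` are hypothetical, and the choice to retry only on 500 is an assumption rather than documented behavior.

```javascript
// Error descriptions copied from the HTTP Response Codes list above.
const STATUS_MESSAGES = {
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: server had some issue with processing',
};

// Hypothetical helper: turn an axios-style error into a readable message.
function describeError(error) {
  // axios attaches `response` only when the server actually replied.
  if (!error.response) {
    return `Network error: ${error.message}`;
  }
  const { status } = error.response;
  return STATUS_MESSAGES[status] || `Unexpected status ${status}`;
}

// Assumption (not from the docs): only a 500 is worth retrying automatically.
function isRetryable(error) {
  return Boolean(error.response) && error.response.status === 500;
}
```

In the request example, `describeError(error)` could replace the expression passed to `console.error` in the `catch` block.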

Attributes

seed int ( default: 1 )

Random seed for reproducible generation

audio str ( default: 1 )

Audio file (wav/mp3, 3-30s, ≤15MB) for voice/music synchronization

image str *

Input image for video generation

prompt str *

Text prompt for video generation

duration enum:int ( default: 5 )

Video duration in seconds.

Allowed values: 5, 10, 15

resolution enum:str ( default: 720p )

Output video resolution.

Allowed values: 720p, 1080p

multi_shots bool ( default: 1 )

Enable intelligent multi-shot segmentation (only active when enable_prompt_expansion is enabled). True enables multi-shot segmentation, false generates single-shot content.

negative_prompt str ( default: 1 )

Negative prompt to avoid certain elements

enable_prompt_expansion bool ( default: true )

If set to true, the prompt optimizer will be enabled
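
Checking a request body against these attributes before sending it can save a failed call. The following is a rough sketch, not an official validator: `validateBody` is a hypothetical helper, and the allowed values for duration and resolution are assumptions based on the 5 to 15-second and 720p/1080p figures mentioned on this page, so adjust them to match the live API.

```javascript
// Assumed enum values (see lead-in); not confirmed by the attribute list itself.
const ASSUMED_DURATIONS = [5, 10, 15];
const ASSUMED_RESOLUTIONS = ['720p', '1080p'];

// Hypothetical helper: returns a list of problems (empty when the body looks valid).
function validateBody(body) {
  const problems = [];

  // image and prompt are the two attributes marked with * (required).
  for (const field of ['image', 'prompt']) {
    if (typeof body[field] !== 'string' || body[field].length === 0) {
      problems.push(`${field} is required and must be a non-empty string`);
    }
  }
  if ('duration' in body && !ASSUMED_DURATIONS.includes(body.duration)) {
    problems.push(`duration must be one of ${ASSUMED_DURATIONS.join(', ')}`);
  }
  if ('resolution' in body && !ASSUMED_RESOLUTIONS.includes(body.resolution)) {
    problems.push(`resolution must be one of ${ASSUMED_RESOLUTIONS.join(', ')}`);
  }
  if ('seed' in body && !Number.isInteger(body.seed)) {
    problems.push('seed must be an integer');
  }
  // Documented above: multi_shots is only active when prompt expansion is enabled.
  if (body.multi_shots === true && body.enable_prompt_expansion === false) {
    problems.push('multi_shots only takes effect when enable_prompt_expansion is true');
  }
  return problems;
}
```

Calling `validateBody(reqBody)` before the POST and logging any returned problems is one way to use it.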

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
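
Reading that header from an axios response might look like the sketch below. It assumes the header value is numeric, which this page does not state; `remainingCredits`, `isLowOnCredits`, and the default threshold of 10 are illustrative choices only.

```javascript
// Hypothetical helper: extract the remaining credit balance from response headers.
// axios exposes header names in lower case, so the key is looked up as written.
function remainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined || raw === '') {
    return null; // header missing from this response
  }
  const value = Number(raw);
  // Assumption: the value is numeric; anything else is reported as unknown.
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper: true when the balance is known and below the threshold.
// The default threshold is an arbitrary illustrative number.
function isLowOnCredits(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}
```

With the request example on this page, the call would be `remainingCredits(response.headers)` right after the `axios.post` resolves.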

Wan 2.6: AI Video Generation Model

Edited by Segmind Team on December 18, 2025.

What is Wan 2.6?

Wan 2.6 is Alibaba’s AI video generation model, designed to convert text prompts, still images, and reference clips into cinematic videos at up to 1080p. It removes the need for filming or manual editing by delivering 5 to 15-second clips at 24 frames per second with consistent, clear output. The model maintains character continuity across multi-shot sequences and syncs audio with realistic lip movements, which makes it useful for creators, marketers, and developers. Typical applications include marketing campaigns, educational content, and product demonstrations that need polished, professional-quality video quickly.

Key Features of Wan 2.6

Multiple Generation Modes: It supports text-to-video, image-to-video, and reference-to-video workflows.
High-Resolution Output: It generates professional-quality videos in 720p or 1080p at 24fps.
Multi-Shot Storytelling: It creates dynamic videos consisting of multiple shots while preserving narrative flow and character consistency.
Precise Audio Sync: It aligns audio tracks with lip movements to create videos with realistic dialogue and narration.
Flexible Duration Control: It produces videos of 5 to 15 seconds to suit different content formats.
Prompt Expansion: It automatically enriches prompts with contextual detail to produce more detailed and creative outputs.
Reproducibility: Seed control helps produce consistent results across repeated runs with the same inputs.

Best Use Cases

Social Media Content: Generate captivating reels, TikToks, and Instagram stories with lip-synced dialogue or music.
Marketing and Advertising: Teams can produce quick product demos, explainer videos, and promotional content without expensive video shoots.
Educational Content: Create engaging instructional videos with narration that simplify complex topics for learners.
Product Demonstrations: Showcase features and use cases in a polished, cinematic video format.
Storytelling and Entertainment: Build multi-shot narrative sequences with consistent characters and seamless transitions.

Prompt Tips and Output Quality

Write Effective Prompts: Be vivid and use cinematic language. Instead of "a person walking," try "a young woman in a red coat walking through a foggy autumn forest at dawn, golden light filtering through bare trees." The model responds well to detailed descriptions of setting, lighting, mood, and action.
Image Quality Matters: In image-to-video mode, use detailed, high-resolution input images. Expressive faces and clear compositions yield better results.
Use Negative Prompts: Explicitly exclude unwanted elements, for example "blur, static, color washout, incorrect perspective," to improve output consistency.
Leverage Prompt Expansion: Keep enable_prompt_expansion set to true for richer, more creative scenes; disable it for strict adherence to a simple prompt.
Duration and Resolution Trade-offs: Use 5-second videos for quick demos and social posts; use 15 seconds when you need fuller storytelling. Choose 1080p for final deliverables and 720p for faster iteration.
Multi-Shot Storytelling: Set multi_shots to true for dynamic, varied scenes; disable it for single-focus, simpler compositions. Multi-shot segmentation only takes effect when enable_prompt_expansion is also enabled.
Audio Sync: Supply an external audio file (music or narration) through the audio parameter for lip-synced videos. For best results, match the audio length to your desired video duration.
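
The tips above can be folded into a small request-body builder. This is only a sketch under stated assumptions: `buildBody`, the preset names "draft" and "final", and the `strictPrompt` option are all hypothetical, while the field names come from the Attributes list.

```javascript
// Presets follow the duration/resolution trade-off described above.
// The preset names are illustrative, not part of the API.
const PRESETS = {
  draft: { duration: 5, resolution: '720p' },
  final: { duration: 15, resolution: '1080p' },
};

// Hypothetical helper that assembles a request body for the endpoint.
function buildBody(image, prompt, options = {}) {
  const {
    preset = 'draft',
    multiShots = false,
    strictPrompt = false, // true = send the prompt as written, without expansion
    negativePrompt,
    audio,
  } = options;

  if (!Object.prototype.hasOwnProperty.call(PRESETS, preset)) {
    throw new Error(`Unknown preset: ${preset}`);
  }

  const body = {
    image,
    prompt,
    ...PRESETS[preset],
    multi_shots: multiShots,
    // multi_shots is documented as active only when prompt expansion is on,
    // so expansion is forced on for multi-shot requests.
    enable_prompt_expansion: multiShots || !strictPrompt,
  };

  // Optional fields are only included when the caller supplies them.
  if (negativePrompt !== undefined) body.negative_prompt = negativePrompt;
  if (audio !== undefined) body.audio = audio;
  return body;
}
```

For example, `buildBody(imageUrl, promptText, { preset: 'final', multiShots: true })` would request a 15-second 1080p multi-shot clip, where `imageUrl` and `promptText` stand for your own inputs.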

FAQs

Is Wan 2.6 open-source?
Wan 2.6 is Alibaba's proprietary model, accessible via API. It is not open-source, but you can integrate it into your applications through Segmind's platform.

How is Wan 2.6 different from other AI video models?
Wan 2.6 offers precise audio-to-lip sync and multi-shot storytelling. Unlike models that produce only single-scene clips, Wan 2.6 maintains character consistency across multiple shots, which suits narrative-driven content.

What parameters should I tweak for the best results?

Start with a detailed prompt and high-quality image.
Use resolution: 1080p and duration: 10 or 15 for polished outputs.
Enable multi_shots for dynamic storytelling.
Set a fixed seed for reproducibility, and experiment with negative_prompt to exclude unwanted artifacts.

Can I use Wan 2.6 for commercial projects?
Yes, videos generated with Wan 2.6 can be used commercially. Check Segmind's terms of service for specific licensing details.

What's the maximum video length?
Wan 2.6 supports videos up to 15 seconds. For longer content, generate multiple clips and stitch them together in post-production.
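
Planning how many clips a longer video needs is simple arithmetic. The sketch below assumes the allowed clip lengths are 5, 10, and 15 seconds (an assumption based on the figures on this page); `planClips` is a hypothetical helper, and the last clip is rounded up to the nearest allowed length so the total may slightly exceed the target.

```javascript
// Assumed allowed clip lengths in seconds, sorted ascending.
const CLIP_LENGTHS = [5, 10, 15];

// Hypothetical helper: split a target length into per-clip durations.
function planClips(totalSeconds) {
  if (!(totalSeconds > 0)) {
    throw new RangeError('totalSeconds must be a positive number');
  }
  const longest = CLIP_LENGTHS[CLIP_LENGTHS.length - 1];
  const clips = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    // Use the shortest allowed clip that covers what is left,
    // or the longest clip when more than one clip is still needed.
    const next = CLIP_LENGTHS.find((d) => d >= remaining) ?? longest;
    clips.push(next);
    remaining -= next;
  }
  return clips;
}

console.log(planClips(40)); // [ 15, 15, 10 ]
console.log(planClips(32)); // [ 15, 15, 5 ]
```

Each entry would become the duration of one request; the resulting clips still have to be joined with a separate video editing tool.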

Does Wan 2.6 support audio generation?
The model syncs external audio files with video but does not generate audio from scratch. You must provide your own audio (narration, music, or dialogue) via the audio parameter.
