POST

```javascript
const axios = require('axios');
const FormData = require('form-data');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/wan-2.6-t2v";

const reqBody = {
  "size": "1280*720",
  "prompt": "Humorous but premium mini-trailer: a rugged caveman explorer sparks \"evolution\" by grunting simple commands that instantly upgrade his world and morph his own form through the ages. Extreme photoreal 4K, cinematic lighting, subtle film grain, smooth camera. No subtitles, no UI, no watermark.\nShot 1 [0-3s] Macro close-up on the caveman's weathered face and hairy knuckles clutching a jagged stone axe in a misty prehistoric dawn. He grunts deeply: \"Better!\"\nShot 2 [3-6s] Hard cut: Bustling ancient forge at golden hour, sparks flying. The caveman, now in leather tunic with a bronze sword, hammers metal confidently. Camera dollies in as he bellows: \"Faster!\"\nShot 3 [6-10s] Hard cut: Steampunk factory amid rainy industrial night, gears whirring. He transforms into a suited inventor with goggles, cranking a massive machine that hums to life. Slow zoom on his evolving eyes as he commands: \"Smarter!\"\nShot 4 [10-15s] Hard cut: Futuristic AI lab bathed in neon glow, holographic interfaces pulsing. The caveman, sleek in neural-linked exosuit, interfaces with a glowing orb; his form subtly digitizes. He smiles knowingly: \"Evolved. What's next?\"",
  "duration": 5,
  "multi_shots": false,
  "negative_prompt": "low resolution, error, worst quality, low quality, defects",
  "enable_prompt_expansion": true
};

(async function() {
  try {
    const formData = new FormData();
    // Append each field; form-data only accepts strings, Buffers, or streams,
    // so coerce numbers and booleans to strings.
    for (const key in reqBody) {
      if (reqBody.hasOwnProperty(key)) {
        formData.append(key, String(reqBody[key]));
      }
    }
    const response = await axios.post(url, formData, {
      headers: {
        'x-api-key': api_key,
        ...formData.getHeaders()
      }
    });
    console.log(response.data);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
image/jpeg
HTTP Response Codes

200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
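The codes above can be handled uniformly in client code. A minimal sketch, assuming you only need a readable message per status (the `describeStatus` helper is illustrative, not part of any Segmind SDK):

```javascript
// Map the HTTP status codes from the table above to readable outcomes.
// describeStatus is an illustrative helper, not part of the API itself.
function describeStatus(code) {
  const messages = {
    200: 'OK: generation succeeded',
    401: 'Unauthorized: user authentication failed',
    404: 'Not Found: the requested URL does not exist',
    405: 'Method Not Allowed: the requested HTTP method is not allowed',
    406: 'Not Acceptable: not enough credits',
    500: 'Server Error: the server had an issue with processing',
  };
  return messages[code] || `Unexpected status: ${code}`;
}
```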

Attributes


seed int ( default: 1 )

Random seed for reproducible generation


size enum:str ( default: 1280*720 )

An enumeration.

Allowed values: 1280*720, 720*1280, 1920*1080, 1080*1920


audio str ( default: 1 )

Audio file (wav/mp3, 3-30s, ≤15MB) for voice/music synchronization


prompt str *

Text prompt for video generation


duration enum:int ( default: 5 )

An enumeration.

Allowed values:


multi_shots bool ( default: 1 )

Enable intelligent multi-shot segmentation (only active when enable_prompt_expansion is enabled). True enables multi-shot segmentation, false generates single-shot content.


negative_prompt str ( default: 1 )

Negative prompt to avoid certain elements


enable_prompt_expansion bool ( default: true )

If set to true, the prompt optimizer will be enabled

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
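Reading that header with axios might look like the following sketch; note that axios lower-cases response header names, and the `remainingCredits` helper is illustrative:

```javascript
// Sketch: extract the x-remaining-credits value from an axios-style
// response.headers object; returns null when the header is absent.
function remainingCredits(headers) {
  const raw = headers && headers['x-remaining-credits'];
  return raw === undefined || raw === null ? null : Number(raw);
}

// Usage after a request (hypothetical response object):
// const response = await axios.post(url, formData, { headers: { ... } });
// console.log('Credits left:', remainingCredits(response.headers));
```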

Alibaba Wan 2.6: Text-to-Video Generation Model

Edited by Segmind Team on December 18, 2025.


What is Alibaba Wan 2.6?

Alibaba Wan 2.6 is a high-performance AI model with text-to-video capabilities that convert written prompts, images, audio, or reference clips into cinematic 1080p videos. Wan 2.6 can create multi-shot sequences up to 15 seconds long with precise character consistency and smooth narrative flow, all without traditional filming. The model syncs audio tightly with visuals and delivers realistic, studio-quality videos, making it ideal for marketers, educators, product teams, and social media creators who need high-quality video on tight timelines.

What sets Wan 2.6 apart from basic video generation models is its ability to maintain continuity across multiple scenes, enabling complex storytelling with dynamic camera work and seamless transitions.

Key Features of Alibaba Wan 2.6

  • High-resolution output: The model supports 1080p video generation with options for landscape (1920×1080) and portrait (1080×1920) formats.
  • Multi-shot composition: It can automatically segment complex scenes into coherent sequences with fluid shot transitions.
  • Audio synchronization: It accepts WAV or MP3 files (3 to 30 seconds) for realistic, accurately lip-synced voice-over or music.
  • Flexible duration control: It generates videos from 5 to 15 seconds to match the platform and narrative pacing.
  • Prompt expansion: It is integrated with optional AI-powered prompt refinement that intelligently interprets creative intent and then adds visual detail automatically.
  • Reproducible outputs: Its seed-based generation ensures consistent results across iterations.
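The audio constraints listed above (WAV/MP3, 3 to 30 seconds, at most 15 MB) can be checked client-side before uploading. A minimal sketch, where `isValidAudio` and its input shape are illustrative assumptions rather than part of the API:

```javascript
// Sketch: validate an audio file against the documented constraints
// (wav/mp3 format, 3-30 s duration, <= 15 MB). The function name and
// the { ext, seconds, bytes } shape are illustrative, not API fields.
function isValidAudio({ ext, seconds, bytes }) {
  const okFormat = ['wav', 'mp3'].includes(ext.toLowerCase());
  const okLength = seconds >= 3 && seconds <= 30;
  const okSize = bytes <= 15 * 1024 * 1024;
  return okFormat && okLength && okSize;
}
```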

Best Use Cases

  • Social media content: Creators can use it to create engaging TikTok, Instagram Reels, and YouTube Shorts without filming equipment.
  • Marketing and advertising: They can generate product demos, explainer videos, and campaign assets with brand-consistent visuals.
  • Educational content: It is an asset that can transform lesson scripts into visual narratives with synchronized voiceovers.
  • Prototyping and storyboarding: It is excellent for rapidly visualizing concepts for client presentations or internal reviews.
  • E-commerce: Teams can utilize it to produce dynamic showcase videos to highlight products' features and benefits.

Prompt Tips and Output Quality

Writing effective prompts:

  • Specify visual style, lighting, and camera angles (e.g., "cinematic close-up with warm backlighting").
  • Describe character actions and scene transitions clearly in detail for multi-shot sequences.
  • Reference art styles or technical specifications like "4K visuals" or "shallow depth of field."

Parameter recommendations:

  • Keep enable_prompt_expansion on for richer visual detail and better scene interpretation.

  • Use multi_shots for narrative content that requires multiple perspectives or scene changes.

  • Set duration to 15 seconds when telling complex stories; use 5 seconds for simple product shots.

  • Choose 1920×1080 for YouTube and presentations; 1080×1920 for mobile-first platforms.

  • Leverage negative_prompt to exclude unwanted elements like "blurry, low quality, distorted faces."

  • For reproducible results, lock the seed value.

  • When iterating, adjust only one parameter at a time to understand its impact.
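Put together, the recommendations above might translate into a request body like this sketch; the parameter names match the Attributes section, while the prompt and seed values are purely illustrative:

```javascript
// Sketch of a request body applying the parameter recommendations above.
// Field names match the documented attributes; values are illustrative.
const reqBody = {
  size: '1920*1080',              // landscape, for YouTube and presentations
  prompt: 'Cinematic product showcase, warm backlighting, shallow depth of field',
  duration: 15,                   // longer duration for complex narratives
  multi_shots: true,              // only active when prompt expansion is enabled
  enable_prompt_expansion: true,  // richer visual detail and scene interpretation
  negative_prompt: 'blurry, low quality, distorted faces',
  seed: 42,                       // lock the seed for reproducible results
};
```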

FAQs

Is Alibaba Wan 2.6 open-source?
No, Wan 2.6 is a proprietary model developed by Alibaba. It is accessible through API integrations and platforms like Segmind.

How does Wan 2.6 compare to other text-to-video models?
Wan 2.6 stands out with native audio synchronization, realistic lip-sync, and superior multi-shot composition compared to single-scene generators. Additionally, its character continuity across shots rivals models used in professional production pipelines.

What audio formats does it support?
The model accepts WAV and MP3 files between 3 and 30 seconds. It supports frame-by-frame audio synchronization for accurate lip movement and timing.

Can I control the video aspect ratio?
Yes, you can choose from four presets: 1280×720, 720×1280 (vertical), 1920×1080 (landscape), or 1080×1920 (portrait) based on the specific platform.

What parameters should I tweak for the best results?

  • Start with a good prompt, i.e., one that is detailed and uses action-oriented descriptions.
  • You can enable prompt expansion for automatic enhancement.
  • Use multi_shots for complex narratives and adjust duration based on content density.
  • Fine-tune negative prompts to eliminate specific visual artifacts.

Does the seed parameter affect video content?
Yes, using the same seed with identical parameters produces consistent outputs; it is essential for A/B testing prompts or to maintain brand consistency across video variants.
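That A/B workflow can be sketched as building request-body variants that differ in exactly one field while the seed stays fixed; `withVariant` and the base values below are illustrative assumptions, not part of the API:

```javascript
// Sketch: produce request-body variants that differ in exactly one field,
// keeping the seed fixed so output differences come only from that field.
const base = {
  size: '1280*720',
  prompt: 'Cinematic product shot, warm backlighting',
  duration: 5,
  seed: 7, // fixed for reproducibility across variants
};

// Illustrative helper: object spread keeps every unlisted parameter
// (including seed) unchanged.
function withVariant(overrides) {
  return { ...base, ...overrides };
}

const variantA = withVariant({ prompt: 'Cinematic product shot, cool rim lighting' });
const variantB = withVariant({ duration: 15 });
```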