POST

```javascript
const axios = require('axios');
const FormData = require('form-data');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/wan-2.6-t2v";

const reqBody = {
  "size": "1280*720",
  "prompt": "Humorous but premium mini-trailer: a rugged caveman explorer sparks \"evolution\" by grunting simple commands that instantly upgrade his world and morph his own form through the ages. Extreme photoreal 4K, cinematic lighting, subtle film grain, smooth camera. No subtitles, no UI, no watermark.\nShot 1 [0-3s] Macro close-up on the caveman's weathered face and hairy knuckles clutching a jagged stone axe in a misty prehistoric dawn. He grunts deeply: \"Better!\"\nShot 2 [3-6s] Hard cut: Bustling ancient forge at golden hour, sparks flying. The caveman, now in leather tunic with a bronze sword, hammers metal confidently. Camera dollies in as he bellows: \"Faster!\"\nShot 3 [6-10s] Hard cut: Steampunk factory amid rainy industrial night, gears whirring. He transforms into a suited inventor with goggles, cranking a massive machine that hums to life. Slow zoom on his evolving eyes as he commands: \"Smarter!\"\nShot 4 [10-15s] Hard cut: Futuristic AI lab bathed in neon glow, holographic interfaces pulsing. The caveman, sleek in neural-linked exosuit, interfaces with a glowing orb; his form subtly digitizes. He smiles knowingly: \"Evolved. What's next?\"",
  "duration": 5,
  "multi_shots": false,
  "negative_prompt": "low resolution, error, worst quality, low quality, defects",
  "enable_prompt_expansion": true
};

(async function() {
  try {
    const formData = new FormData();
    // Append each field; form-data only accepts strings, Buffers, or streams,
    // so coerce numbers and booleans to strings.
    for (const key in reqBody) {
      if (reqBody.hasOwnProperty(key)) {
        formData.append(key, String(reqBody[key]));
      }
    }
    const response = await axios.post(url, formData, {
      headers: {
        'x-api-key': api_key,
        ...formData.getHeaders()
      }
    });
    console.log(response.data);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
image/jpeg
HTTP Response Codes

200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
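The codes above can be handled uniformly in client code. A minimal sketch, assuming you only need a readable message per status (the `describeStatus` helper is illustrative, not part of any Segmind SDK):

```javascript
// Map the HTTP status codes from the table above to readable outcomes.
// describeStatus is an illustrative helper, not part of the API itself.
function describeStatus(code) {
  const messages = {
    200: 'OK: generation succeeded',
    401: 'Unauthorized: user authentication failed',
    404: 'Not Found: the requested URL does not exist',
    405: 'Method Not Allowed: the requested HTTP method is not allowed',
    406: 'Not Acceptable: not enough credits',
    500: 'Server Error: the server had an issue with processing',
  };
  return messages[code] || `Unexpected status: ${code}`;
}
```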

Attributes


seed int ( default: 1 )

Random seed for reproducible generation


size enum:str ( default: 1280*720 )

An enumeration.

Allowed values: 1280*720, 720*1280, 1920*1080, 1080*1920


audio str ( default: 1 )

Audio file (wav/mp3, 3-30s, ≤15MB) for voice/music synchronization


prompt str *

Text prompt for video generation


duration enum:int ( default: 5 )

An enumeration.

Allowed values:


multi_shots bool ( default: 1 )

Enable intelligent multi-shot segmentation (only active when enable_prompt_expansion is enabled). True enables multi-shot segmentation, false generates single-shot content.


negative_prompt str ( default: 1 )

Negative prompt to avoid certain elements


enable_prompt_expansion bool ( default: true )

If set to true, the prompt optimizer will be enabled

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
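Reading that header with axios might look like the following sketch; note that axios lower-cases response header names, and the `remainingCredits` helper is illustrative:

```javascript
// Sketch: extract the x-remaining-credits value from an axios-style
// response.headers object; returns null when the header is absent.
function remainingCredits(headers) {
  const raw = headers && headers['x-remaining-credits'];
  return raw === undefined || raw === null ? null : Number(raw);
}

// Usage after a request (hypothetical response object):
// const response = await axios.post(url, formData, { headers: { ... } });
// console.log('Credits left:', remainingCredits(response.headers));
```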

Alibaba Wan 2.6: Text-to-Video Generation Model

Edited by Segmind Team on December 18, 2025.


What is Alibaba Wan 2.6?

Alibaba Wan 2.6 is a high-performance AI model with text-to-video capabilities that convert written prompts, images, audio, or reference clips into cinematic 1080p videos. Wan 2.6 can create multi-shot sequences up to 15 seconds long with precise character consistency and smooth narrative flow, all without traditional filming. The model syncs audio tightly with visuals and delivers realistic, studio-quality videos, making it ideal for marketers, educators, product teams, and social media creators who need high-quality video on tight timelines.

What sets Wan 2.6 apart from basic video generation models is its ability to maintain continuity across multiple scenes, enabling complex storytelling with dynamic camera work and seamless transitions.

Key Features of Alibaba Wan 2.6

  • High-resolution output: The model supports 1080p video generation with options for landscape (1920×1080) and portrait (1080×1920) formats.
  • Multi-shot composition: It can automatically segment complex scenes into coherent sequences with fluid shot transitions.
  • Audio synchronization: It accepts WAV or MP3 files (3 to 30 seconds) for realistic, accurately lip-synced voice-over or music.
  • Flexible duration control: It generates videos from 5 to 15 seconds to match the platform and narrative pacing.
  • Prompt expansion: It is integrated with optional AI-powered prompt refinement that intelligently interprets creative intent and then adds visual detail automatically.
  • Reproducible outputs: Its seed-based generation ensures consistent results across iterations.
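The audio constraints listed above (WAV/MP3, 3 to 30 seconds, at most 15 MB) can be checked client-side before uploading. A minimal sketch, where `isValidAudio` and its input shape are illustrative assumptions rather than part of the API:

```javascript
// Sketch: validate an audio file against the documented constraints
// (wav/mp3 format, 3-30 s duration, <= 15 MB). The function name and
// the { ext, seconds, bytes } shape are illustrative, not API fields.
function isValidAudio({ ext, seconds, bytes }) {
  const okFormat = ['wav', 'mp3'].includes(ext.toLowerCase());
  const okLength = seconds >= 3 && seconds <= 30;
  const okSize = bytes <= 15 * 1024 * 1024;
  return okFormat && okLength && okSize;
}
```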

Best Use Cases

  • Social media content: Creators can use it to create engaging TikTok, Instagram Reels, and YouTube Shorts without filming equipment.
  • Marketing and advertising: They can generate product demos, explainer videos, and campaign assets with brand-consistent visuals.
  • Educational content: It is an asset that can transform lesson scripts into visual narratives with synchronized voiceovers.
  • Prototyping and storyboarding: It is excellent for rapidly visualizing concepts for client presentations or internal reviews.
  • E-commerce: Teams can utilize it to produce dynamic showcase videos to highlight products' features and benefits.

Prompt Tips and Output Quality

Writing effective prompts:

  • Specify visual style, lighting, and camera angles (e.g., "cinematic close-up with warm backlighting").
  • Describe character actions and scene transitions clearly in detail for multi-shot sequences.
  • Reference art styles or technical specifications like "4K visuals" or "shallow depth of field."

Parameter recommendations:

  • Keep enable_prompt_expansion on for richer visual detail and better scene interpretation.

  • Use multi_shots for narrative content that requires multiple perspectives or scene changes.

  • Set duration to 15 seconds when telling complex stories; use 5 seconds for simple product shots.

  • Choose 1920×1080 for YouTube and presentations; 1080×1920 for mobile-first platforms.

  • Leverage negative_prompt to exclude unwanted elements like "blurry, low quality, distorted faces."

  • For reproducible results, lock the seed value.

  • When iterating, adjust only one parameter at a time to understand its impact.
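Put together, the recommendations above might translate into a request body like this sketch; the parameter names match the Attributes section, while the prompt and seed values are purely illustrative:

```javascript
// Sketch of a request body applying the parameter recommendations above.
// Field names match the documented attributes; values are illustrative.
const reqBody = {
  size: '1920*1080',              // landscape, for YouTube and presentations
  prompt: 'Cinematic product showcase, warm backlighting, shallow depth of field',
  duration: 15,                   // longer duration for complex narratives
  multi_shots: true,              // only active when prompt expansion is enabled
  enable_prompt_expansion: true,  // richer visual detail and scene interpretation
  negative_prompt: 'blurry, low quality, distorted faces',
  seed: 42,                       // lock the seed for reproducible results
};
```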

FAQs

Is Alibaba Wan 2.6 open-source?
No, Wan 2.6 is a proprietary model developed by Alibaba. It is accessible through API integrations and platforms like Segmind.

How does Wan 2.6 compare to other text-to-video models?
Wan 2.6 stands out with native audio synchronization, realistic lip-sync, and superior multi-shot composition compared to single-scene generators. Additionally, its character continuity across shots rivals models used in professional production pipelines.

What audio formats does it support?
The model accepts WAV and MP3 files between 3 and 30 seconds. It supports frame-by-frame audio synchronization for accurate lip movement and timing.

Can I control the video aspect ratio?
Yes, you can choose from four presets: 1280×720, 720×1280 (vertical), 1920×1080 (landscape), or 1080×1920 (portrait) based on the specific platform.

What parameters should I tweak for the best results?

  • Start with a good prompt, i.e., one that is detailed and uses action-oriented descriptions.
  • You can enable prompt expansion for automatic enhancement.
  • Use multi_shots for complex narratives and adjust duration based on content density.
  • Fine-tune negative prompts to eliminate specific visual artifacts.

Does the seed parameter affect video content?
Yes, using the same seed with identical parameters produces consistent outputs; it is essential for A/B testing prompts or to maintain brand consistency across video variants.
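That A/B workflow can be sketched as building request-body variants that differ in exactly one field while the seed stays fixed; `withVariant` and the base values below are illustrative assumptions, not part of the API:

```javascript
// Sketch: produce request-body variants that differ in exactly one field,
// keeping the seed fixed so output differences come only from that field.
const base = {
  size: '1280*720',
  prompt: 'Cinematic product shot, warm backlighting',
  duration: 5,
  seed: 7, // fixed for reproducibility across variants
};

// Illustrative helper: object spread keeps every unlisted parameter
// (including seed) unchanged.
function withVariant(overrides) {
  return { ...base, ...overrides };
}

const variantA = withVariant({ prompt: 'Cinematic product shot, cool rim lighting' });
const variantB = withVariant({ duration: 15 });
```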