POST
```javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/wan-2.2-i2v-fast";

const data = {
  "image": "https://segmind-resources.s3.amazonaws.com/output/310df0db-0c5e-4c5a-8c78-5db73fbc7c91-bcd49a2d-a0a9-465e-aefd-ca9b5ebb11a7.jpeg",
  "prompt": "4K cinematic close-up of a bloodied, battle-worn Viking warrior kneeling in a snowy sacred cave, eyes wide and glassy with intensity. His long braided blonde beard is frosted with snow, blood streaks run down his face and shoulders. The background is dimly lit by a flickering flame behind him — ancient carvings and symbols glow faintly on the icy stone wall. Camera slowly pushes in on his face as snow swirls in slow motion, and he breathes heavily, lips slightly parted. Suddenly, subtle glitch effects ripple across his face — like time distorting — as if a divine force is entering his mind. In the distance, an echo of a woman’s voice is heard whispering prophecy. As his eyes narrow, a faint blue rune glow reflects in his iris, foreshadowing something ancient and powerful. The camera holds as his expression changes — from fear… to understanding… to resolve.",
  "go_fast": true,
  "num_frames": 81,
  "resolution": "480p",
  "aspect_ratio": "16:9",
  "sample_shift": 12,
  "frames_per_second": 16
};

(async function () {
  try {
    const response = await axios.post(url, data, {
      headers: { 'x-api-key': api_key }
    });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures,
    // so guard before reading its data.
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK : Image generated
401 - Unauthorized : User authentication failed
404 - Not Found : The requested URL does not exist
405 - Method Not Allowed : The requested HTTP method is not allowed
406 - Not Acceptable : Not enough credits
500 - Server Error : Server had an issue processing the request
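The documented status codes can be turned into actionable messages in client code. The sketch below is a local helper (`describeStatus` is our name, not part of any Segmind SDK) that maps the codes above to the meanings in this table:

```javascript
// Map the documented Segmind status codes to human-readable messages.
// describeStatus is a local helper, not part of the API or any SDK.
const STATUS_MESSAGES = {
  200: 'OK: generation succeeded',
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: server had an issue processing the request'
};

function describeStatus(code) {
  // Fall back to a generic message for codes not in the table above.
  return STATUS_MESSAGES[code] || `Unexpected status: ${code}`;
}

console.log(describeStatus(406)); // the "not enough credits" case
```

In an axios `catch` block, you would pass `error.response.status` to this helper (after checking that `error.response` exists).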

Attributes


seed : int ( default: 1 )

Sets randomness; leave blank for complete randomization. Use a specific seed for reproducibility.


image : str *

Input image URL for video base. Choose a detailed image for best results.


prompt : str *

Descriptive prompt for video content. Include vivid details for a richer video.


go_fast : bool ( default: true )

Activates faster video generation. Disable for quality over speed.


num_frames : int ( default: 81 )

Sets video frame count; 81 frames for optimal detail. Increase for smoother motion.

min : 81,

max : 100


resolution : enum:str ( default: 480p )

Sets video resolution. Use 720p for higher clarity, 480p for faster generation.

Allowed values: 480p, 720p


aspect_ratio : enum:str ( default: 16:9 )

Defines frame aspect ratio. Use 16:9 for landscape and 9:16 for portrait videos.

Allowed values: 16:9, 9:16


sample_shift : float ( default: 12 )

Adjusts image sampling shift. Use lower values for subtle changes.

min : 1,

max : 20


frames_per_second : int ( default: 16 )

Determines video smoothness. Choose 16 fps for optimal balance.

min : 5,

max : 24

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
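A small helper can pull the credit balance out of a response before it is discarded. This is a sketch around the documented `x-remaining-credits` header; `parseRemainingCredits` is our name, and note that axios lowercases header keys:

```javascript
// Read the remaining-credit balance from a response's headers.
// parseRemainingCredits is a local helper; axios exposes header
// names in lowercase, matching 'x-remaining-credits' as written.
function parseRemainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined) return null;   // header absent
  const credits = Number(raw);
  return Number.isFinite(credits) ? credits : null;  // guard against junk values
}

// Example with a mock headers object (in real code, pass response.headers):
console.log(parseRemainingCredits({ 'x-remaining-credits': '42.5' })); // 42.5
```

Returning `null` rather than `NaN` for a missing or malformed header makes the "no information" case easy to branch on.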

Wan 2.2: A Mixture-of-Experts Model for Video Generation

Wan 2.2 is the latest and greatest in open-source AI video generation, developed by Alibaba's Tongyi Lab. This model introduces architectural innovations that significantly advance text-to-video and image-to-video generation while maintaining computational efficiency, making it affordable for many use cases. This endpoint serves the A14B model, which can output 480p and 720p videos; there is also a smaller 5B model that is consumer-GPU friendly.

About the tech: Mixture-of-Experts (MoE) Architecture

The model uses a Mixture-of-Experts (MoE) architecture with two expert models under the hood for the diffusion denoising process.

  • High-noise expert: Processes early denoising stages, focusing on overall layout and structure
  • Low-noise expert: Manages final stages, refining video details and reducing artifacts

This two-model approach results in 14 billion active parameters per inference step and a total of 27 billion parameters across both models. The handoff between the experts is decided based on the signal-to-noise ratio (SNR), letting the pipeline transition intelligently between the two experts without sacrificing output quality.

Wan 2.2 also brings substantial improvements over its predecessor through expanded training data, featuring 65% more images and 80% more videos. This enables more advanced motion generation: complex body movements, dynamic scene transitions, and fluid camera control. It can also simulate realistic physics and object interactions, making it effective for character animation, sports scenes, and other cinematic sequences.

Another big leap from the added training data is tighter control over lighting, composition, contrast, and color tone. Wan 2.2 offers over 60 controllable parameters that enable camera-aware prompting, such as "aerial orbit," "handheld tracking shot," or specific lighting requirements.
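Camera-aware prompting plugs directly into the request shape shown earlier. The payload below is a sketch combining such a prompt with the documented parameters; the image URL and prompt wording are placeholders of our own, not Segmind examples:

```javascript
// Sketch of a request payload using a camera-aware prompt with the
// documented parameters. The image URL and prompt are placeholders.
const data = {
  image: 'https://example.com/hero-shot.jpeg',  // replace with your input image
  prompt: 'Handheld tracking shot following a lone rider at golden hour, ' +
          'warm backlight, shallow depth of field, high contrast',
  go_fast: true,
  num_frames: 81,         // documented range: 81-100
  resolution: '720p',     // higher clarity at the cost of speed
  aspect_ratio: '16:9',
  sample_shift: 6,        // lower values give subtler sampling shifts
  frames_per_second: 16   // documented range: 5-24
};

console.log(JSON.stringify(data, null, 2));
```

Keeping the camera direction ("handheld tracking shot") and lighting ("golden hour, warm backlight") as explicit phrases in the prompt is what these controllable parameters respond to.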

Use cases

Like other text-to-video and image-to-video models, this model covers a range of use cases: generate cinematic visuals for a project you are working on, or short social media ads with a product in focus. You can also use it to create simple animations for a website background or a slide deck. Its lower cost compared to many other video models makes it a sensible first choice before trying heavier video generators.

License details

Wan 2.2's open-source nature under the Apache 2.0 license makes it a strong choice for a range of use cases, including commercial and educational purposes.