Input image: PNG, JPG, or GIF, up to 2048 x 2048 px.
Wan2.1 is a cutting-edge video foundation model that excels at 480P image-to-video generation and outperforms existing open-source and state-of-the-art commercial solutions.
SOTA Performance: Consistently outperforms existing open-source and commercial models across multiple benchmarks.
Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information.
Architecture: Built on the mainstream diffusion transformer paradigm, with innovations such as a novel spatio-temporal variational autoencoder (VAE).
Data: Trained on a large corpus of curated, deduplicated image and video data that passed through a four-step cleaning pipeline.
The models are released under the Apache 2.0 License, which permits free use provided you comply with its terms.
The model can generate videos at 720P resolution, but the results are generally less stable than at 480P because of limited training at that resolution. Use 480P for best results.
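As a rough sketch of following the 480P recommendation, the helper below picks output dimensions that stay near a 480P pixel budget while keeping the input image's aspect ratio. The 832 x 480 budget and the multiple-of-16 constraint are assumptions chosen for illustration, not values stated on this page, and `fit_to_480p` is a hypothetical helper rather than part of any Wan2.1 API.

```python
import math

# Assumed 480P pixel budget (832 x 480) and dimension multiple (16).
# Neither value comes from this page; adjust both for your deployment.
MAX_AREA_480P = 832 * 480
DIM_MULTIPLE = 16


def fit_to_480p(width: int, height: int,
                max_area: int = MAX_AREA_480P,
                multiple: int = DIM_MULTIPLE) -> tuple[int, int]:
    """Return (width, height) near the pixel budget, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    aspect = height / width
    # Solve w * h = max_area with h / w = aspect, round to the nearest pixel,
    # then round down to the nearest allowed multiple.
    new_h = round(math.sqrt(max_area * aspect)) // multiple * multiple
    new_w = round(math.sqrt(max_area / aspect)) // multiple * multiple
    # Guard against extreme aspect ratios collapsing a side to zero.
    return max(new_w, multiple), max(new_h, multiple)


print(fit_to_480p(2048, 2048))  # square input at the upload limit
print(fit_to_480p(1920, 1080))  # 16:9 input
```

Resizing the input image to the returned dimensions before generation keeps the request inside the resolution range where the model was trained most heavily.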
Extensive manual evaluations confirm that Wan2.1 outperforms both closed-source and open-source models.