Qwen 3.5 Plus is Alibaba Cloud's hosted flagship multimodal AI model, released February 16, 2026. It is the production-ready version of Qwen3.5-397B-A17B — a Mixture-of-Experts model with 397 billion total parameters and just 17 billion active per forward pass, enabling exceptional inference efficiency without sacrificing quality.
Unlike traditional AI models that treat text and vision as separate pipelines, Qwen 3.5 Plus was trained natively on text, images, and video from the ground up using an early-fusion text-vision architecture. This means it genuinely understands all three modalities together, rather than treating vision as a bolt-on feature.
With a 1 million token context window (one of the largest available), it is built for developers and enterprises who need to handle long documents, complex reasoning chains, and multi-turn agentic workflows in a single API call.
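As a minimal sketch of what a single long-document call might look like, the snippet below builds a chat request that packs an entire document into one prompt instead of chunking it for retrieval. The model identifier `qwen3.5-plus` and the OpenAI-compatible message schema are assumptions for illustration, not confirmed API details.

```python
def build_request(document: str, question: str, model: str = "qwen3.5-plus") -> dict:
    """Pack a full long document plus a question into one request,
    relying on the 1M-token context window rather than chunked retrieval.
    The model id and message schema are assumed, not official."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Answer using only the supplied document."},
            {"role": "user", "content": f"{document}\n\nQuestion: {question}"},
        ],
    }

payload = build_request("...full contract text...", "What is the termination clause?")
```

The point of the design is that a document which would normally require a RAG pipeline fits in the context directly, so the request stays a single call.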
Qwen 3.5 Plus is ideal for:
- Long-document analysis and synthesis within the 1M-token context window
- Multimodal tasks that combine text, images, and video
- Multi-turn agentic workflows that use the built-in web search and code interpreter
- Code generation and complex step-by-step reasoning
To get the best results from Qwen 3.5 Plus:
- Use Thinking mode for deep, multi-step reasoning and Fast mode when latency matters
- Let Auto mode decide when to invoke web search or the code interpreter
- Put long documents directly in the prompt rather than chunking them; the 1M-token context window is built for this
Output quality is consistently high across text, code, and visual tasks. Benchmark scores include 83.6 on LiveCodeBench v6, 91.3 on AIME26, and 88.4 on GPQA Diamond — outperforming GPT-5.2 and Claude Opus 4.5 on over 80% of evaluated categories.
What is the difference between Qwen 3.5 Plus and Qwen3.5-397B-A17B? Qwen3.5-397B-A17B is the open-weight model you can self-host. Qwen 3.5 Plus is the hosted production version with additional features: 1M context (vs 256K base), built-in tools, and three operational modes.
Does Qwen 3.5 Plus support video input? Yes. It natively processes video clips up to 60 seconds. Pass video as a URL or base64-encoded data alongside your prompt.
What operational modes are available? Auto (adaptive thinking + web search + code interpreter), Thinking (deep step-by-step reasoning), and Fast (instant, low-latency responses).
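One way to wire the three modes into an application is a small routing heuristic. The mode names come from the documentation above; passing the choice via a `mode` request field is an assumed convention for illustration.

```python
def pick_mode(task: str) -> str:
    """Map a task type to one of the three operational modes.
    The routing rules here are illustrative, not prescribed."""
    if task in {"math_proof", "multi_step_plan"}:
        return "thinking"   # deep step-by-step reasoning
    if task in {"autocomplete", "chat"}:
        return "fast"       # instant, low-latency responses
    return "auto"           # adaptive thinking + web search + code interpreter

# Hypothetical request field; the "mode" key is an assumption.
request = {"model": "qwen3.5-plus", "mode": pick_mode("math_proof")}
```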
How does pricing compare to other models? Input tokens are priced at $0.50 per 1M tokens and output at $3.00 per 1M tokens — competitive with mid-tier frontier models while offering significantly better multimodal capabilities.
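The published rates make per-request cost estimation a one-line calculation. This helper uses only the prices quoted above ($0.50/1M input, $3.00/1M output):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost in USD at the quoted rates:
    $0.50 per 1M input tokens, $3.00 per 1M output tokens."""
    return input_tokens / 1_000_000 * 0.50 + output_tokens / 1_000_000 * 3.00

# e.g. a 200K-token document with a 2K-token answer:
cost = estimate_cost(200_000, 2_000)  # 0.10 + 0.006 = ~$0.106
```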
Can Qwen 3.5 Plus use external tools natively? Yes. In Auto mode, built-in tools include web search and a code interpreter — no external orchestration or LangChain required.
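A sketch of what requesting the built-in tools in Auto mode could look like. The tool names (web search, code interpreter) come from the answer above; the `builtin_tools` field name and request shape are assumptions, not a documented API.

```python
# Hypothetical request enabling built-in tools in Auto mode.
# "builtin_tools" is an assumed field name, not confirmed documentation.
request = {
    "model": "qwen3.5-plus",
    "mode": "auto",
    "builtin_tools": ["web_search", "code_interpreter"],
    "messages": [
        {"role": "user", "content": "Chart this week's CNY/USD exchange rate."}
    ],
}
```

Because the tools run server-side, there is no client-side orchestration loop to maintain: the request above is the whole integration.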
Is Qwen 3.5 Plus suitable for production enterprise workloads? Absolutely. The MoE architecture delivers 8.6x-19x faster throughput than predecessor dense models, and the 1M context window reduces RAG complexity significantly.