POST
javascript

```javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Helper to convert a local image into base64 format
// (useful when sending image inputs alongside text)
async function toB64(imgPath) {
  const data = fs.readFileSync(path.resolve(imgPath));
  return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/qwen3.5-flash";

const data = {
  "messages": [
    { "role": "user", "content": "tell me a joke on cats" },
    { "role": "assistant", "content": "here is a joke about cats..." },
    { "role": "user", "content": "now a joke on dogs" }
  ]
};

(async function () {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures,
    // so fall back to the error message
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
application/json
HTTP Response Codes
200 - OK : Response generated successfully
401 - Unauthorized : User authentication failed
404 - Not Found : The requested URL does not exist
405 - Method Not Allowed : The requested HTTP method is not allowed
406 - Not Acceptable : Not enough credits
500 - Server Error : Server had some issue processing the request

Attributes


messages ( array )

An array of message objects, each containing a role and content.


role ( str )

Could be "user", "assistant" or "system".


content ( str )

A string containing the user's query or the assistant's response.
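Concretely, a `messages` array combining all three roles can be assembled like this (the system instruction text and the `buildMessages` helper are illustrative, not part of the API):

```javascript
// Build a minimal messages array using all three supported roles.
function buildMessages(systemPrompt, history, userQuery) {
  return [
    // A system message sets tone and behavior for the whole conversation.
    { role: "system", content: systemPrompt },
    // Prior turns preserve conversational context.
    ...history,
    // The new user turn the model should answer.
    { role: "user", content: userQuery }
  ];
}

const messages = buildMessages(
  "You are a concise assistant.",
  [
    { role: "user", content: "tell me a joke on cats" },
    { role: "assistant", content: "here is a joke about cats..." }
  ],
  "now a joke on dogs"
);

console.log(JSON.stringify({ messages }, null, 2));
```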

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
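For example, a small helper can read that header from an axios response (axios lower-cases header names; the parsing logic below is a sketch):

```javascript
// Extract the remaining-credit count from a response's headers.
// Returns null when the header is absent or not a number.
function remainingCredits(headers) {
  const raw = headers && headers['x-remaining-credits'];
  const value = Number.parseInt(raw, 10);
  return Number.isNaN(value) ? null : value;
}

// Example with mock objects shaped like axios's response.headers:
console.log(remainingCredits({ 'x-remaining-credits': '42' })); // 42
console.log(remainingCredits({}));                              // null
```

In the request example above, you would call `remainingCredits(response.headers)` after each `axios.post` and alert or throttle when the value drops below a threshold.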

Qwen 3.5 Flash — Fast Multimodal Language Model

What is Qwen 3.5 Flash?

Qwen 3.5 Flash is Alibaba Cloud's speed-optimized multimodal language model, built for developers and teams who need fast, reliable AI inference at scale without sacrificing quality. Part of the Qwen3.5 series, it supports text, image, and video inputs within a single API call, backed by a 1 million token context window. Thinking mode is enabled by default, giving the model enhanced reasoning on ambiguous or complex queries before returning a final answer.

Where Qwen 3.5 Plus and Qwen 3.5 Max target deep reasoning and maximum capability, Flash is built for throughput — the right tool when you need high-volume processing at low cost. With up to 65,536 output tokens and 81,920 chain-of-thought tokens, it comfortably handles long-form generation, document digestion, and structured output tasks without the overhead of a larger model.

Key Features

  • 1M token context window — process entire books, codebases, or hour-long transcripts in one pass
  • Native multimodal inputs — text, images, and video frames all supported in a single request
  • Thinking mode by default — lightweight chain-of-thought reasoning improves accuracy on complex prompts
  • Support for 201 languages — broad multilingual coverage with a 250k vocabulary for efficient encoding
  • Fast and cost-efficient — optimized for high-volume production workloads
  • Max output: 65,536 tokens — suitable for long documents, reports, and extensive code generation

Best Use Cases

Document and visual Q&A: Attach an image of a contract, diagram, or receipt alongside a prompt — Qwen 3.5 Flash accurately interprets and answers questions about visual content without a separate vision model.
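One way to sketch such a request is below, assuming the endpoint accepts OpenAI-style content parts with a base64 data URL for the image — that payload shape is an assumption to verify against Segmind's reference, and `buildVisionMessage` is a hypothetical helper:

```javascript
// Hypothetical helper: wraps a text question and a base64-encoded image
// into a single multimodal user message. The content-parts layout
// (type: "text" / "image_url") is an assumed, OpenAI-style format,
// not a documented Segmind schema.
function buildVisionMessage(question, imageB64) {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: `data:image/png;base64,${imageB64}` } }
    ]
  };
}

const msg = buildVisionMessage("What is the total on this receipt?", "iVBORw0KGgo=");
console.log(msg.content.length); // 2
```

The base64 string would normally come from the `toB64` helper shown in the request example above.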

High-volume summarization and classification: The model's speed and low cost make it ideal for pipelines that process thousands of documents, emails, or customer messages per day.

Video frame understanding: Submit individual video frames with context to extract scene descriptions, detect objects, or generate captions at scale.

Multilingual applications: With coverage of 201 languages and dialects, it is well-suited for global customer support bots, translation assistants, and cross-lingual document processing.

Coding assistance and reasoning: Despite being the Flash tier, the model handles coding tasks, debugging prompts, and structured reasoning well — especially when paired with thinking mode.

Agentic workflows: Low latency and multimodal support make Qwen 3.5 Flash a strong backbone for agent pipelines that need to process mixed text-and-image tool outputs.

Prompt Tips and Output Quality

For best results with Qwen 3.5 Flash:

  • Be specific about output format. If you want JSON, markdown tables, or bullet lists, say so explicitly in the prompt.
  • Use the image parameter for visual tasks. Always include a descriptive text prompt alongside your image.
  • Leverage the long context window. You can pass entire documents, transcripts, or conversation histories in the prompt.
  • Thinking mode is on by default. For complex reasoning tasks, it notably improves accuracy.
  • Set clear role context. Starting your prompt with a system-style instruction improves tone, accuracy, and consistency.
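Putting those tips together, a request body that sets role context and asks for a strict output format might look like this (the classifier scenario and schema wording are illustrative):

```javascript
// Combine a system-style instruction with an explicit format request
// so the model returns machine-parseable output.
const data = {
  messages: [
    {
      role: "system",
      content: "You are a support-ticket classifier. Reply with JSON only."
    },
    {
      role: "user",
      content: 'Classify this message as {"category": string, "urgency": "low"|"medium"|"high"}: ' +
               '"My payment failed twice and I was charged anyway."'
    }
  ]
};

console.log(JSON.stringify(data.messages.map(m => m.role))); // ["system","user"]
```

This `data` object drops straight into the `axios.post` call from the request example above.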

FAQs

Does Qwen 3.5 Flash support video input? Yes. It accepts video frames as image inputs alongside text prompts, enabling scene description, object detection, and video captioning at scale.

How large is the context window? Qwen 3.5 Flash supports a 1 million token context window, making it one of the largest available among fast-tier models.

What is thinking mode? Thinking mode enables the model to perform lightweight chain-of-thought reasoning before producing its final response. It is enabled by default and improves accuracy on multi-step or ambiguous tasks.

How does Qwen 3.5 Flash compare to Qwen 3.5 Plus or Max? Flash is optimized for speed and cost efficiency. Plus and Max offer deeper reasoning and higher capability for complex tasks. Flash is the right choice for production workloads where volume matters.

Can I use it for code generation? Yes. Qwen 3.5 Flash handles coding tasks, debugging, and code explanation well.

Is it multilingual? Yes. The model covers 201 languages and dialects, making it suitable for global applications including customer support, localization, and translation.