POST

```javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/qwen-image";

const data = {
  "prompt": "A mystical dragon hovering above a sparkling waterfall under a starry sky.",
  "negative_prompt": "blurry, cartoonish",
  "steps": 30,
  "seed": -1,
  "guidance": 3.5,
  "aspect_ratio": "16:9",
  "image_format": "png",
  "quality": 90,
  "base64": false
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so guard before reading it
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: The server encountered an issue while processing the request

Attributes


prompt str *

Describe imaginative landscapes or detailed environments for image generation. Examples: 'A mystical dragon cave' or 'rustic mountain cabin'.


negative_prompt str ( default: blurry, cartoonish )

Exclude elements to control results. Useful filters: 'blur' for clarity, 'cartoonish' for realism.


steps int *

Number of steps for generating the image

min : 1,

max : 50


seed int ( default: -1 )

Determines image variation. Use -1 for unique images, fixed for repeatability.


guidance float ( default: 3.5 )

Adjusts adherence to prompt. Set 2.5 for creativity, 5.0 for precision.

min : 1,

max : 20


aspect_ratio enum:str ( default: 16:9 )

Defines image proportion. Use 16:9 for cinematic, 1:1 for square formats.

Allowed values:


image_format enum:str ( default: png )

Select image format. Use 'jpeg' for small files, 'png' for quality.

Allowed values:


quality int ( default: 90 )

Set clarity level. Choose 80 for web, 100 for print quality.

min : 10,

max : 100


base64 bool ( default: 1 )

Outputs base64 image string. Enable for embedding, disable for files.
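The numeric ranges listed above can be checked client-side before sending a request, avoiding a wasted call. A minimal sketch of such a validator; the helper is illustrative and not part of any Segmind SDK:

```javascript
// Documented ranges from the attribute list: steps 1-50, guidance 1-20, quality 10-100.
const RANGES = {
  steps: { min: 1, max: 50 },
  guidance: { min: 1, max: 20 },
  quality: { min: 10, max: 100 }
};

// Returns a list of human-readable problems; an empty array means the payload looks valid.
function validatePayload(data) {
  const problems = [];
  if (!data.prompt) problems.push('prompt is required');
  for (const [field, { min, max }] of Object.entries(RANGES)) {
    const value = data[field];
    if (value !== undefined && (value < min || value > max)) {
      problems.push(`${field} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return problems;
}
```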

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
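Reading that header is straightforward; a small sketch (the `x-remaining-credits` header name comes from the passage above, while the helper itself is illustrative):

```javascript
// Extract the remaining-credit count from an axios-style headers object.
// Returns null when the header is absent or not numeric.
function remainingCredits(headers) {
  const raw = headers ? headers['x-remaining-credits'] : undefined;
  if (raw === undefined) return null;
  const n = Number(raw);
  return Number.isNaN(n) ? null : n;
}

// Usage after a call:
// const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
// const credits = remainingCredits(response.headers);
// if (credits !== null && credits < 10) console.warn('Low on credits:', credits);
```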

Qwen-Image: A GPT-Image equivalent open model

Last Updated: 11 Aug 2025

Qwen-Image was developed and trained by Alibaba as part of the broader Qwen series of large language models, and was officially released in early August 2025. Earlier Qwen models were primarily LLMs that accepted text and images as input and produced text as output; with Qwen-Image, the series can now generate images as well. The model excels at complex text rendering, and the Qwen image-to-image version (coming soon) targets precise image-editing tasks.

Technical Overview

Qwen-Image is a 20-billion-parameter model built on the Multimodal Diffusion Transformer (MMDiT) architecture, which consists of three key components working in tandem:

  • Multimodal Large Language Model (MLLM): Uses Qwen2.5-VL (7B parameters) for extracting semantic features from text prompts
  • Variational AutoEncoder (VAE): Features a single-encoder, dual-decoder design optimized for text-rich image reconstruction
  • MMDiT Core: The 20B-parameter heart that jointly models text and image latents using flow matching with Ordinary Differential Equations
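Flow matching generates an image by integrating an ordinary differential equation dx/dt = v(x, t) from noise toward data. In Qwen-Image the velocity field v is the 20B-parameter MMDiT network, but the integration loop itself is simple. A toy sketch with a stand-in linear velocity field; everything here is illustrative, not Qwen-Image's actual sampler:

```javascript
// Euler integration of dx/dt = v(x, t) over t in [0, 1].
// In a real flow-matching sampler, `velocity` would be the trained network
// evaluated on the current latent and timestep; here it is a toy scalar field.
function integrateODE(x0, velocity, steps) {
  const dt = 1 / steps;
  let x = x0;
  for (let i = 0; i < steps; i++) {
    const t = i * dt;
    x = x + dt * velocity(x, t);
  }
  return x;
}

// Toy field v(x, t) = -x, whose exact solution at t = 1 is x0 * e^(-1),
// so the Euler result should approach Math.exp(-1) as steps grows.
const result = integrateODE(1.0, (x, t) => -x, 1000);
```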

This model leverages a comprehensive data pipeline, progressive training strategies, and enhanced multi-task learning to achieve state-of-the-art results across multiple benchmarks.

Key Innovations

Advanced Text Rendering: Qwen-Image can generate both English and Chinese text and render it directly onto the image, from single words to multi-line and even paragraph-level layouts. Prior to this model, only GPT-Image from OpenAI was capable of text rendering with this precision.

Bottom Line

Qwen-Image combines powerful general image generation capability with unmatched text rendering precision in English and Chinese. The prompt adherence is also state-of-the-art. As of today, this model is the leading open-source multimodal foundation model bridging artistic flexibility, textual accuracy, and robust editing capabilities.