POST

javascript

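const axios = require('axios');
const fs = require('fs');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/gpt-image-2";

const data = {
  "prompt": "A photorealistic, cinematic shot of a cozy independent bookstore in Mumbai at golden hour. Warm afternoon sunlight streams through a tall front window onto wooden shelves packed with books; book spines are clearly visible with titles in English and Hindi Devanagari script. In the foreground, a handwritten chalkboard A-frame easel reads, in clean legible chalk lettering: first line 'मुंबई पुस्तक भंडार', second line 'Mumbai Book Store', and a smaller third line 'Open Daily 9 am – 9 pm'. Shallow depth of field, shot on a full-frame camera at 35mm f/2.0, ultra-realistic detail, natural color grading, dust motes in sunlight, film grain.",
  "size": "1536x1024",
  "quality": "high",
  "moderation": "auto",
  "background": "opaque",
  "output_compression": 100,
  "output_format": "png"
};

(async function() {
    try {
        // The endpoint returns raw image bytes, so ask axios for a binary buffer
        const response = await axios.post(url, data, {
            headers: { 'x-api-key': api_key },
            responseType: 'arraybuffer'
        });
        // Save the image, assuming the returned bytes match the requested output_format
        fs.writeFileSync(`output.${data.output_format}`, response.data);
        // Remaining account credits are reported in a response header
        console.log('Remaining credits:', response.headers['x-remaining-credits']);
    } catch (error) {
        if (error.response) {
            // Error bodies also arrive as a buffer because of responseType
            console.error('Error:', error.response.status, Buffer.from(error.response.data).toString());
        } else {
            // Network failure or timeout: there is no response object to inspect
            console.error('Error:', error.message);
        }
    }
})();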

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Image generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: The server had an issue processing the request
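A small sketch of how a client might turn the status codes listed above into readable errors. The messages mirror that list; the `isRetryable` policy (treating only 500 as worth retrying) is an assumption of this example, not documented Segmind behavior, and both helper names are hypothetical.

```javascript
// Status codes and meanings as listed for this endpoint.
const STATUS_MESSAGES = {
  200: 'OK: image generated',
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: the server had an issue processing the request',
};

// Returns a readable message for an HTTP status from this endpoint.
function describeStatus(status) {
  return STATUS_MESSAGES[status] ?? `Unexpected status ${status}`;
}

// Hypothetical retry policy: only a 500 looks transient. A 401, 404, 405,
// or 406 needs the key, URL, method, or credit balance fixed first.
function isRetryable(status) {
  return status === 500;
}

console.log(describeStatus(406));
```

In the request example, `error.response.status` is the value you would pass to these helpers.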

Attributes

prompt (str, required) - Affects pricing

Text describing the image; supports in-image typography across scripts. Lead with the subject, then style and lighting.

size (enum: str, default: 1536x1024) - Affects pricing

Output resolution as WIDTHxHEIGHT; 'auto' lets the model choose. For a custom resolution not listed, set width and height instead. Constraints: each edge a multiple of 16, aspect ratio between 1:3 and 3:1, longest edge <= 3840, total pixels between 655,360 and 8,294,400.

Allowed values: 1024x1024, 1536x1024, 1024x1536, auto

quality (enum: str, default: high) - Affects pricing

Rendering fidelity; 'high' keeps typography crisp. Use 'medium' or 'low' only for previews.

Allowed values: low, medium, high

moderation (enum: str, default: auto)

Content filter strictness; 'auto' is the safe default. Use 'low' only for permitted use cases.

Allowed values: auto, low

background (enum: str, default: opaque)

'opaque' for full scenes; 'transparent' for logos, stickers, and product cutouts.

Allowed values: opaque, transparent

output_compression (int, default: 100)

Compression level from 0 to 100; 100 preserves text crispness, and lower values reduce file size.

output_format (enum: str, default: png)

Use 'png' for crisp text, 'webp' for smaller files, and 'jpeg' for broad compatibility.

Allowed values: png, webp, jpeg

image_urls (str, default: empty) - Affects pricing

A list of reference image URLs. Supply one or more to edit an image or to draw context from it; leave it empty to generate from text alone.

mask_image_url (str, default: 1)

Optional URL of a mask image for targeted inpainting. White regions of the mask mark the areas to edit; everything outside them stays pixel-perfect.

width (int, default: 1) - Affects pricing

Optional custom width in pixels. Set both width and height to override the size preset. Rules: each edge a multiple of 16, aspect ratio between 1:3 and 3:1, longest edge <= 3840, total pixels between 655,360 and 8,294,400.

height (int, default: 1) - Affects pricing

Optional custom height in pixels. Set both width and height to override the size preset. Rules: each edge a multiple of 16, aspect ratio between 1:3 and 3:1, longest edge <= 3840, total pixels between 655,360 and 8,294,400.
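Because the custom-size rules are easy to violate, it can help to check width and height locally before spending credits on a request. This sketch encodes the rules exactly as stated for width and height; `validateCustomSize` is a hypothetical helper, and the server remains the final authority on what it accepts.

```javascript
// Checks a custom width/height pair against the documented rules and returns
// a list of violations (an empty array means every rule passes).
function validateCustomSize(width, height) {
  const errors = [];
  if (!Number.isInteger(width) || !Number.isInteger(height)) {
    errors.push('width and height must be integers');
    return errors;
  }
  if (width % 16 !== 0 || height % 16 !== 0) {
    errors.push('each edge must be a multiple of 16');
  }
  // Integer comparison avoids floating-point edge cases at exactly 1:3 or 3:1.
  if (Math.max(width, height) > 3 * Math.min(width, height)) {
    errors.push('aspect ratio must be between 1:3 and 3:1');
  }
  if (Math.max(width, height) > 3840) {
    errors.push('longest edge must be <= 3840');
  }
  const pixels = width * height;
  if (pixels < 655360 || pixels > 8294400) {
    errors.push('total pixels must be between 655,360 and 8,294,400');
  }
  return errors;
}

console.log(validateCustomSize(3840, 2160)); // []
console.log(validateCustomSize(1000, 1000)); // [ 'each edge must be a multiple of 16' ]
```

Note that 3840x2160 sits exactly on the 8,294,400-pixel ceiling, so it is the largest 16:9 size these rules allow.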

To track credit usage, inspect the response headers of each API call: the x-remaining-credits header reports the number of credits left in your account. Monitor this value to avoid disruptions to your API usage.
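A minimal sketch for acting on the x-remaining-credits header, assuming the headers object exposes lowercase names, as axios does in Node.js. The `checkCredits` name and the default threshold of 10 are illustrative choices, not values from Segmind.

```javascript
// Reads x-remaining-credits from a response's headers object and flags when
// the balance drops below a chosen threshold (10 is an arbitrary example).
function checkCredits(headers, threshold = 10) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined) {
    // Header missing: report unknown rather than guessing a balance.
    return { remaining: null, low: false };
  }
  const remaining = Number(raw);
  return { remaining, low: Number.isFinite(remaining) && remaining < threshold };
}

// In real use, pass response.headers from the axios call.
const result = checkCredits({ 'x-remaining-credits': '4.5' });
if (result.low) {
  console.warn(`Only ${result.remaining} credits left; top up to avoid 406 errors`);
}
```

Warning early is cheaper than handling a 406 (not enough credits) after a request has already failed.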

GPT Image 2: Photorealistic Text-to-Image and Edit Model

What is GPT Image 2?

GPT Image 2 is OpenAI's next-generation image model, launched in April 2026 as the successor to gpt-image-1.5. It generates photorealistic images from text or edits existing images guided by a prompt, all through a single endpoint. The headline improvement is near-perfect in-image typography: over 95% accuracy across Latin, Japanese, Korean, Chinese, Hindi Devanagari, and Bengali scripts — the first image model practical for shipping UI labels, posters, and multilingual marketing assets without a manual redraw pass. A new single-pass architecture roughly doubles generation speed over the previous version, and built-in reasoning plans composition, counts items, and checks constraints before rendering.

Key Features

Text-to-image generation and guided image editing in one API
In-image text rendering at 95%+ accuracy, including non-Latin scripts
Output resolutions up to 2K across landscape (1536x1024), portrait (1024x1536), and square (1024x1024)
Transparent-background outputs for logos, stickers, and product cutouts
Output formats: PNG (sharpest text), WebP (smaller files), JPEG (universal)
Moderation controls: auto default, low for permitted use cases
Native multi-constraint prompt adherence at ~98% accuracy

Best Use Cases

GPT Image 2 is the right choice whenever legible text is part of the image: magazine covers with headlines, product packaging mockups, storefront and signboard scenes, infographics and charts, storyboards and comic panels, multilingual ad creatives, UI screen mockups, and posters. In testing, it rendered a handwritten chalkboard easel combining English ("Mumbai Book Store", "Open Daily 9 am – 9 pm") and Hindi Devanagari ("मुंबई पुस्तक भंडार") cleanly on the first try. Edit mode (passing one or more URLs in image_urls) is ideal for relighting, background swaps, text changes on existing visuals, and brand-consistent variations.
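Edit mode is driven by the image_urls and mask_image_url attributes. Below is a minimal sketch of building an edit request body, assuming image_urls is sent as a JSON array of URL strings (the attribute table types it as str but describes it as a list, so confirm the exact format). The helper name `buildEditPayload` and the example.com URLs are hypothetical placeholders.

```javascript
// Builds a request body for edit mode: one or more reference images plus an
// optional mask whose white regions mark the area to change.
function buildEditPayload(prompt, imageUrls, maskUrl) {
  if (!Array.isArray(imageUrls) || imageUrls.length === 0) {
    throw new Error('Edit mode needs at least one reference image URL');
  }
  const payload = {
    prompt,
    image_urls: imageUrls,
    quality: 'high',       // keeps edited lettering crisp
    output_format: 'png',
  };
  // Only send mask_image_url when a mask is actually supplied.
  if (maskUrl) {
    payload.mask_image_url = maskUrl;
  }
  return payload;
}

const body = buildEditPayload(
  "Change the chalkboard's third line to read 'Open Daily 10 am – 8 pm'",
  ['https://example.com/bookstore.png'],        // hypothetical input image
  'https://example.com/chalkboard-mask.png'     // hypothetical mask
);
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be posted with the same axios.post call used in the generation example.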

Prompt Tips and Output Quality

Keep quality=high whenever typography matters; medium and low degrade fine lettering. Lead the prompt with the subject, then the exact text to render in quotes, then style and lighting cues. For magazine-style layouts, pick 1024x1536; for marketing banners and scenes, 1536x1024. Use background=transparent for product shots that will be composited downstream. Keep output_format=png and output_compression=100 when text crispness is non-negotiable.
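The tips above can be folded into a small request helper. This is a sketch under assumptions: `buildPrompt` and `typographySettings` are hypothetical names, the "The text reads:" phrasing is just one way to quote on-image text, and the cover text is illustrative.

```javascript
// Assembles a prompt in the suggested order: subject first, then the exact
// text to render in quotes, then style and lighting cues.
function buildPrompt({ subject, textLines = [], style = [] }) {
  const parts = [subject];
  if (textLines.length > 0) {
    const quoted = textLines.map((line) => `'${line}'`).join(', ');
    parts.push(`The text reads: ${quoted}`);
  }
  if (style.length > 0) {
    parts.push(style.join(', '));
  }
  return parts.join('. ') + '.';
}

// Settings that follow the tips: high quality and PNG at compression 100 for
// crisp text; portrait for magazine layouts, landscape for banners and scenes.
function typographySettings(layout) {
  const size = layout === 'magazine' ? '1024x1536' : '1536x1024';
  return { size, quality: 'high', output_format: 'png', output_compression: 100 };
}

const request = {
  prompt: buildPrompt({
    subject: 'A magazine cover featuring a street food vendor in Mumbai',
    textLines: ['STREET EATS', 'Issue 12'],
    style: ['Editorial photography', 'soft window light'],
  }),
  ...typographySettings('magazine'),
};
console.log(request);
```

Keeping the text lines separate from the subject makes it easy to swap copy (for example, per language) without touching the rest of the prompt.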

FAQs

Does GPT Image 2 render text in Hindi, Japanese, and Chinese? Yes. Multilingual typography is the model's flagship capability — Devanagari, CJK, Korean, and Bengali all render cleanly enough to ship.

What is the difference between generation and edit mode? Leaving image_urls empty generates an image from text alone. Passing one or more image URLs in image_urls switches the model into edit mode, where the prompt guides modifications to the input; add mask_image_url to restrict edits to the masked region.

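What output sizes are supported? The size presets are 1024x1024, 1536x1024, 1024x1536, and auto, all available at high quality. For other resolutions, set width and height instead (each edge a multiple of 16, longest edge up to 3840).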

When should I use background=transparent? For logos, stickers, icon sets, and product cutouts that will be composited against other backgrounds.

Is GPT Image 2 faster than gpt-image-1.5? Yes — roughly 2× faster thanks to a new single-pass architecture, with fewer artifacts on hands, faces, and material surfaces.

Where does GPT Image 2 fall short? Physical reasoning tasks (origami, angled reflections, Rubik's cubes) and highly dense repetitive detail (circuit diagrams, grains of sand) still challenge the model. Iterative edits beyond one or two passes tend to drift.
