const axios = require('axios');
const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/gpt-image-2";
const data = {
"prompt": "A photorealistic, cinematic shot of a cozy independent bookstore in Mumbai at golden hour. Warm afternoon sunlight streams through a tall front window onto wooden shelves packed with books; book spines are clearly visible with titles in English and Hindi Devanagari script. In the foreground, a handwritten chalkboard A-frame easel reads, in clean legible chalk lettering: first line 'मुंबई पुस्तक भंडार', second line 'Mumbai Book Store', and a smaller third line 'Open Daily 9 am – 9 pm'. Shallow depth of field, shot on a full-frame camera at 35mm f/2.0, ultra-realistic detail, natural color grading, dust motes in sunlight, film grain.",
"size": "1536x1024",
"quality": "high",
"moderation": "auto",
"background": "opaque",
"output_compression": 100,
"output_format": "png"
};
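(async function() {
  try {
    // Authenticate with the x-api-key header and send the generation request.
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined on network failures, so fall back to the error message.
    console.error('Error:', error.response ? error.response.data : error.message);
  }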
})();
prompt: Text describing the image; supports in-image typography across scripts. Lead with subject, style, lighting.
size: Output resolution (WIDTHxHEIGHT). 'auto' lets the model pick. For a custom resolution not listed, set Width and Height instead. Constraints: each edge a multiple of 16, aspect ratio between 1:3 and 3:1, longest edge <= 3840, total pixels 655,360-8,294,400.
quality: Rendering fidelity; 'high' keeps typography crisp. Use 'medium' or 'low' only for previews.
moderation: Content filter strictness; 'auto' is the safe default. Use 'low' only for permitted use cases.
background: 'opaque' for full scenes; 'transparent' for logos, stickers, and product cutouts.
output_compression: Compression level 0-100; 100 preserves text crispness. Lower values reduce file size.
output_format: Use 'png' for crisp text, 'webp' for size, 'jpeg' for broad compatibility.
A list of reference images. Include one or more URLs to edit or draw context from.
Optional mask image URL for surgical inpainting. White regions of the mask indicate areas to edit; everything outside stays pixel-perfect.
Optional custom width in pixels. Set both Width and Height to override the Size preset. Rules: each a multiple of 16, aspect ratio between 1:3 and 3:1, longest edge <= 3840, total pixels 655,360-8,294,400.
Optional custom height in pixels. Set both Width and Height to override the Size preset. Rules: each a multiple of 16, aspect ratio between 1:3 and 3:1, longest edge <= 3840, total pixels 655,360-8,294,400.
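The custom-size rules above are easy to check locally before spending credits on a request. The following is a minimal sketch of such a check; validateCustomSize is a hypothetical helper written for illustration, not part of any SDK, and it encodes only the four constraints quoted in the Width and Height descriptions.

```javascript
// Sketch: check a custom width/height pair against the documented rules.
// validateCustomSize is a hypothetical helper, not an official API.
function validateCustomSize(width, height) {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    return ['width and height must be positive integers'];
  }
  const errors = [];
  // Rule 1: each edge is a multiple of 16.
  if (width % 16 !== 0 || height % 16 !== 0) {
    errors.push('each edge must be a multiple of 16');
  }
  // Rule 2: aspect ratio between 1:3 and 3:1.
  const longest = Math.max(width, height);
  const shortest = Math.min(width, height);
  if (longest > 3 * shortest) {
    errors.push('aspect ratio must be between 1:3 and 3:1');
  }
  // Rule 3: longest edge <= 3840.
  if (longest > 3840) {
    errors.push('longest edge must be <= 3840');
  }
  // Rule 4: total pixels between 655,360 and 8,294,400.
  const pixels = width * height;
  if (pixels < 655360 || pixels > 8294400) {
    errors.push('total pixels must be between 655,360 and 8,294,400');
  }
  return errors;
}
```

An empty array means the pair passes every rule; for example, validateCustomSize(3840, 2160) returns no errors because 3840 x 2160 is exactly 8,294,400 pixels, the documented upper bound.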
To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
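As a rough illustration, the header can be read from an axios-style response object like the one returned in the sample request. The helper names and the default threshold of 10 are hypothetical, and the sketch assumes the header value is a plain number, which this page does not confirm.

```javascript
// Sketch: read the remaining-credit count from an axios-style response.
// remainingCredits and warnIfLow are hypothetical helpers for illustration.
function remainingCredits(response) {
  // axios exposes response header names in lowercase.
  const raw = response.headers['x-remaining-credits'];
  const credits = Number(raw);
  // ASSUMPTION: the header carries a numeric value; return null otherwise.
  return raw === undefined || Number.isNaN(credits) ? null : credits;
}

function warnIfLow(response, threshold = 10) {
  const credits = remainingCredits(response);
  if (credits !== null && credits < threshold) {
    console.warn(`Only ${credits} credits remaining`);
  }
  return credits;
}
```

Calling warnIfLow(response) right after the axios.post call keeps the check next to the request that consumed the credits.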
GPT Image 2 is OpenAI's next-generation image model, launched in April 2026 as the successor to gpt-image-1.5. It generates photorealistic images from text or edits existing images guided by a prompt, all through a single endpoint. The headline improvement is near-perfect in-image typography: over 95% accuracy across Latin, Japanese, Korean, Chinese, Hindi Devanagari, and Bengali scripts — the first image model practical for shipping UI labels, posters, and multilingual marketing assets without a manual redraw pass. A new single-pass architecture roughly doubles generation speed over the previous version, and built-in reasoning plans composition, counts items, and checks constraints before rendering.
Output sizes: landscape (1536x1024), portrait (1024x1536), and square (1024x1024)
Moderation: auto default, low for permitted use cases
GPT Image 2 is the right choice whenever legible text is part of the image: magazine covers with headlines, product packaging mockups, storefront and signboard scenes, infographics and charts, storyboards and comic panels, multilingual ad creatives, UI screen mockups, and posters. In testing, it rendered a handwritten chalkboard easel combining English ("Mumbai Book Store", "Open Daily 9 am – 9 pm") and Hindi Devanagari ("मुंबई पुस्तक भंडार") cleanly on the first try. Edit mode (passing an image input) is ideal for relighting, background swaps, text changes on existing visuals, and brand-consistent variations.
Keep quality=high whenever typography matters — medium and low degrade fine lettering. Lead the prompt with subject, then typography in quotes, then style and lighting cues. For magazine-style layouts pick 1024x1536; for marketing banners and scenes, 1536x1024. Use background=transparent for product shots that will be composited downstream. Keep output_format=png and output_compression=100 when text crispness is non-negotiable.
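The tips above can be folded into a small payload builder. This is only a sketch: buildTypographyPayload and its layout and transparent options are hypothetical names introduced here, while the field names and values come from the sample request and the tips themselves.

```javascript
// Sketch: assemble a request body that follows the typography tips.
// buildTypographyPayload and its options are hypothetical, for illustration only.
function buildTypographyPayload(prompt, { layout = 'banner', transparent = false } = {}) {
  return {
    prompt,
    // 1024x1536 for magazine-style layouts, 1536x1024 for banners and scenes.
    size: layout === 'magazine' ? '1024x1536' : '1536x1024',
    // medium and low degrade fine lettering, so stay on high.
    quality: 'high',
    // transparent suits product shots that will be composited downstream.
    background: transparent ? 'transparent' : 'opaque',
    // png at compression 100 keeps text crisp.
    output_format: 'png',
    output_compression: 100,
  };
}
```

The returned object can be passed as the data argument of the axios.post call in the sample request.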
Does GPT Image 2 render text in Hindi, Japanese, and Chinese?
Yes. Multilingual typography is the model's flagship capability: Devanagari, CJK, Korean, and Bengali all render cleanly enough to ship.
What is the difference between generation and edit mode?
Leaving the image parameter null generates from text alone. Passing an image URL switches the model into edit mode, where the prompt guides modifications to the input.
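The switch between the two modes can be sketched as a request-body builder. Note the assumption: this page does not show the identifier of the reference-image parameter, so 'image_urls' below is a placeholder that should be replaced with the name given in the API reference, and buildRequestBody is a hypothetical helper.

```javascript
// ASSUMPTION: the reference-image field name is not shown on this page;
// 'image_urls' is a placeholder, not a confirmed parameter name.
const REFERENCE_IMAGES_FIELD = 'image_urls';

// Sketch: omit the image field for text-only generation, include it for edit mode.
function buildRequestBody(prompt, referenceImageUrls = []) {
  const body = { prompt, quality: 'high' };
  if (referenceImageUrls.length > 0) {
    // Edit mode: the prompt guides modifications to the supplied image(s).
    body[REFERENCE_IMAGES_FIELD] = referenceImageUrls;
  }
  return body;
}
```

With no URLs the body describes a plain text-to-image request; with one or more URLs the same prompt is interpreted as an edit instruction.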
What output sizes are supported?
1024x1024, 1536x1024, 1024x1536, and auto. All run up to 2K resolution with high quality.
When should I use background=transparent?
For logos, stickers, icon sets, and product cutouts that will be composited against other backgrounds.
Is GPT Image 2 faster than gpt-image-1.5?
Yes, roughly 2× faster thanks to a new single-pass architecture, with fewer artifacts on hands, faces, and material surfaces.
Where does GPT Image 2 fall short?
Physical reasoning tasks (origami, angled reflections, Rubik's cubes) and highly dense repetitive detail (circuit diagrams, grains of sand) still challenge the model. Iterative edits beyond one or two passes tend to drift.