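```javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/kling-o1";

// Request body. The prompt refers to reference images as @Image1, @Image2, ...
// which correspond to the entries in image_urls. The URLs below are placeholders.
const data = {
  "prompt": "Merge @Image1 into the landscape of @Image2",
  "image_urls": [
    "https://example.com/image1.png", // @Image1 (placeholder)
    "https://example.com/image2.png"  // @Image2 (placeholder)
  ],
  "resolution": "1K",
  "aspect_ratio": "auto",
  "output_format": "png"
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network failures, so fall back to the message.
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```

prompt: Describe the desired output using image references such as @Image1. Example: 'Combine @Image1 with elements in @Image2'.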
image_urls: List the URLs of the reference images, and refer to them in the prompt as @Image1, @Image2, and so on.
resolution: Output image resolution. Choose '1K' for standard quality.
aspect_ratio: Frame dimensions. Use 'auto' for automatic detection, or specify a ratio such as 16:9.
output_format: Output file format. Use 'png' for best quality or 'jpg' for smaller files.
To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account; monitor this value to avoid disruptions to your API usage.
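As a minimal sketch in the same JavaScript style as the request example, the helper below reads x-remaining-credits from a response's headers. It assumes the headers arrive as a plain object keyed by header name (as with response.headers from axios in Node.js) and that the value is a numeric string. The function names and the 100-credit warning threshold are illustrative choices, not part of the Segmind API.

```javascript
// Read the x-remaining-credits header from a headers object.
// Assumption: `headers` maps header names to values (e.g. axios response.headers).
// Returns null when the header is absent or not numeric.
function getRemainingCredits(headers) {
  if (!headers) return null;
  // Header names are case-insensitive, so compare lowercased keys.
  const key = Object.keys(headers).find(
    (name) => name.toLowerCase() === 'x-remaining-credits'
  );
  if (key === undefined) return null;
  const credits = Number(headers[key]);
  return Number.isFinite(credits) ? credits : null;
}

// Hypothetical helper: warn when the balance drops below a chosen threshold.
// The default of 100 credits is an arbitrary example value.
function warnIfLowCredits(headers, threshold = 100) {
  const credits = getRemainingCredits(headers);
  if (credits !== null && credits < threshold) {
    console.warn(`Low balance: ${credits} credits remaining`);
  }
  return credits;
}

// Usage inside the try block of the request example:
//   const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
//   warnIfLowCredits(response.headers);
```

Matching the header name case-insensitively keeps the helper usable if the headers object comes from a client that does not lowercase names.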
Edited by Segmind Team on January 7, 2026.
Kling O1 is a next-generation AI model that unifies video generation and editing within a single Multimodal Visual Language (MVL) framework. It combines language comprehension, image analysis, motion tracking, and video manipulation in one workflow, unlike conventional video AI tools that treat creation and editing as separate processes. Developers, content creators, and production teams can use it to produce and edit cinematic-quality videos from text prompts, reference images, or motion data while keeping characters consistent, environments stable, and camera movement smooth. Its Edit mode lets users refine or transform existing footage through simple text commands and visual cues, removing the need for manual masking or frame-by-frame correction.
Reference Image Strategy: Use the @Image notation to combine multiple images, and state their relationships clearly, e.g., "Place character from @Image1 in the background environment of @Image2 with lighting from @Image3."
Descriptive Prompts: Keep prompts direct and clear, and include camera movements, lighting conditions, and temporal information. For example, instead of "person walking," write "medium shot tracking a person walking through a forest at golden hour, camera dollying alongside."
Aspect Ratio Selection: Choose the aspect ratio based on where the content will be published: 16:9 for YouTube and web content, 9:16 for Instagram Stories and TikTok, and 1:1 for feed posts. Use the "auto" option when working with existing footage to keep its original dimensions.
Resolution & Format: The 1K resolution balances quality and processing speed. Select PNG when preserving fine detail matters (final deliverables) and JPG when file size is critical (previews, rapid iterations).
Editing Mode: When modifying existing videos, describe only the changes you want, e.g., "Replace the sky in this footage with sunset colors," rather than rewriting the entire scene.
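The aspect-ratio and format tips above can be captured in a small lookup. This is a sketch under the assumption that you label publishing targets with your own names: the platform keys (youtube, tiktok, and so on), the purpose value 'final', and the function name are hypothetical, and only the returned values ('16:9', '9:16', '1:1', 'auto', '1K', 'png', 'jpg') come from this page.

```javascript
// Hypothetical platform labels mapped to the aspect ratios recommended on this page.
const ASPECT_RATIO_BY_PLATFORM = {
  youtube: '16:9',
  web: '16:9',
  tiktok: '9:16',
  'instagram-stories': '9:16',
  'instagram-reels': '9:16',
  'instagram-feed': '1:1',
  linkedin: '1:1',
  'existing-footage': 'auto', // keep the source dimensions
};

// Return resolution, aspect_ratio and output_format for a target platform and purpose.
// purpose === 'final' selects PNG (fine detail); anything else selects JPG (smaller files).
function chooseOutputSettings(platform, purpose) {
  if (!Object.prototype.hasOwnProperty.call(ASPECT_RATIO_BY_PLATFORM, platform)) {
    throw new Error(`Unknown platform: ${platform}`);
  }
  const aspect_ratio = ASPECT_RATIO_BY_PLATFORM[platform];
  const output_format = purpose === 'final' ? 'png' : 'jpg';
  // '1K' is the resolution this page describes as balancing quality and speed.
  return { resolution: '1K', aspect_ratio, output_format };
}

// Example: spread the settings into the request body.
//   const data = { prompt, image_urls, ...chooseOutputSettings('tiktok', 'preview') };
```

Throwing on an unknown label is a deliberate choice here, so a typo fails loudly instead of silently falling back to a ratio you did not intend.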
Is Kling O1 open-source?
Kling O1 is a proprietary model available through the Segmind API. It is not open-source, but developers can call it from their own applications to add video generation and editing.
How does Kling O1 differ from other video generation models?
Kling O1 unifies video generation and editing through its MVL architecture, whereas many conventional options either only generate video from text or are standalone editing tools. It handles multimodal inputs simultaneously and supports text-based editing without manual masking.
What's the difference between generation mode and edit mode?
Generation mode creates new video content from scratch using prompts and references. Edit mode modifies existing video footage using text descriptions and image references, automatically identifying and adjusting specified elements without requiring manual selection or masking.
How many reference images can I use in a single prompt?
Kling O1 accepts multiple reference images: pass their URLs in the image_urls parameter and refer to them in the prompt as @Image1, @Image2, and so on. The model processes these references together, so a single prompt can blend characters, environments, and style elements into one composition.
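Because each @ImageN token has to line up with an entry in image_urls, a quick client-side check can catch a prompt that references an image you forgot to include. The sketch below assumes @Image1 refers to the first URL, @Image2 to the second, and so on; findMissingImageRefs is a hypothetical helper, not part of the API, and it does not enforce any maximum image count. The example URLs are placeholders.

```javascript
// Return the @ImageN tokens in `prompt` that have no matching entry in `imageUrls`.
// Assumption: @Image1 maps to imageUrls[0], @Image2 to imageUrls[1], etc.
function findMissingImageRefs(prompt, imageUrls) {
  const missing = new Set();
  // Match @Image followed by one or more digits, e.g. @Image1 or @Image12.
  for (const match of prompt.matchAll(/@Image(\d+)/g)) {
    const index = Number(match[1]);
    if (index < 1 || index > imageUrls.length) {
      missing.add(match[0]);
    }
  }
  return [...missing];
}

// Placeholder URLs for illustration only.
const imageUrls = [
  'https://example.com/character.png',
  'https://example.com/environment.png',
];
const prompt =
  'Place character from @Image1 in the background environment of @Image2 with lighting from @Image3';

console.log(findMissingImageRefs(prompt, imageUrls)); // [ '@Image3' ]
```

Running this check before sending the request avoids spending credits on a prompt whose references cannot be resolved.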
What aspect ratio should I use for different platforms?
The model supports multiple aspect ratios: use 16:9 for YouTube, web video, and horizontal content; 9:16 for Instagram Reels, TikTok, and Stories; and 1:1 for Instagram feed posts and LinkedIn. The "auto" setting detects and preserves the aspect ratio of the input images or footage.
Can I adjust output quality without changing resolution?
Yes. Choose PNG for maximum quality at the cost of larger files; it suits final outputs and archiving. Choose JPG for smaller files with a minor quality trade-off; it suits previews, iterations, and bandwidth-sensitive applications.