POST

javascript

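const axios = require('axios');

const fs = require('fs');
const path = require('path');

// Helper for local files: reads an image from disk and returns it as a base64 string.
// It is not called below because this example passes the input image by URL.
async function toB64(imgPath) {
    const data = fs.readFileSync(path.resolve(imgPath));
    return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/ominicontrol";

const data = {
  // URL of the input image (see Attributes).
  "image": "https://segmind-sd-models.s3.us-east-1.amazonaws.com/display_images/Object+24.png",
  "prompt": "photo of this orange sofa in a modern living room",
  "steps": 8,
  "seed": 4710825087,
  "image_format": "png",
  "image_quality": 90,
  "base64": false
};

(async function() {
    try {
        // With "base64": false the response body is the image itself, so request it as bytes.
        const response = await axios.post(url, data, {
            headers: { 'x-api-key': api_key },
            responseType: 'arraybuffer'
        });
        fs.writeFileSync('output.png', response.data);
        console.log('Saved output.png');
    } catch (error) {
        // error.response is undefined when no HTTP response was received (e.g. a network failure).
        console.error('Error:', error.response ? error.response.data.toString() : error.message);
    }
})();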

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Image generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
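As a minimal sketch, the codes above can be turned into readable messages in a catch block. The names STATUS_MESSAGES and explainError are hypothetical helpers, not part of the API; the only assumption about axios is that a failed request exposes error.response.status and that error.response is missing when no HTTP response arrived.

```javascript
// Status codes and meanings as listed in the table above.
const STATUS_MESSAGES = {
  200: 'OK: image generated',
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: server had some issue with processing',
};

// Hypothetical helper: turn an axios-style error into a readable message.
// error.response is undefined when no HTTP response was received.
function explainError(error) {
  if (!error.response) {
    return `No response received: ${error.message}`;
  }
  const status = error.response.status;
  return STATUS_MESSAGES[status] || `Unexpected status ${status}`;
}
```

In the request example, this could replace the body of the catch block with console.error(explainError(error)).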

Attributes

image (image, required)

URL of the input image.

prompt (str, required)

Prompt for the image generation.

steps (int, default: 8)

Number of inference steps for image generation.

min: 4, max: 40

seed (int, default: 12467)

Random seed for generation.

image_format (enum: str, default: png)

Output image format.

Allowed values:

image_quality (int, default: 95)

Image quality setting for output.

min: 10, max: 100

base64 (boolean, default: 1)

Base64 encoding of the output image.
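The defaults and ranges above can be checked on the client before a request is sent. The following is a sketch only: buildPayload is a hypothetical helper, the base64 default of 1 is written as true, and image_format is passed through unchecked because its allowed values are not listed here.

```javascript
// Hypothetical helper: applies the documented defaults and validates the documented ranges.
function buildPayload({
  image,
  prompt,
  steps = 8,
  seed = 12467,
  image_format = 'png',
  image_quality = 95,
  base64 = true, // documented default is 1
}) {
  if (!image) throw new Error('image is required');
  if (!prompt) throw new Error('prompt is required');
  if (!Number.isInteger(steps) || steps < 4 || steps > 40) {
    throw new RangeError('steps must be an integer between 4 and 40');
  }
  if (!Number.isInteger(image_quality) || image_quality < 10 || image_quality > 100) {
    throw new RangeError('image_quality must be an integer between 10 and 100');
  }
  return { image, prompt, steps, seed, image_format, image_quality, base64 };
}
```

For example, buildPayload({ image: imageUrl, prompt: 'photo of this orange sofa in a modern living room' }) returns an object that can be passed as the request body, with steps set to 8 and image_quality set to 95.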

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
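A short sketch of reading that header from a response follows. The helper name remainingCredits is hypothetical, and the code assumes the header value is numeric and that the HTTP client exposes headers as an object keyed by lowercase names, as axios does for response.headers.

```javascript
// Hypothetical helper: extracts the remaining credits from a response's headers.
// Returns null when the header is absent or not a number.
function remainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined) return null;
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}
```

After a successful call, remainingCredits(response.headers) can be logged or compared against a threshold of your choosing.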

OminiControl

OminiControl is a cutting-edge framework designed to enhance the capabilities of Diffusion Transformer (DiT) models for image generation tasks. This model stands out due to its parameter efficiency and universal control features, making it suitable for a wide range of image conditioning tasks.

Key Features of OminiControl

Minimal Architectural Changes: OminiControl achieves its functionality with only 0.1% additional parameters compared to traditional methods, significantly reducing the complexity associated with model modifications.
Unified Control Mechanism: The framework integrates various image conditioning tasks—such as subject-driven generation and spatially-aligned conditions (e.g., edges and depth)—into a single model architecture, allowing for versatile applications without the need for separate modules.
Parameter Reuse Mechanism: By leveraging existing components within the DiT architecture, OminiControl minimizes the need for additional control modules, which are common in other frameworks like ControlNet and T2I-Adapter.

Technical Innovations of OminiControl

Multi-Modal Attention Processing: OminiControl utilizes a multi-modal attention mechanism that allows for flexible interactions between condition tokens and noisy image tokens. This approach facilitates both spatially aligned and non-aligned tasks without rigid spatial constraints.
Dynamic Positioning Strategy: The model employs a dynamic positioning strategy for condition tokens, which adjusts based on whether the task is spatially aligned or not. This flexibility enhances performance across diverse generation scenarios.
Automated Data Synthesis Pipeline: To support its training, OminiControl introduces a novel data synthesis pipeline that generates high-quality, identity-consistent images. This pipeline has produced the Subjects200K dataset, comprising over 200,000 images tailored for subject-driven generation tasks.
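To make the dynamic positioning idea concrete, here is a toy illustration under stated assumptions, not the model's actual implementation. It assumes tokens sit on a 2D grid, that condition tokens reuse the image tokens' positions for spatially aligned tasks, and that they are shifted sideways by one grid width for non-aligned tasks so the two sets do not overlap. The function names and the choice of offset are hypothetical.

```javascript
// Positions (row, column) for a height x width grid of image tokens.
function gridPositions(height, width) {
  const positions = [];
  for (let i = 0; i < height; i++) {
    for (let j = 0; j < width; j++) {
      positions.push([i, j]);
    }
  }
  return positions;
}

// Assumed rule: aligned conditions share the image positions;
// non-aligned conditions are shifted by `width` columns.
function conditionPositions(height, width, spatiallyAligned) {
  const offset = spatiallyAligned ? 0 : width;
  return gridPositions(height, width).map(([i, j]) => [i, j + offset]);
}
```

With a 1 x 2 grid, an aligned condition (such as an edge map) gets the same positions as the image tokens, while a non-aligned condition (such as a subject image) gets positions starting at column 2.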

Benefits of Using OminiControl

OminiControl excels in generating images based on specific subjects. This capability is particularly useful in industries such as advertising and media, where personalized content is essential.
The model supports advanced image editing tasks, including filling in missing parts of an image seamlessly; creating images that adhere to specified edge outlines, which is useful in graphic design and illustration; and changing or enhancing backgrounds while preserving the integrity of the main subjects.
