POST

javascript

const axios = require('axios');

const fs = require('fs');
const path = require('path');

// Helper for converting a local image into a base64 string.
// It is not used by the text-only request below.
async function toB64(imgPath) {
    // readFileSync already returns a Buffer, so it can be encoded directly.
    const data = fs.readFileSync(path.resolve(imgPath));
    return data.toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/gemini-1.5-pro";

const data = {
  "messages": [
    {
      "role": "user",
      "content" : "tell me a joke on cats"
    },
    {
      "role": "assistant",
      "content" : "here is a joke about cats..."
    },
    {
      "role": "user",
      "content" : "now a joke on dogs"
    },
  ]
};

(async function() {
    try {
        const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
        console.log(response.data);
    } catch (error) {
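        // error.response is undefined when no HTTP response was received (for example, a network failure).
        console.error('Error:', error.response ? error.response.data : error.message);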
    }
})();

RESPONSE

application/json

HTTP Response Codes

200 - OK: Image Generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: Server had some issue with processing
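
The codes above can be turned into readable error messages on the client side. The following is a minimal sketch, not part of the Segmind API: `describeError` and `STATUS_MEANINGS` are hypothetical names, and the function assumes it receives an error thrown by axios, which carries a `response` object only when the server actually replied.

```javascript
// Descriptions taken from the response-code list above.
const STATUS_MEANINGS = {
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: server had some issue with processing',
};

// Hypothetical helper: summarizes an axios error as a single line of text.
function describeError(error) {
  // No response object means the request never got an HTTP reply.
  if (!error.response) {
    return `No response received: ${error.message}`;
  }
  const status = error.response.status;
  const meaning = STATUS_MEANINGS[status] || 'Unlisted status code';
  return `${status} - ${meaning}`;
}
```

Inside the catch block of the request example, this could be called as console.error(describeError(error)).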

Attributes

messages (Array)

An array of objects containing the role and content.

role (str)

Could be "user", "assistant" or "system".

content (str)

A string containing the user's query or the assistant's response.
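
The attribute rules above can be checked locally before a request is sent. This is a sketch under stated assumptions: `validateMessages` and `ALLOWED_ROLES` are hypothetical names, and the checks cover only what the attribute list states (an array of objects, a role from the three listed values, string content). The server may enforce further rules that are not documented here.

```javascript
// Roles listed in the attribute description above.
const ALLOWED_ROLES = ['user', 'assistant', 'system'];

// Hypothetical helper: throws a TypeError on the first malformed message,
// otherwise returns the array unchanged so it can be used inline.
function validateMessages(messages) {
  if (!Array.isArray(messages)) {
    throw new TypeError('messages must be an array');
  }
  messages.forEach((message, index) => {
    if (message === null || typeof message !== 'object') {
      throw new TypeError(`messages[${index}] must be an object`);
    }
    if (!ALLOWED_ROLES.includes(message.role)) {
      throw new TypeError(`messages[${index}].role must be one of: ${ALLOWED_ROLES.join(', ')}`);
    }
    if (typeof message.content !== 'string') {
      throw new TypeError(`messages[${index}].content must be a string`);
    }
  });
  return messages;
}
```

A request body could then be built as { messages: validateMessages(history) }, where history is your own conversation array.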

To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
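
Reading that header could look like the sketch below. `remainingCredits` is a hypothetical helper, and it assumes the header value parses as a number, which the documentation does not state explicitly. It takes a plain headers object, such as the `headers` field of an axios response, where header names are lower-cased.

```javascript
// Hypothetical helper: extracts the remaining-credit count from response headers.
// Returns null when the header is missing or is not numeric.
function remainingCredits(headers) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined || raw === null) {
    return null;
  }
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}
```

In the request example, the value would be available as remainingCredits(response.headers) right after the axios.post call resolves.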

Gemini 1.5 Pro

Gemini 1.5 Pro is a powerful multimodal large language model from Google DeepMind. It's known for its long-context understanding capability across different formats like text, images, audio and video. Here's a breakdown of its key features:

Long context understanding: Unlike previous models, Gemini 1.5 Pro boasts a massive context window of up to two million tokens, allowing it to process and understand vast amounts of information at once. This could be text documents containing over 700,000 words, hours of audio or video, or codebases with tens of thousands of lines.
Multimodal capabilities: It can handle complex reasoning tasks using various data types, including text, images, audio, and video. Imagine showing it a hand-drawn sketch and asking it to identify the scene from a specific movie!
Scalability: Gemini 1.5 Pro is a mid-sized model that handles a wide range of tasks and performs at a level similar to 1.0 Ultra, Google's larger previous-generation model. This makes it a versatile tool for various applications.

Overall, Gemini 1.5 Pro represents a significant leap in large language model technology, offering exceptional understanding and performance across different modalities and contexts.
