POST
javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/kling-v2-standard-avatar";

// Request payload: source image, driving audio, and an optional style prompt
const data = {
  "image_url": "https://segmind-resources.s3.amazonaws.com/input/74bce37e-151d-4923-a56f-b5f9ce6e5134-601140c8-73e5-4490-8911-e6c7d3dc0e70-infinite_talk_ip.webp",
  "audio_url": "https://segmind-resources.s3.amazonaws.com/input/ce1dcce7-c5b1-4cf6-a42f-65bed682a44a-news-reading-small.mp3",
  "prompt": "news reporter speaking"
};

(async function() {
  try {
    // Authenticate with your API key via the x-api-key header
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so fall back to the message
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing

Attributes


image_url (str, required)

URL of the source image used as the avatar/background frame; use a high-resolution image for best quality. Ideal for both formal and vibrant settings.


audio_url (str, required)

URL of the driving audio (speech or narration); choose clear audio. Great for instructional or narrative content.


prompt (str, optional, default: 1)

Direction for content creation; specify tone and style. Useful for marketing or storytelling tasks.

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
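For example, here is a minimal sketch of reading that header with axios; the image and audio URLs below are placeholders, not real assets:

javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/kling-v2-standard-avatar";

// Placeholder payload; substitute your own image_url, audio_url, and prompt
const data = {
  "image_url": "https://example.com/avatar.webp",
  "audio_url": "https://example.com/narration.mp3",
  "prompt": "news reporter speaking"
};

(async function() {
  const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
  // axios normalizes response header names to lowercase
  const remainingCredits = response.headers['x-remaining-credits'];
  console.log('Remaining credits:', remainingCredits);
})();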

Kling Avatar v2 Standard: Image-to-Video Avatar Model

What is Kling Avatar v2 Standard?

Kling Avatar v2 Standard is a generative avatar model that creates a talking-head style video from a source image and a driving audio track. You provide an image_url (the avatar/background frame) and an audio_url (speech or narration), and the model renders a video with lip-sync and expressive facial motion aligned to the audio. An optional prompt lets you steer tone, vibe, and overall presentation.

This model is built for teams who need fast, repeatable avatar video generation for product updates, instructional content, brand messaging, and social clips—without a full production pipeline. It’s especially useful when you want consistent on-camera delivery across many iterations (e.g., weekly release updates).

Key Features

  • Image + audio conditioned video generation (avatar-driven video)
  • Lip-sync aligned to narration from audio_url
  • Creative direction via prompt for tone, mood, and scene styling
  • Simple API surface: only three inputs (image_url, audio_url, prompt)
  • Works well for marketing, storytelling, and explainer video workflows
  • Optimized for clear, structured narration and presenter-style framing

Best Use Cases

  • Product launches and release announcement videos
  • How-to guides, onboarding, and micro-learning modules
  • Creator workflows: narrated shorts, explainers, and storytime content
  • Internal communications: CEO updates, weekly summaries, team news
  • Multilingual narration (swap audio_url while keeping the same avatar image; see the sketch after this list)
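
As a sketch of that multilingual workflow, the loop below keeps one avatar image and swaps audio_url per language; the image URL, audio URLs, and language labels are hypothetical placeholders:

javascript
const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/kling-v2-standard-avatar";

// One avatar image, several narration tracks (all URLs are placeholders)
const image_url = "https://example.com/presenter.webp";
const narrations = {
  en: "https://example.com/announcement-en.mp3",
  es: "https://example.com/announcement-es.mp3",
  de: "https://example.com/announcement-de.mp3"
};

(async function() {
  for (const [lang, audio_url] of Object.entries(narrations)) {
    const data = { image_url, audio_url, prompt: "professional presenter, upbeat delivery" };
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(`Generated ${lang} version, status:`, response.status);
  }
})();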

Prompt Tips and Output Quality

  • Start with the essentials: speaker style + setting + pacing (see the sketch after this list).
    Example: “Professional, friendly presenter in a modern studio; concise delivery; upbeat.”
  • For stronger results, keep the input image front-facing, well-lit, high resolution, with a clear face region.
  • Use clean audio (minimal noise, consistent volume). Clear enunciation improves lip-sync quality.
  • Prompts work best as direction, not scripts—use the audio as the source of spoken content.
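
A minimal sketch of composing a prompt in that shape; the field values are illustrative, and the spoken words still come from the audio file rather than the prompt:

javascript
// Compose the prompt from the essentials: speaker style + setting + pacing
const style  = "professional, friendly presenter";
const scene  = "modern studio";
const pacing = "concise delivery, upbeat";
const prompt = [style, scene, pacing].join("; ");

// Placeholder inputs; the audio track carries the actual script
const data = {
  image_url: "https://example.com/presenter.webp",
  audio_url: "https://example.com/script-read.mp3",
  prompt
};

console.log(data.prompt);
// -> "professional, friendly presenter; modern studio; concise delivery, upbeat"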

FAQs

Is Kling Avatar v2 Standard text-to-video?
Not directly. It’s primarily image-to-video with audio-driven animation; the prompt guides style.

What parameters can I tweak?
image_url (visual quality), audio_url (lip-sync driver), and optional prompt (tone/style).

How is it different from general video generation models?
It’s optimized for talking avatar output—stable identity and speech alignment—rather than open-ended scenes.

What image works best?
High-res, front-facing portraits with neutral background and good lighting.

Can I use it for tutorials or marketing videos?
Yes—typical workflows include AI avatar generation, lip-sync video, and turning an image plus audio into an avatar video.