POST

javascript

const axios = require('axios');


const api_key = "YOUR API-KEY"; // replace with your Segmind API key
const url = "https://api.segmind.com/v1/veo-3.1"; // Veo 3.1 endpoint

const data = {
  "prompt": "A woman is giving a keynote presentation at a tech conference, wearing a sleek white blazer with the Veo 3.1 logo subtly embroidered, highlighted by ambient blue stage lighting. She is on a modern conference stage with geometric patterns and LED screens behind her, discussing how Veo 3.1 can integrate reference images to enhance AI-generated video content. The Veo 3.1 logo is also displayed prominently on a large digital screen behind her",
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "generate_audio": true,
  "negative_prompt": "no black and white, no sharp angles"
};

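(async function() {
    try {
        const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
        // Each response reports your remaining balance in this header.
        console.log('Remaining credits:', response.headers['x-remaining-credits']);
        console.log(response.data);
    } catch (error) {
        // error.response is undefined on network failures, so fall back to the message.
        console.error('Error:', error.response ? error.response.data : error.message);
    }
})();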

RESPONSE

image/jpeg

HTTP Response Codes

200 - OK: Image generated

401 - Unauthorized: User authentication failed

404 - Not Found: The requested URL does not exist

405 - Method Not Allowed: The requested HTTP method is not allowed

406 - Not Acceptable: Not enough credits

500 - Server Error: The server had an issue processing the request
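The status codes above can be turned into readable errors on the client side. The sketch below is a minimal example, assuming Node 18+ (for the global `fetch` and `Response`) and the same endpoint and `x-api-key` header as the axios example. The names `describeStatus` and `requestGeneration` and the returned `{ contentType, bytes }` shape are hypothetical; `fetchImpl` is injectable so the function can be exercised without a network call.

```javascript
// Messages for the documented Segmind status codes.
const STATUS_MESSAGES = {
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: the server had an issue processing the request',
};

// Hypothetical helper: maps a status code to a readable message.
function describeStatus(status) {
  return STATUS_MESSAGES[status] || `Unexpected HTTP status ${status}`;
}

// Hypothetical helper: POSTs a request body and returns the raw output bytes.
// fetchImpl defaults to the global fetch (Node 18+) and can be replaced by a stub.
async function requestGeneration(url, apiKey, body, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Non-2xx responses become errors carrying the documented meaning.
    throw new Error(`${res.status} ${describeStatus(res.status)}`);
  }
  return {
    contentType: res.headers.get('content-type'),
    bytes: Buffer.from(await res.arrayBuffer()),
  };
}
```

A 406 response, for example, surfaces as an error whose message mentions insufficient credits, which is easier to act on than a raw status code.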

Attributes

seed: int (default: 1)

Sets the random seed. Reusing the same seed makes repeated generations more consistent.

image: str (optional)

Generation starts from this image. Use it when you need a specific opening visual.

prompt: str (required)

Describes the video content. Clear, concise language gives the best results.

duration: enum:int (default: 8)

Sets the video length: 4 s (short), 6 s (medium), or 8 s (long).

Allowed values: 4, 6, 8

last_frame: str (optional)

The video ends on this image. Use it when you need a specific closing visual.

resolution: enum:str (default: 1080p)

Sets the video quality: 720p for standard, 1080p for high resolution.

Allowed values: 720p, 1080p

aspect_ratio: enum:str (default: 16:9)

Sets the aspect ratio: 16:9 for landscape, 9:16 for portrait.

Allowed values: 16:9, 9:16

generate_audio: bool (default: true)

Enables audio generation. Keep it on for videos that need sound.

negative_prompt: str (default: no black and white, no sharp angles)

Describes elements to exclude from the video. Useful for refining details.

reference_images: str (required)

Reference images that keep the subject and style consistent. Essential for maintaining the subject's look.
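The attribute list can be enforced client-side before a request is sent, catching invalid values without spending a call. This is a minimal sketch: the allowed values and defaults are the ones listed above, while `buildVeoRequest` is a hypothetical name. Only `prompt` is enforced as required, matching the example request, and any other fields are passed through unchecked.

```javascript
// Allowed values, as listed in the attributes above.
const ALLOWED = {
  duration: [4, 6, 8],
  resolution: ['720p', '1080p'],
  aspect_ratio: ['16:9', '9:16'],
};

// Documented defaults for the enum and boolean fields.
const DEFAULTS = {
  duration: 8,
  resolution: '1080p',
  aspect_ratio: '16:9',
  generate_audio: true,
};

// Hypothetical helper: merges defaults with caller options and validates enums.
function buildVeoRequest(options) {
  if (typeof options.prompt !== 'string' || options.prompt.trim() === '') {
    throw new Error('prompt is required');
  }
  const body = { ...DEFAULTS, ...options };
  for (const [field, allowed] of Object.entries(ALLOWED)) {
    if (!allowed.includes(body[field])) {
      throw new Error(`${field} must be one of ${allowed.join(', ')}; got ${body[field]}`);
    }
  }
  return body;
}
```

For instance, `buildVeoRequest({ prompt: 'a lighthouse at dusk', duration: 5 })` throws because 5 is not an allowed duration.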

To track your credit usage, inspect the response headers of each API call: the x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions to your API usage.
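Reading the header can be wrapped in a small helper. The sketch below accepts either a fetch `Headers` object or a plain header object (such as Node's lowercase-keyed response headers). `readRemainingCredits`, `checkCredits`, and the warning threshold are hypothetical choices; only the header name comes from this page.

```javascript
// Hypothetical helper: returns the x-remaining-credits value as a number, or null.
function readRemainingCredits(headers) {
  const raw = typeof headers.get === 'function'
    ? headers.get('x-remaining-credits') // fetch Headers (or any object with get)
    : headers['x-remaining-credits'];    // plain object with lowercase keys
  if (raw === null || raw === undefined) return null;
  const credits = Number(raw);
  return Number.isFinite(credits) ? credits : null;
}

// Hypothetical helper: warns when the balance drops below a chosen threshold.
function checkCredits(headers, threshold) {
  const credits = readRemainingCredits(headers);
  if (credits !== null && credits < threshold) {
    console.warn(`Low balance: ${credits} credits remaining`);
  }
  return credits;
}
```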

Veo 3.1: AI Video Generation Model

Edited by Segmind Team on October 22, 2025.

What is Veo 3.1?

Veo 3.1 is a next-generation AI model that creates dynamic videos with synchronized audio from text prompts and static images. Developed by Google DeepMind, it renders highly realistic video and offers precise creative control, letting developers and content creators produce professional-quality output with little effort.

Key Features of Veo 3.1

Flexible Video Generation: Creates 4-, 6-, or 8-second videos at 720p or 1080p
Multi-Format Support: Offers 16:9 landscape and 9:16 portrait aspect ratios
Advanced Control System: Keeps output consistent through start and end frame images and reference images
Integrated Audio Generation: Synthesizes audio alongside the video for a complete audiovisual result
Precise Creative Control: Refines results with negative prompts and seed values
Cross-Platform Availability: Available through Flow, the Gemini API, and Vertex AI

Best Use Cases

Content Creation: Generating engaging social media videos and marketing content
Prototyping: Quickly visualizing motion concepts for UI/UX design
Education: Creating explanatory videos and dynamic presentations
Entertainment: Developing creative transitions and special effects
E-commerce: Turning product photos into dynamic showcases
Digital Art: Animating static artwork

Prompt Tips and Output Quality

Write clear, descriptive prompts that specify both the action and the environment
Use suitable reference images to keep the style and subject consistent throughout the video
Use negative prompts to exclude unwanted elements
For best results:
- Combine specific action descriptions with atmospheric details
- Match the duration to the scene's complexity; complex scenes benefit from a longer duration
- Choose the resolution based on platform requirements, such as 1080p for professional use
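The tips above can be captured in two small helpers: one that assembles a prompt from an action, an environment, and atmospheric details, and one that picks duration and resolution. Both are hypothetical illustrations of the advice, not part of the API; the mapping of "complex" to 8 s and "simple" to 4 s is an assumed heuristic.

```javascript
// Hypothetical helper: joins subject + action, environment and atmosphere into one prompt.
function composePrompt({ subject, action, environment = '', atmosphere = [] }) {
  if (!subject || !action) throw new Error('subject and action are required');
  const parts = [`${subject} ${action}`, environment, ...atmosphere]
    .map((part) => part.trim())
    .filter((part) => part.length > 0); // drop empty pieces
  return `${parts.join(', ')}.`;
}

// Hypothetical heuristic: longer clips for complex scenes, 1080p for professional use.
function suggestSettings({ complexScene, professional }) {
  return {
    duration: complexScene ? 8 : 4,
    resolution: professional ? '1080p' : '720p',
  };
}
```

For example, `composePrompt({ subject: 'A chef', action: 'flips a pancake', environment: 'in a sunlit kitchen', atmosphere: ['warm morning light'] })` returns `'A chef flips a pancake, in a sunlit kitchen, warm morning light.'`.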

FAQs

How is Veo 3.1 different from other video generation models? Veo 3.1 generates audio synchronized with the video and offers precise frame control through start and end frames, which sets it apart from many other video generation models.

What's the optimal way to use reference images? Reference images work best when they clearly show the subject and style you want in the video. Supplying multiple references can further guide the model toward the outcome you want.
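This page types image, last_frame, and reference_images only as strings and does not say how image data is encoded. The sketch below assumes the endpoint accepts base64-encoded file contents in these fields; that is an assumption not confirmed here, and a hosted image URL may be what the endpoint expects instead. `imageFileToBase64` and `withReferenceImage` are hypothetical helpers, and the example attaches a single reference image because the format for passing several is not documented.

```javascript
const fs = require('fs');

// Hypothetical helper. ASSUMPTION: the endpoint accepts base64-encoded image
// data in string fields such as image, last_frame and reference_images.
function imageFileToBase64(filePath) {
  return fs.readFileSync(filePath).toString('base64');
}

// Hypothetical helper: returns a copy of the request body with one reference image attached.
function withReferenceImage(body, filePath) {
  return { ...body, reference_images: imageFileToBase64(filePath) };
}
```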

Can I control the video's style consistency? Yes. Combining reference images, specific prompts, and a fixed seed value helps keep the style consistent across multiple generations.
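One way to apply this answer is to reuse a single seed and identical settings across a batch of prompt variations, so the prompt is the only thing that changes between requests. `buildSeededBatch` is a hypothetical helper; the seed value and settings in any call are the caller's choice.

```javascript
// Hypothetical helper: one request body per prompt, all sharing the same seed
// and settings so that variations differ only in their prompt.
function buildSeededBatch(prompts, seed, settings = {}) {
  if (!Number.isInteger(seed)) throw new Error('seed must be an integer');
  return prompts.map((prompt) => ({ ...settings, prompt, seed }));
}
```

Each body can then be sent as its own request.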

How do I achieve the best video quality? Select 1080p resolution and provide clear reference images and detailed prompts. For complex scenes, choose a longer duration so transitions have time to play out smoothly.

Can I generate videos without audio? Yes. Set the generate_audio parameter to false to get a video without any audio.
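The last two answers translate directly into request settings. In this sketch, `HIGH_QUALITY` collects the quality FAQ's advice (1080p, longest allowed duration) and `withoutAudio` returns a copy of a request body with `generate_audio` set to false; both names are hypothetical.

```javascript
// Settings suggested by the quality FAQ: 1080p and the longest allowed duration.
const HIGH_QUALITY = { resolution: '1080p', duration: 8 };

// Hypothetical helper: returns a copy with audio generation disabled,
// leaving the original request body unchanged.
function withoutAudio(body) {
  return { ...body, generate_audio: false };
}
```

For example, `withoutAudio({ ...HIGH_QUALITY, prompt: 'a waterfall in a rainforest' })` yields a silent 1080p, 8-second request.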
