POST
```javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Helper to convert a local image file into a base64 string
// (not used by this text-only example)
async function toB64(imgPath) {
  const data = fs.readFileSync(path.resolve(imgPath));
  return Buffer.from(data).toString('base64');
}

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/qwen2-vl-7b-instruct";

const data = {
  "messages": [
    { "role": "user", "content": "tell me a joke on cats" },
    { "role": "assistant", "content": "here is a joke about cats..." },
    { "role": "user", "content": "now a joke on dogs" }
  ]
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network failures (DNS errors, timeouts),
    // so fall back to error.message instead of crashing inside the handler
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```
RESPONSE
application/json
HTTP Response Codes
200 - OK: Response generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: The server had an issue processing the request
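
The status codes listed above can be turned into readable error messages on the client. The following is a minimal sketch; describeStatus and describeAxiosError are hypothetical helper names, and the message text is paraphrased from the list. It relies on the standard axios behavior that a non-2xx reply surfaces as error.response.status, while network failures carry no error.response at all.

```javascript
// Hypothetical helpers: map the documented status codes to readable messages.
const STATUS_MESSAGES = {
  200: 'OK',
  401: 'Unauthorized: user authentication failed',
  404: 'Not Found: the requested URL does not exist',
  405: 'Method Not Allowed: the requested HTTP method is not allowed',
  406: 'Not Acceptable: not enough credits',
  500: 'Server Error: the server had an issue processing the request',
};

function describeStatus(status) {
  // Codes not documented on this page fall through to a generic message.
  return STATUS_MESSAGES[status] ?? `Unexpected status ${status}`;
}

function describeAxiosError(error) {
  // axios attaches `response` only when the server actually replied.
  if (error.response) {
    return describeStatus(error.response.status);
  }
  return `Network error: ${error.message}`;
}
```

In the request example, the catch block could call console.error(describeAxiosError(error)) so that, for instance, a 406 is reported as a credit problem rather than a raw payload.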

Attributes


messages (Array)

An array of objects, each containing a role and a content field.


role (str)

Could be "user", "assistant" or "system".


content (str)

A string containing the user's query or the assistant's response.
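
The attributes above can be checked on the client before a request is sent. The following is a minimal sketch that enforces only the types listed on this page (role is "user", "assistant" or "system"; content is a string). validateMessages is a hypothetical helper, not part of any Segmind SDK, and the endpoint may accept other content forms that this check does not cover.

```javascript
// Hypothetical client-side validation of the documented `messages` shape.
const ALLOWED_ROLES = new Set(['user', 'assistant', 'system']);

function validateMessages(messages) {
  if (!Array.isArray(messages)) {
    return ['messages must be an array'];
  }
  const errors = [];
  messages.forEach((msg, i) => {
    if (msg === null || typeof msg !== 'object') {
      errors.push(`messages[${i}] must be an object`);
      return;
    }
    if (!ALLOWED_ROLES.has(msg.role)) {
      errors.push(`messages[${i}].role must be "user", "assistant" or "system"`);
    }
    if (typeof msg.content !== 'string') {
      errors.push(`messages[${i}].content must be a string`);
    }
  });
  return errors; // an empty array means the body matches the documented shape
}

const body = {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'now a joke on dogs' },
  ],
};
console.log(validateMessages(body.messages)); // []
```

Returning a list of errors, rather than throwing on the first one, lets a caller report every problem in a malformed conversation at once.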

To track your credit usage, inspect the response headers of each API call: the x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions to your API usage.
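
The header check described above can be wrapped in a small function. The following is a minimal sketch, assuming the header value is a numeric string. checkRemainingCredits and its default threshold of 10 are hypothetical choices, not Segmind defaults. Node's HTTP stack lowercases incoming header names, so the lookup uses the lowercase key.

```javascript
// Hypothetical helper: read x-remaining-credits and flag a low balance.
function checkRemainingCredits(headers, threshold = 10) {
  const raw = headers['x-remaining-credits'];
  if (raw === undefined) {
    // Header missing: report "unknown" rather than guessing a balance.
    return { credits: null, low: false };
  }
  const credits = Number(raw);
  // A non-numeric value yields NaN, which is never reported as low.
  return { credits, low: Number.isFinite(credits) && credits < threshold };
}

// Usage inside the async function of the request example, after axios.post:
//   const { credits, low } = checkRemainingCredits(response.headers);
//   if (low) console.warn(`Only ${credits} credits left`);
```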

Qwen2-VL-7B-Instruct

The Qwen2-VL-7B-Instruct model is a cutting-edge vision-language model from the Qwen family, designed to understand and interact with both visual and textual data. It builds upon the foundation of previous Qwen-VL models and introduces several key enhancements. This model is instruction-tuned and contains 7 billion parameters.

Key Features of Qwen2-VL-7B-Instruct

  • Enhanced Visual Understanding: Qwen2-VL recognizes common objects such as plants, flowers, animals, birds, fish, and insects, and goes beyond object recognition to analyze text, charts, icons, graphics, and layouts within images.

  • Structured Outputs: Qwen2-VL can generate structured outputs for data such as invoices, forms, and tables, which is useful for applications in finance and commerce.

  • Visual Agent: The model can act as a visual agent, reasoning and directing tools for computer and phone use.

  • Object Localization: The model can accurately locate objects in an image by generating bounding boxes or points, and it provides stable JSON outputs for coordinates and attributes.

  • Flexible Input Resolution: The model supports a wide range of input resolutions. You can adjust min_pixels and max_pixels to balance performance against computation cost, or set resized_height and resized_width directly.

  • Benchmark Performance: The model shows strong performance on image and video benchmarks. For example, it scores 60 on MMMU (val), 95.7 on DocVQA (test), and 69.6 on MVBench.
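
The min_pixels/max_pixels trade-off from the feature list can be made concrete. The following sketch is a JavaScript port of the smart_resize rule from Qwen2-VL's open-source preprocessing code, written for illustration only. Assumptions: 14-pixel patches merged 2x2, so each visual token covers a 28x28 block; defaults of 56*56 and 28*28*1280 pixels taken from the Hugging Face image processor. This page does not document min_pixels or max_pixels as request fields of the Segmind endpoint, so the hosted model may resize images differently.

```javascript
// Illustrative port of Qwen2-VL's smart_resize rule (assumed, not the API's code).
const FACTOR = 28; // assumed: 14-pixel patch x 2x2 merge

const roundBy = (x, f) => Math.round(x / f) * f;
const floorBy = (x, f) => Math.floor(x / f) * f;
const ceilBy = (x, f) => Math.ceil(x / f) * f;

function smartResize(height, width, minPixels = 56 * 56, maxPixels = 28 * 28 * 1280) {
  // Snap both sides to a multiple of FACTOR (Math.round rounds halves up,
  // unlike Python's round, so exact .5 cases may differ from the reference).
  let h = Math.max(FACTOR, roundBy(height, FACTOR));
  let w = Math.max(FACTOR, roundBy(width, FACTOR));
  if (h * w > maxPixels) {
    // Too large: scale down, keeping the aspect ratio.
    const beta = Math.sqrt((height * width) / maxPixels);
    h = floorBy(height / beta, FACTOR);
    w = floorBy(width / beta, FACTOR);
  } else if (h * w < minPixels) {
    // Too small: scale up, keeping the aspect ratio.
    const beta = Math.sqrt(minPixels / (height * width));
    h = ceilBy(height * beta, FACTOR);
    w = ceilBy(width * beta, FACTOR);
  }
  // Each 28x28 block becomes one visual token.
  return { height: h, width: w, visualTokens: (h / FACTOR) * (w / FACTOR) };
}

console.log(smartResize(1080, 1920)); // { height: 728, width: 1316, visualTokens: 1222 }
```

Under these assumptions, a 1080x1920 frame is capped at 1,222 visual tokens. Raising max_pixels keeps more detail at the cost of more tokens and compute, which is the balance the feature list describes.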

Limitations of Qwen2-VL-7B-Instruct

The Qwen2-VL-7B-Instruct model, while powerful, does have some limitations:

  • Data Timeliness: The image dataset used to train the model was last updated in June 2023, so the model may not cover information from after that date.

  • Limited Recognition of Individuals and Intellectual Property (IP): The model has a limited capacity to recognize specific individuals or IPs. It may not be able to identify all well-known personalities or brands.

  • Limited Capacity for Complex Instructions: The model's understanding and execution may fall short when it faces intricate, multi-step instructions.

  • Insufficient Counting Accuracy: The model's accuracy in counting objects is limited, especially in complex scenes.

  • Weak Spatial Reasoning Skills: The model's ability to infer positional relationships between objects, particularly in 3D spaces, is inadequate. It may have difficulty judging the relative positions of objects.

  • YaRN Impact: Although the model supports YaRN for processing long texts, enabling it significantly degrades performance on temporal and spatial localization tasks, so it is not recommended.