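const axios = require('axios');

const api_key = "YOUR API-KEY";
const url = "https://api.segmind.com/v1/qwq-plus";

// Conversation history: earlier turns followed by the new user query.
const data = {
  "messages": [
    {
      "role": "user",
      "content": "tell me a joke on cats"
    },
    {
      "role": "assistant",
      "content": "here is a joke about cats..."
    },
    {
      "role": "user",
      "content": "now a joke on dogs"
    }
  ]
};

(async function() {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
    // Remaining account credits are reported in a response header.
    console.log('Remaining credits:', response.headers['x-remaining-credits']);
  } catch (error) {
    // error.response is undefined for network failures, so fall back to the message.
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();

messages: An array of objects, each containing a role and content.
role: One of "user", "assistant", or "system".
content: A string containing the user's query or the assistant's response.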
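The message rules above can be checked on the client before a request is sent. The sketch below is a hypothetical helper (validateMessages is not part of the Segmind API) that enforces only the documented rules: messages is an array, each role is "user", "assistant" or "system", and each content is a string. The example conversation opens with a "system" message.

```javascript
// Roles accepted in each message, per the parameter description above.
const ALLOWED_ROLES = new Set(["user", "assistant", "system"]);

// Hypothetical helper: returns a list of problems found in a messages array.
// An empty list means the array matches the documented shape.
function validateMessages(messages) {
  if (!Array.isArray(messages)) {
    return ["messages must be an array"];
  }
  const problems = [];
  messages.forEach((msg, i) => {
    // typeof null is "object", so rule out null explicitly.
    if (msg === null || typeof msg !== "object") {
      problems.push(`messages[${i}] must be an object`);
      return;
    }
    if (!ALLOWED_ROLES.has(msg.role)) {
      problems.push(`messages[${i}].role must be "user", "assistant" or "system"`);
    }
    if (typeof msg.content !== "string") {
      problems.push(`messages[${i}].content must be a string`);
    }
  });
  return problems;
}

// Example: a system instruction followed by a user query.
const conversation = [
  { role: "system", content: "You are a concise math tutor." },
  { role: "user", content: "Is 221 prime?" }
];
console.log(validateMessages(conversation)); // []
```

Running the check locally catches malformed turns (for example a misspelled role) before they cost an API call.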
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account; monitor this value to avoid disruptions to your API usage.
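A minimal sketch of that monitoring step follows. The helper name checkCredits and the default threshold of 10 credits are hypothetical choices; the only documented fact it relies on is the x-remaining-credits header. Node's HTTP layer (which axios uses) exposes incoming header names in lowercase, and header values arrive as strings, so the value is converted with Number.

```javascript
// Hypothetical helper: reads the x-remaining-credits response header and
// flags when the balance drops below a caller-chosen threshold.
function checkCredits(headers, threshold = 10) {
  const raw = headers["x-remaining-credits"];
  if (raw === undefined) {
    // Header missing: report an unknown balance rather than guessing.
    return { remaining: null, low: false };
  }
  const remaining = Number(raw);
  // A non-numeric header value yields NaN and is not treated as "low".
  return { remaining, low: Number.isFinite(remaining) && remaining < threshold };
}

// Example with a mocked headers object (no network call).
const status = checkCredits({ "x-remaining-credits": "4.5" }, 10);
if (status.low) {
  console.warn(`Only ${status.remaining} credits left; top up to avoid interruptions.`);
}
```

In the axios sample, the same call would be checkCredits(response.headers).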
QwQ Plus is Alibaba Cloud's flagship reasoning language model, built on the QwQ-32B architecture and significantly enhanced through reinforcement learning. Unlike standard language models that respond immediately, QwQ Plus operates in thinking-only mode — it always deliberates internally before generating a final answer. This deep reasoning process makes it exceptionally capable on tasks that require multi-step logic, mathematical derivation, and complex problem decomposition.
With a 131,072-token context window and 32.5 billion parameters, QwQ Plus can handle long documents, intricate prompts, and multi-turn reasoning sessions. It achieves benchmark performance comparable to DeepSeek-R1 on AIME 24/25 and LiveCodeBench, making it one of the most capable reasoning models built on open weights that is available via API.
The model's thinking process is exposed in the reasoning_content field for full transparency. QwQ Plus is purpose-built for tasks where accuracy and reasoning depth matter more than raw speed.
QwQ Plus works best when prompts are clear and goal-oriented. For mathematical or coding problems, include all relevant context and constraints upfront. Because the model reasons internally, you will receive both a reasoning_content block (the thinking trace) and a content block (the final answer) — use the reasoning trace to audit correctness or understand the model's approach.
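To audit the reasoning trace separately from the final answer, the two fields can be pulled apart after the call. This is a sketch under a loudly stated assumption: it presumes an OpenAI-style body in which reasoning_content and content both sit under choices[0].message. This page does not confirm the exact Segmind response shape, so inspect a real response.data before relying on that path. The helper name splitReasoning is hypothetical.

```javascript
// Hypothetical helper: separates the thinking trace from the final answer.
// ASSUMPTION: OpenAI-style shape with both fields under choices[0].message;
// the actual Segmind response format may differ.
function splitReasoning(body) {
  const message = body?.choices?.[0]?.message;
  if (!message) {
    throw new Error("Unexpected response shape: no choices[0].message");
  }
  return {
    reasoning: message.reasoning_content ?? "", // thinking trace
    answer: message.content ?? ""               // final answer
  };
}

// Example with a mocked response body (no network call).
const mockBody = {
  choices: [{
    message: {
      role: "assistant",
      reasoning_content: "221 = 13 * 17, so it has divisors other than 1 and itself.",
      content: "No, 221 is not prime (221 = 13 × 17)."
    }
  }]
};
const { reasoning, answer } = splitReasoning(mockBody);
console.log("Reasoning trace:", reasoning);
console.log("Final answer:", answer);
```

Keeping the trace in its own variable makes it easy to log for review while showing only the answer to end users.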
Recommended sampling parameters: temperature 0.6, top-p 0.95, and a presence penalty between 0 and 2. Avoid greedy decoding (temperature 0), which can produce repetitive outputs.
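The recommended settings can be folded into the request body alongside messages. This sketch assumes the endpoint accepts OpenAI-style field names (temperature, top_p, presence_penalty); this page does not list the request fields beyond messages, so confirm them against the API reference. buildRequest is a hypothetical helper that applies the recommended defaults, leaves presence_penalty unset unless the caller supplies one, and rejects settings outside the guidance above.

```javascript
// Recommended sampling settings from the text above.
const RECOMMENDED = { temperature: 0.6, top_p: 0.95 };

// Hypothetical helper: builds a request body with the recommended settings.
// ASSUMPTION: the endpoint accepts the OpenAI-style names temperature,
// top_p and presence_penalty.
function buildRequest(messages, overrides = {}) {
  const body = { messages, ...RECOMMENDED, ...overrides };
  if (body.temperature === 0) {
    // Greedy decoding tends to produce repetitive output with this model.
    throw new Error("Avoid temperature 0 (greedy decoding)");
  }
  if (body.presence_penalty !== undefined &&
      (body.presence_penalty < 0 || body.presence_penalty > 2)) {
    throw new RangeError("presence_penalty should be between 0 and 2");
  }
  return body;
}

const requestBody = buildRequest(
  [{ role: "user", content: "Prove that the sum of two odd numbers is even." }],
  { presence_penalty: 1 }
);
console.log(requestBody);
```

The resulting object would take the place of data in the axios.post call shown earlier.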
What makes QwQ Plus different from Qwen or GPT-4o? QwQ Plus is a dedicated reasoning model — it always thinks before answering, making it slower but significantly more accurate on hard problems.
Does QwQ Plus support function calling or tool use? QwQ Plus is optimized for deep reasoning text generation. For agentic tool-use workflows, consider pairing it with a planning layer.
What is the context limit? 131,072 tokens — sufficient for long codebases, research papers, and multi-turn conversations.
How is billing calculated? Input and output tokens are billed separately. Thinking tokens (in reasoning_content) are billed as output tokens.
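Because the thinking trace is billed as output, a short prompt can still produce a sizeable bill. The sketch below shows the arithmetic: cost = input tokens × input rate + (reasoning tokens + answer tokens) × output rate. The per-million-token prices are placeholders, not Segmind's actual rates; substitute the published pricing. estimateCost is a hypothetical helper.

```javascript
// Hypothetical helper: estimates the cost of one call. Thinking tokens
// (reasoning_content) are charged at the output-token rate, per the note above.
// ASSUMPTION: prices are placeholders expressed per million tokens.
function estimateCost({ inputTokens, reasoningTokens, answerTokens }, prices) {
  const inputCost = (inputTokens / 1e6) * prices.inputPerMillion;
  const outputTokens = reasoningTokens + answerTokens;
  const outputCost = (outputTokens / 1e6) * prices.outputPerMillion;
  return { outputTokens, total: inputCost + outputCost };
}

// Example: 2,000 input tokens, but 6,500 billable output tokens once the
// 6,000-token thinking trace is included.
const usage = { inputTokens: 2000, reasoningTokens: 6000, answerTokens: 500 };
const placeholderPrices = { inputPerMillion: 1.0, outputPerMillion: 3.0 };
console.log(estimateCost(usage, placeholderPrices));
```

With these placeholder rates the estimate is about 0.0215 currency units, of which the thinking trace accounts for the largest share.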
Is QwQ Plus open source? The underlying QwQ-32B weights are open-source on Hugging Face. QwQ Plus is the production-optimized API version served by Alibaba Cloud via Segmind.
Which model should I use for speed vs. accuracy? For maximum accuracy on complex tasks, use QwQ Plus. For faster responses on simpler queries, consider Qwen3-Plus or a smaller Qwen3 model.