```javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Base64-encode a local image file (use for local paths; URLs can be passed directly).
function toB64(imgPath) {
  return fs.readFileSync(path.resolve(imgPath)).toString('base64');
}

const api_key = "YOUR_API_KEY";
const url = "https://api.segmind.com/v1/sam3-image";

const data = {
  // "image" accepts either a public URL or a base64 string, e.g. toB64('./sample1.jpg')
  "image": "https://segmind-resources.s3.amazonaws.com/input/6faa9243-e250-424b-b1b9-c5f1e5e93ab9-sample1.jpg",
  "text_prompt": "plants",
  "point_labels_input": "[[1]]",
  "return_preview": true,
  "return_overlay": false,
  "return_masks": false,
  "threshold": 0.5,
  "points_per_side": 32,
  "pred_iou_thresh": 0.88,
  "max_masks": 0
};

(async function () {
  try {
    const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
    console.log(response.data);
  } catch (error) {
    // error.response is undefined for network-level failures, so fall back to the message
    console.error('Error:', error.response ? error.response.data : error.message);
  }
})();
```

Request parameters:
- image: Input image URL or base64 string. Use high-resolution images for better accuracy.
- text_prompt: Optional text prompt to guide model focus. Examples: 'animal', 'plant'.
- Point coordinates: Point coordinates to specify the object for segmentation, e.g. [[300, 400]], or [[150, 200], [300, 400]] for multiple points.
- point_labels_input: Labels for each point: 1 = foreground, 0 = background. Helps refine model predictions.
- Bounding box: Bounding box to specify the objects to segment, e.g. [[100, 150, 200, 250]].
- return_preview: Get a combined preview mask. Useful for quick result checks.
- return_overlay: Get overlays on the input image. Useful for visual assessment.
- return_masks: Return each mask separately.
- threshold: Confidence threshold for detection. Use 0.5 for balanced results. Range: 0.1 to 1.
- points_per_side: Density for automatic mask creation. Use higher values for finer details. Range: 0 to 128.
- pred_iou_thresh: IoU score filter. Higher values are stricter about quality. Range: 0.5 to 1.
- max_masks: Limit on the number of masks returned. 0 means no limit. Range: 0 to 100.
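If the input image lives on disk rather than at a public URL, the same request can be sent with a base64 string. The sketch below reuses the toB64 helper, url, and api_key from the example above; the file name './sample1.jpg' is only a placeholder.

```javascript
// Minimal sketch: segment a local image and request one mask per detected instance.
// Assumes toB64, axios, url, and api_key are defined as in the example above.
const localData = {
  "image": toB64('./sample1.jpg'),   // base64 string instead of a URL
  "text_prompt": "plants",
  "return_preview": false,
  "return_masks": true,              // return each mask separately
  "threshold": 0.5
};

axios.post(url, localData, { headers: { 'x-api-key': api_key } })
  .then(res => console.log(res.data))
  .catch(err => console.error('Error:', err.response ? err.response.data : err.message));
```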
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
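For example, with axios the header can be read directly from the response object of the call shown earlier (axios lower-cases response header names):

```javascript
// Inside the async function from the example above, after a successful request:
const response = await axios.post(url, data, { headers: { 'x-api-key': api_key } });
console.log('Remaining credits:', response.headers['x-remaining-credits']);
```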
SAM3 (Segment Anything Model 3) is an advanced model for next-generation object segmentation and tracking in both images and videos. It is Meta's newest foundation model, building on the strengths of SAM2 and adding robust open-vocabulary capabilities. With it, users can segment virtually any object using natural-language prompts, visual cues such as points or bounding boxes, or mask inputs. SAM3 delivers near-human precision on challenging segmentation tasks and offers remarkable versatility through its multi-modal prompting, making it suitable for applications ranging from computer vision systems and annotation pipelines to advanced video editing tools.
SAM3 is available in three variants: Core SAM3 for concept-driven segmentation, SAM3 Video for temporal object tracking across frames, and SAM3 Tracker for precise instance segmentation with interactive refinement.
Text Prompts: Use clear, specific nouns (e.g., "truck", "person wearing a hat", "red flower") rather than vague descriptions; the model matches all instances of the concept described in the prompt.
Point Prompts: Place foreground points (label=1) inside target objects and background points (label=0) outside. Multiple points improve boundary accuracy for ambiguous regions.
Bounding Box Strategy: Define boxes as **[x_min, y_min, x_max, y_max]** coordinates; boxes work best for well-defined objects with clear boundaries.
Tuning Key Parameters: Raise threshold and pred_iou_thresh for stricter, higher-quality masks, increase points_per_side to capture finer details, and set max_masks to cap the number of instances returned.
Combining Prompts: Layer text + boxes + points for maximum control. Start broad with text, refine with boxes, then fine-tune boundaries with points.
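A sketch of that layering is shown below. Note that only point_labels_input appears in the request example above, so the point_coords_input and bbox_input field names used here are assumptions and should be checked against the parameter reference.

```javascript
// Layered prompting sketch. point_coords_input and bbox_input are assumed
// (hypothetical) field names; only point_labels_input is shown in the example above.
const combinedData = {
  "image": "https://segmind-resources.s3.amazonaws.com/input/6faa9243-e250-424b-b1b9-c5f1e5e93ab9-sample1.jpg",
  "text_prompt": "plants",                   // start broad with a concept
  "bbox_input": "[[100, 150, 200, 250]]",    // refine with a bounding box (assumed name)
  "point_coords_input": "[[150, 200]]",      // fine-tune with a foreground point (assumed name)
  "point_labels_input": "[[1]]",             // 1 = foreground
  "return_overlay": true
};

axios.post(url, combinedData, { headers: { 'x-api-key': api_key } })
  .then(res => console.log(res.data))
  .catch(err => console.error('Error:', err.response ? err.response.data : err.message));
```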
Is SAM3 open-source?
SAM3 is developed by Meta AI. You should check Meta's official repository for licensing details; earlier SAM models were released with permissive licenses for research and commercial use.
How does SAM3 differ from SAM2?
SAM3 significantly expands open-vocabulary capabilities, i.e., it can segment objects based on text descriptions more accurately across diverse concepts. It also improves multi-instance detection when multiple objects match your prompt.
What image resolution should I use?
Higher resolutions (1024px+ on the longest side) give better segmentation accuracy, especially when the input contains small objects or fine details. The model automatically handles scaling.
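As an optional pre-processing sketch (the sharp package is an assumption here, not part of the example above), a local image could be resized so its longest side is at most 1024px before base64 encoding:

```javascript
const sharp = require('sharp'); // assumption: sharp is used only for this resizing sketch

// Downscale so the longest side is at most 1024px, then base64-encode for the "image" field.
async function toResizedB64(imgPath) {
  const buffer = await sharp(imgPath)
    .resize(1024, 1024, { fit: 'inside', withoutEnlargement: true })
    .toBuffer();
  return buffer.toString('base64');
}
```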
Can I use SAM3 for real-time video processing?
The SAM3 Video variant supports frame-by-frame tracking, but real-time performance depends on hardware and resolution. You can improve throughput by batching frames or using lower points_per_side values.
What's the difference between return_preview, return_overlay, and return_masks?
The return_preview option gives a single combined mask image; return_overlay blends masks onto your input for visualization; and return_masks provides separate binary masks for each detected instance. These options support tasks with programmatic post-processing.
Should I use points or boxes for better accuracy?
Boxes are faster and work well for distinct objects. On the other hand, points offer precise control for irregular shapes or when refining specific boundaries. You can effectively combine both options for complex scenes.