# sam3.1-bf16

This model was converted to MLX format from facebook/sam3.1 using mlx-vlm version 0.4.3.

Open-vocabulary object detection, instance segmentation, and video tracking with Object Multiplex on Apple Silicon (~873M parameters).
SAM 3.1 extends SAM 3 with:

- `MultiplexMaskDecoder`: processes up to 16 objects simultaneously (2.4-4x faster tracking)
- `TriViTDetNeck`: 3 parallel FPN heads (detection, interactive, propagation)
- `DecoupledMemoryAttention`: image cross-attention with RoPE
- Improved detection accuracy (0.90 vs 0.87 top score on the cat benchmark below)
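The multiplex speedup comes from batching: instead of decoding each tracked object's mask in its own decoder pass, the object queries are stacked and decoded together in one pass. A conceptual sketch of the idea only, with toy shapes and a plain matmul standing in for the real `MultiplexMaskDecoder` internals:

```python
import numpy as np

rng = np.random.default_rng(0)
D, HW = 256, 64 * 64
image_features = rng.standard_normal((HW, D))

def decode_one(query):
    # Toy "decoder": project a single object query against the image features.
    return image_features @ query  # (HW,) mask logits

queries = rng.standard_normal((16, D))  # up to 16 objects per multiplex batch

# Sequential: one decoder call per tracked object.
sequential = np.stack([decode_one(q) for q in queries])

# Multiplexed: all 16 objects in a single batched matmul.
multiplexed = image_features @ queries.T  # (HW, 16)

assert np.allclose(sequential, multiplexed.T)
```

The batched form does the same arithmetic but amortizes per-call overhead, which is why the measured speedup grows with object count.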
## Quick Start

```bash
pip install "mlx-vlm>=0.4.3"
```

```python
from PIL import Image

from mlx_vlm.utils import load_model, get_model_path
from mlx_vlm.models.sam3.generate import Sam3Predictor
from mlx_vlm.models.sam3_1.processing_sam3_1 import Sam31Processor

model_path = get_model_path("mlx-community/sam3.1-bf16")
model = load_model(model_path)
processor = Sam31Processor.from_pretrained(str(model_path))
predictor = Sam3Predictor(model, processor, score_threshold=0.3)
```
## Object Detection

```python
image = Image.open("photo.jpg")
result = predictor.predict(image, text_prompt="a dog")

for i in range(len(result.scores)):
    x1, y1, x2, y2 = result.boxes[i]
    print(f"[{result.scores[i]:.2f}] box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```
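The returned boxes can be drawn directly with PIL for a quick visual check. A sketch using only `result.boxes` / `result.scores` from the example above, here fed placeholder values so the snippet runs standalone:

```python
import numpy as np
from PIL import Image, ImageDraw

def draw_boxes(image, boxes, scores, color="red"):
    """Draw xyxy boxes with their scores onto a copy of the image."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for (x1, y1, x2, y2), score in zip(boxes, scores):
        draw.rectangle([x1, y1, x2, y2], outline=color, width=3)
        draw.text((x1 + 4, y1 + 4), f"{score:.2f}", fill=color)
    return out

# Placeholder inputs; in practice: draw_boxes(image, result.boxes, result.scores)
img = Image.new("RGB", (640, 480), "black")
annotated = draw_boxes(img, np.array([[100.0, 50.0, 400.0, 350.0]]), np.array([0.90]))
```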
## Instance Segmentation

```python
result = predictor.predict(image, text_prompt="a person")
# result.boxes  -> (N, 4) xyxy bounding boxes
# result.masks  -> (N, H, W) binary segmentation masks
# result.scores -> (N,) confidence scores

import numpy as np

overlay = np.array(image).copy()
W, H = image.size
for i in range(len(result.scores)):
    mask = result.masks[i]
    if mask.shape != (H, W):
        # Resize low-resolution masks to the image size before blending.
        mask = np.array(Image.fromarray(mask.astype(np.float32)).resize((W, H)))
    binary = mask > 0
    overlay[binary] = (overlay[binary] * 0.5 + np.array([255, 0, 0]) * 0.5).astype(np.uint8)
```
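The blending loop above extends naturally to one color per instance. A sketch with placeholder masks standing in for `result.masks` (the palette and sizes are arbitrary):

```python
import numpy as np
from PIL import Image

H, W = 480, 640
# Placeholder boolean masks; in practice use result.masks, resized as above.
masks = np.zeros((2, H, W), dtype=bool)
masks[0, 100:200, 100:300] = True
masks[1, 250:400, 50:200] = True

palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])  # arbitrary colors
overlay = np.zeros((H, W, 3), dtype=np.uint8)  # stand-in for np.array(image)

for i, mask in enumerate(masks):
    color = palette[i % len(palette)]
    overlay[mask] = (overlay[mask] * 0.5 + color * 0.5).astype(np.uint8)

Image.fromarray(overlay).save("overlay.png")
```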
## Multi-Prompt Detection

```python
from mlx_vlm.models.sam3_1.generate import predict_multi

result = predict_multi(predictor, image, ["a cat", "a remote control"])
for i in range(len(result.scores)):
    x1, y1, x2, y2 = result.boxes[i]
    print(f"[{result.scores[i]:.2f}] {result.labels[i]} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```
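`predict_multi` returns all prompts' detections in one flat result; grouping by `result.labels` recovers the per-prompt view. A small sketch with placeholder values mirroring the fields above:

```python
from collections import defaultdict

# Placeholder fields mirroring result.labels / result.scores from predict_multi.
labels = ["a cat", "a cat", "a remote control"]
scores = [0.90, 0.86, 0.94]

by_label = defaultdict(list)
for label, score in zip(labels, scores):
    by_label[label].append(score)

# Highest-confidence detection per prompt.
best = {label: max(s) for label, s in by_label.items()}
# best -> {'a cat': 0.9, 'a remote control': 0.94}
```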
## Box-Guided Detection

```python
import numpy as np

boxes = np.array([[100, 50, 400, 350]])  # xyxy pixel coords
result = predictor.predict(image, text_prompt="a cat", boxes=boxes)
```
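A guide box can also be used to filter or rank the returned detections, e.g. by overlap with the prompt. A self-contained IoU sketch (the detected box here is a placeholder for a row of `result.boxes`):

```python
def iou_xyxy(a, b):
    """Intersection-over-union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

prompt_box = [100, 50, 400, 350]
detected = [120, 60, 390, 340]  # placeholder for a row of result.boxes
overlap = iou_xyxy(prompt_box, detected)
# overlap -> 0.84
```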
## CLI

```bash
# Object detection
python -m mlx_vlm.models.sam3_1.generate --task detect --image photo.jpg --prompt "a cat" --model mlx-community/sam3.1-bf16

# Instance segmentation (default task)
python -m mlx_vlm.models.sam3_1.generate --image photo.jpg --prompt "a cat" --model mlx-community/sam3.1-bf16

# Video tracking
python -m mlx_vlm.models.sam3_1.generate --task track --video input.mp4 --prompt "a car" --model mlx-community/sam3.1-bf16

# Real-time webcam (optimized: backbone caching + tracker propagation)
python -m mlx_vlm.models.sam3_1.generate --task realtime --prompt "a person" --model mlx-community/sam3.1-bf16 --resolution 224
```
| Flag | Default | Description |
|---|---|---|
| `--task` | `segment` | One of `detect`, `segment`, `track`, `realtime` |
| `--prompt` | (required) | Text prompt(s); supports multiple |
| `--resolution` | `1008` | Input resolution (`224` for faster realtime) |
| `--detect-every` | `15` | Re-run full detection every N frames |
| `--backbone-every` | `30` | Re-run ViT backbone every N frames |
## Benchmarks (M3 Max, bf16)

### Detection Accuracy
| Prompt | SAM 3 | SAM 3.1 |
|---|---|---|
| "a cat" (2 cats) | 0.87, 0.82 | 0.90, 0.86 |
| "a remote control" | 0.95, 0.94 | 0.94, 0.94 |
### Tracker Multiplex Speed
| Objects | SAM 3 | SAM 3.1 | Speedup |
|---|---|---|---|
| 3 | 547ms/frame | 227ms/frame | 2.4x |
| 4 | 608ms/frame | 203ms/frame | 3.0x |
| 5 | 766ms/frame | 190ms/frame | 4.0x |
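The speedup column is simply the per-frame latency ratio; larger multiplex batches amortize more of the decoder cost, so the ratio grows with object count:

```python
# Per-frame latencies from the table above (ms).
sam3_ms = [547, 608, 766]    # SAM 3: one decoder pass per object
sam31_ms = [227, 203, 190]   # SAM 3.1: multiplexed decoding

speedups = [round(a / b, 1) for a, b in zip(sam3_ms, sam31_ms)]
# speedups -> [2.4, 3.0, 4.0]
```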
### Optimized Realtime (224px)
| Metric | Value |
|---|---|
| Cached frame | 38ms (26 FPS) |
| Sustained average | ~40ms (25 FPS) |
| Baseline (no optimization) | ~212ms (5 FPS) |
| Total speedup | 4.6x |
## Original Model

[facebook/sam3.1](https://huggingface.co/facebook/sam3.1)

## License

The original SAM 3.1 model weights are released by Meta under the SAM License, a custom permissive license that allows commercial and research use.