SAM3 OpenVINO Models
Pre-exported OpenVINO IR variants of SAM3.1 for efficient CPU and GPU inference.
Origin & License
These models are derived from Meta's SAM 3.1 (facebook/sam3) and are therefore distributed under the same SAM License.
The OpenVINO IR exports and quantized variants in this repository are derivative works of the original SAM 3.1 model weights and are subject to the same SAM License terms. The model architecture was not modified; only the format was converted (PyTorch → ONNX → OpenVINO IR) and optional weight compression was applied.
Available Variants
| Variant | Description | Size |
|---|---|---|
| openvino-fp16 | FP16, recommended for GPU | 1.63 GB |
| openvino-fp32 | FP32, reference precision for CPU | 3.25 GB |
| openvino-int8_sym | INT8 symmetric, CPU-optimized | ~0.83 GB |
| openvino-int8_asym | INT8 asymmetric, CPU-optimized | ~0.83 GB |
| openvino-int4_sym | INT4 symmetric, ultra-low memory CPU | ~0.45 GB |
| openvino-int4_asym | INT4 asymmetric, ultra-low memory CPU | ~0.45 GB |
| openvino-int8_w8a16 | W8A16 weight-only, recommended for both CPU and GPU | 0.84 GB |
| openvino-int8_ptq_gpu | W8A8 post-training quantization, best CPU throughput | ~0.83 GB |
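The symmetric and asymmetric variants listed above differ in how the integer grid is placed over the weight range. The following is a minimal, generic sketch of the two schemes on a single weight row. It is not the NNCF implementation (which works per channel or per group and may use a different integer range), and the function names and sample values are hypothetical.

```python
def quantize_symmetric(weights, bits=8):
    """Symmetric: zero maps to integer 0; the grid covers [-max|w|, +max|w|]."""
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_symmetric(q, scale):
    return [v * scale for v in q]


def quantize_asymmetric(weights, bits=8):
    """Asymmetric: a zero point shifts the grid to cover [min(w), max(w)]."""
    levels = 2 ** bits - 1                # 255 steps for INT8
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(levels, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_asymmetric(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical skewed weight row: mostly positive values.
row = [-0.05, 0.10, 0.35, 0.80, 1.20]

q_sym, s_sym = quantize_symmetric(row)
q_asym, s_asym, zp = quantize_asymmetric(row)

err_sym = max_error(row, dequantize_symmetric(q_sym, s_sym))
err_asym = max_error(row, dequantize_asymmetric(q_asym, s_asym, zp))
print(f"symmetric  scale={s_sym:.5f}  max error={err_sym:.5f}")
print(f"asymmetric scale={s_asym:.5f}  max error={err_asym:.5f}")
```

For a skewed row like this one the asymmetric grid has a smaller step size, at the cost of storing a zero point alongside the scale.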
Benchmark Results
GPU (XPU): text mode, ms/image (lower is better)
| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 473 | 450 | 488 | 398 | 452 | 1.00× |
| W8A16 | 495 | 474 | 512 | 419 | 475 | 0.96× |
| PTQ W8A8 | ~645 | ~620 | ~670 | ~550 | ~621 | 0.73× |
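As a worked example of the arithmetic behind the Mean and vs FP16 columns, the sketch below recomputes both from the per-dataset latencies in the GPU text-mode table. It assumes that "vs FP16" is the FP16 mean latency divided by the variant's mean latency, which the source does not state explicitly. Ratios recomputed from the rounded table entries can differ from the published column in the last digit.

```python
from statistics import mean

# Per-dataset latencies (ms/image) copied from the GPU text-mode table.
# The PTQ W8A8 entries are approximate in the source.
latency_ms = {
    "FP16": [473, 450, 488, 398],
    "W8A16": [495, 474, 512, 419],
    "PTQ W8A8": [645, 620, 670, 550],
}


def speed_vs_baseline(latencies, baseline="FP16"):
    """Return {variant: (mean_ms, baseline_mean / variant_mean)}."""
    base = mean(latencies[baseline])
    return {name: (mean(v), base / mean(v)) for name, v in latencies.items()}


results = speed_vs_baseline(latency_ms)
for name, (avg, ratio) in results.items():
    print(f"{name:10s} mean={avg:7.2f} ms  vs FP16={ratio:.2f}x")
```

A ratio below 1.00 means the variant is slower than the FP16 baseline on this hardware.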
GPU (XPU): canvas mode, ms/image (lower is better)
| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 638 | 752 | 449 | 535 | 593 | 1.00× |
| W8A16 | 657 | 773 | 472 | 556 | 614 | 0.96× |
CPU: text mode, ms/image (lower is better)
| Variant | Potatoes | Candies | Mean | vs FP16 | F1 Potatoes | F1 Candies |
|---|---|---|---|---|---|---|
| FP32 | 8,525 | 8,407 | 8,466 | 1.03× | 1.000 | 0.992 |
| FP16 | 8,862 | 8,648 | 8,755 | 1.00× | 1.000 | 0.992 |
| INT8 sym | 5,270 | 5,170 | 5,220 | 1.67× | 1.000 | 0.992 ✓ |
| INT8 asym | 5,360 | 5,213 | 5,286 | 1.66× | 1.000 | 0.992 ✓ |
| W8A16 | 5,228 | 5,183 | 5,205 | 1.68× | 1.000 | 0.992 ✓ |
| INT4 sym | 5,526 | 5,450 | 5,488 | 1.59× | 0.963 | 0.792 ⚠️ |
| INT4 asym | 5,599 | 5,485 | 5,542 | 1.58× | 1.000 | 0.939 ⚠️ |
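The source does not describe the timing procedure behind these ms/image figures. One common way to collect such numbers is sketched below, with untimed warm-up runs followed by timed repeats. The helper name, the warm-up and repeat counts, and the stand-in workload are all hypothetical.

```python
import time
from statistics import mean


def ms_per_image(infer, images, warmup=2, repeats=3):
    """Average wall-clock latency of infer(image) in milliseconds per image."""
    for image in images[:warmup]:
        infer(image)  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        for image in images:
            start = time.perf_counter()
            infer(image)
            timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Stand-in workload so the sketch runs without a model.
def fake_infer(image):
    return sum(image)


images = [list(range(1000)) for _ in range(5)]
print(f"{ms_per_image(fake_infer, images):.4f} ms/image")
```

To time a real model, pass a callable that wraps its inference call in place of `fake_infer`.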
Quality (F1@0.5): GPU
| Dataset | FP16 (text) | W8A16 (text) | FP16 (canvas) | W8A16 (canvas) |
|---|---|---|---|---|
| Potatoes | 1.000 | 1.000 | 1.000 | 1.000 |
| Candies | 0.992 | 0.992 | 0.991 | 0.991 |
| Nuts | 0.698 | 0.703 | 0.874 | 0.881 |
| LED | 0.000* | 0.000* | 0.720 | 0.724 |
* LED dataset uses visual-only prompts; F1=0.000 in text mode is expected (no text labels provided).
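For readers unfamiliar with the metric, here is a minimal sketch of one way an F1 score at an IoU threshold of 0.5 can be computed. It assumes greedy one-to-one matching and uses axis-aligned boxes for brevity, whereas the evaluation behind the table presumably operates on the model's actual outputs. The source does not describe its matching procedure, so every function here is hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predictions, ground_truth, threshold=0.5):
    """Greedy one-to-one matching; a match needs IoU >= threshold."""
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: box_iou(pred, gt), default=None)
        if best is not None and box_iou(pred, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(ground_truth) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Hypothetical example: one good detection, one spurious, one missed object.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(f1_at_iou(preds, gt))
```

In this example there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5.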
Quantization Summary: GPU
| Approach | Avg text (ms) | Avg canvas (ms) | vs FP16 | Notes |
|---|---|---|---|---|
| FP16 (baseline) | 452 | 593 | 1.00× | Native GPU precision |
| PTQ W8A8 (all layers) | ~640 | ~780 | 0.73× | Q/DQ overhead on FP16-optimized GPU |
| PTQ W8A8 (VE only) | ~650 | ~760 | 0.72× | Mixed-precision boundary overhead |
| W8A16 (weight-only) | 475 | 614 | 0.96× | No activation Q/DQ; near-lossless |
Why PTQ is slower on GPU: Modern GPU compute units (e.g., FP16 DPAS) are highly optimized for FP16 math. NNCF PTQ inserts explicit Q/DQ activation nodes at every layer boundary, and the cost of those extra kernel launches outweighs the memory savings from 2× smaller weights. W8A16 (weight-only) has no such nodes: weights are dequantized on the fly in a fused kernel, giving only ~4% overhead.
Why INT8 is faster on CPU: VNNI instructions natively accelerate INT8 dot products with no separate Q/DQ overhead. The OpenVINO CPU plugin fuses dequantization into the matmul kernel, giving a 1.67× speedup over FP16 with no measured F1 loss on the benchmarked datasets.
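The difference between the two dataflows can be illustrated on a single dot product. The sketch below is plain Python with hypothetical values and function names. It shows where the extra activation quantize and dequantize steps appear in W8A8, but it does not model kernel fusion or hardware performance.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8: returns (integers, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dot_w8a16(q_weights, w_scale, activations):
    """Weight-only: dequantize weights inline, keep activations in float."""
    return sum((q * w_scale) * x for q, x in zip(q_weights, activations))


def dot_w8a8(q_weights, w_scale, activations):
    """W8A8: activations are quantized first, then an integer dot product runs."""
    q_acts, a_scale = quantize_int8(activations)              # added Q step
    acc = sum(qw * qa for qw, qa in zip(q_weights, q_acts))   # integer accumulate
    return acc * w_scale * a_scale                            # DQ step


# Hypothetical weights and activations.
weights = [0.12, -0.40, 0.33, 0.05]
activations = [1.5, -0.7, 2.1, 0.3]
q_w, s_w = quantize_int8(weights)

reference = sum(w * x for w, x in zip(weights, activations))
y_w8a16 = dot_w8a16(q_w, s_w, activations)
y_w8a8 = dot_w8a8(q_w, s_w, activations)
print("float :", reference)
print("W8A16 :", y_w8a16)
print("W8A8  :", y_w8a8)
```

Both paths approximate the float result; W8A8 simply adds work on the activation side, which only pays off when the hardware runs the integer dot product faster than the floating-point one.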
When to Use Each Variant
| Target hardware | Recommended variant | Reason |
|---|---|---|
| GPU (XPU / discrete) | openvino-fp16 | Fastest; native FP16 compute |
| GPU (memory-constrained) | openvino-int8_w8a16 | 2× smaller, only 4% slower |
| CPU | openvino-int8_w8a16 | 1.68× faster than FP16, lossless |
| CPU (ultra-low memory) | openvino-int4_sym | Smallest, but F1 may drop on some datasets |
| Older iGPU | openvino-int8_sym | Better for architectures without optimized FP16 |
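The recommendations above can be encoded as a small lookup helper. This is only an illustrative sketch: `recommend_variant` and its flags are hypothetical names, not part of any library, and it treats the CPU "ultra-low memory" row as the memory-constrained case.

```python
def recommend_variant(device, memory_constrained=False, legacy_igpu=False):
    """Map a hardware target to a variant name from the recommendation table."""
    device = device.upper()
    if device == "GPU":
        if legacy_igpu:
            # Older integrated GPUs without optimized FP16
            return "openvino-int8_sym"
        return "openvino-int8_w8a16" if memory_constrained else "openvino-fp16"
    if device == "CPU":
        return "openvino-int4_sym" if memory_constrained else "openvino-int8_w8a16"
    raise ValueError(f"unsupported device: {device!r}")


print(recommend_variant("GPU"))
print(recommend_variant("CPU", memory_constrained=True))
```

Because INT4 can reduce F1 on some datasets, it is worth validating quality on your own data before choosing the memory-constrained CPU option.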
Hardware & Software
- GPU benchmark: Intel Arc discrete GPU (XPU), OpenVINO 2025.x
- CPU benchmark: Intel CPU, OpenVINO 2025.x
- NNCF version: 3.1.0 (W8A16: nncf.compress_weights, PTQ: nncf.quantize)
- Datasets: COCO subsets (Potatoes 10 imgs / Candies 12 imgs / Nuts 21 imgs / LED 15 imgs)
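For orientation, the two NNCF entry points named above are typically called as in the sketch below. This is not the export script used for this repository. The file paths, the choice of `INT8_ASYM`, and the calibration inputs are assumptions, and the imports are deferred so the functions can be defined without `nncf` and `openvino` installed.

```python
def compress_w8a16(ir_xml_in, ir_xml_out):
    """Weight-only INT8 compression; activations stay in floating point."""
    import nncf
    import openvino as ov

    model = ov.Core().read_model(ir_xml_in)
    # Assumption: the INT8 mode actually used for these exports is not stated.
    compressed = nncf.compress_weights(
        model, mode=nncf.CompressWeightsMode.INT8_ASYM
    )
    ov.save_model(compressed, ir_xml_out)


def quantize_w8a8(ir_xml_in, ir_xml_out, calibration_items, transform_fn):
    """Full post-training quantization; needs representative calibration data."""
    import nncf
    import openvino as ov

    model = ov.Core().read_model(ir_xml_in)
    # transform_fn converts one calibration item into model inputs.
    dataset = nncf.Dataset(calibration_items, transform_fn)
    quantized = nncf.quantize(model, dataset)
    ov.save_model(quantized, ir_xml_out)
```

Weight compression needs no calibration data, whereas `nncf.quantize` requires a calibration dataset because it also quantizes activations.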
Usage
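from instantlearn.models.sam3 import SAM3OpenVINO
from instantlearn.models.sam3.sam3_openvino import SAM3OVVariant

# GPU: FP16 is the fastest variant in the benchmarks above
model = SAM3OpenVINO(variant=SAM3OVVariant.FP16, device="GPU")
# CPU: W8A16 is the recommended variant
model = SAM3OpenVINO(variant=SAM3OVVariant.INT8_W8A16, device="CPU")
# Memory-constrained GPU: W8A16 halves model size at a small latency cost
model = SAM3OpenVINO(variant=SAM3OVVariant.INT8_W8A16, device="GPU")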