SAM3 OpenVINO Models
Pre-exported OpenVINO variants of SAM3.1 for efficient CPU and GPU inference.
Available Variants
| Variant | Description | Size |
|---|---|---|
| openvino-fp16 | FP16, recommended for GPU (Arc, Intel iGPU) | 1.63 GB |
| openvino-fp32 | FP32, reference precision for CPU | 3.25 GB |
| openvino-int8_sym | INT8 symmetric, CPU-optimized via VNNI | ~0.83 GB |
| openvino-int8_asym | INT8 asymmetric, CPU-optimized via VNNI | ~0.83 GB |
| openvino-int4_sym | INT4 symmetric, ultra-low memory CPU | ~0.45 GB |
| openvino-int4_asym | INT4 asymmetric, ultra-low memory CPU | ~0.45 GB |
| openvino-int8_w8a16 | W8A16 weight-only, recommended for both CPU and GPU | 0.84 GB |
Benchmark Results
Intel Arc B60 GPU (XPU): text mode, ms/image (lower is better)

| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 473 | 450 | 488 | 398 | 452 | 1.00× |
| W8A16 | 495 | 474 | 512 | 419 | 475 | 0.96× |
| PTQ W8A8 (GPU) | ~645 | ~620 | ~670 | ~550 | ~621 | 0.73× |
Intel Arc B60 GPU (XPU): canvas mode, ms/image (lower is better)

| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 638 | 752 | 449 | 535 | 593 | 1.00× |
| W8A16 | 657 | 773 | 472 | 556 | 614 | 0.96× |
CPU (Intel host): text mode, ms/image (lower is better)

| Variant | Potatoes | Candies | Mean | vs FP16 | F1 Potatoes | F1 Candies |
|---|---|---|---|---|---|---|
| FP32 | 8,525 | 8,407 | 8,466 | 1.03× | 1.000 | 0.992 |
| FP16 | 8,862 | 8,648 | 8,755 | 1.00× | 1.000 | 0.992 |
| INT8 sym | 5,270 | 5,170 | 5,220 | 1.67× | 1.000 | 0.992 ✓ |
| INT8 asym | 5,360 | 5,213 | 5,286 | 1.66× | 1.000 | 0.992 ✓ |
| W8A16 | 5,228 | 5,183 | 5,205 | 1.68× | 1.000 | 0.992 ✓ |
| INT4 sym | 5,526 | 5,450 | 5,488 | 1.59× | 0.963 | 0.792 ⚠️ |
| INT4 asym | 5,599 | 5,485 | 5,542 | 1.58× | 1.000 | 0.939 ⚠️ |
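The "Mean" and "vs FP16" columns can be reproduced with a few lines of arithmetic. The sketch below assumes that "vs FP16" is the FP16 mean latency divided by the variant's mean latency (so values above 1 mean faster than FP16); the helper names are illustrative and not part of any library.

```python
# CPU text-mode latencies (ms/image) copied from the table above.
cpu_text_ms = {
    "FP16": {"Potatoes": 8862, "Candies": 8648},
    "W8A16": {"Potatoes": 5228, "Candies": 5183},
}


def mean_latency(per_dataset):
    """Average latency across datasets, in ms/image."""
    return sum(per_dataset.values()) / len(per_dataset)


def speed_vs_baseline(variant, baseline="FP16", table=cpu_text_ms):
    """Assumed definition: baseline mean latency / variant mean latency."""
    return mean_latency(table[baseline]) / mean_latency(table[variant])


print(f"{speed_vs_baseline('W8A16'):.2f}x")  # 1.68x
```

Under this definition a value of 0.96× on the GPU tables reads as "slightly slower than FP16", while 1.68× on CPU reads as "faster than FP16".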
Quality (F1@0.5): GPU

| Dataset | FP16 (text) | W8A16 (text) | FP16 (canvas) | W8A16 (canvas) |
|---|---|---|---|---|
| Potatoes | 1.000 | 1.000 | 1.000 | 1.000 |
| Candies | 0.992 | 0.992 | 0.991 | 0.991 |
| Nuts | 0.698 | 0.703 | 0.874 | 0.881 |
| LED | 0.000* | 0.000* | 0.720 | 0.724 |
* LED dataset uses visual-only prompts; F1=0.000 in text mode is expected (no text labels provided).
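For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute F1 at an IoU threshold of 0.5. It is an assumption about the metric's general shape, not the evaluation code used for these numbers: masks are represented as sets of pixel coordinates, matching is greedy, and `iou`, `f1_at_iou`, and `square` are hypothetical helpers.

```python
def iou(a, b):
    """IoU of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(predictions, ground_truth, threshold=0.5):
    """Greedy one-to-one matching; a prediction is a true positive when its
    best unmatched ground-truth mask overlaps with IoU >= threshold."""
    unmatched = list(range(len(ground_truth)))
    tp = 0
    for pred in predictions:
        best = max(unmatched, key=lambda i: iou(pred, ground_truth[i]), default=None)
        if best is not None and iou(pred, ground_truth[best]) >= threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(predictions) - tp
    fn = len(ground_truth) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def square(row, col, size):
    """Toy square mask used only for this example."""
    return {(r, c) for r in range(row, row + size) for c in range(col, col + size)}


gt = [square(0, 0, 4), square(10, 10, 4)]
preds = [square(0, 1, 4), square(20, 20, 4)]  # one good match, one miss
print(f1_at_iou(preds, gt))  # 0.5
print(f1_at_iou([], gt))     # 0.0, no predictions at all
```

The second call shows why a mode that produces no detections scores 0.000, as in the LED text-mode cells.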
Quantization Summary: Intel Arc B60 GPU

| Approach | Avg text (ms) | Avg canvas (ms) | vs FP16 | Notes |
|---|---|---|---|---|
| FP16 (baseline) | 452 | 593 | 1.00× | Native GPU precision |
| PTQ W8A8 (all layers) | ~640 | ~780 | 0.73× | Q/DQ ops dominate on Battlemage |
| PTQ W8A8 (VE only) | ~650 | ~760 | 0.72× | Mixed-precision boundary overhead |
| W8A16 (weight-only) | 475 | 614 | 0.96× | No activation Q/DQ; near-lossless |
Why PTQ is slower on Arc B60: The DPAS (Dot Product Accumulate Systolic) units on Intel Battlemage are highly optimized for FP16. NNCF PTQ inserts explicit Q/DQ activation nodes at every layer boundary, and the cost of those extra GPU kernel launches outweighs the benefit of 2× smaller weights. W8A16 (weight-only) has no such nodes: weights are dequantized on the fly in a fused kernel, which leaves only ~4% overhead.
Why INT8 is faster on CPU: VNNI instructions (`_mm512_dpbusd_epi32`) natively accelerate INT8 dot products with no separate Q/DQ overhead. The OpenVINO CPU plugin fuses dequantization into the matmul kernel, giving a 1.67× speedup over FP16 with no F1 loss on the benchmarked datasets.
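The difference between weight-only (W8A16) and full (W8A8) quantization can be illustrated with a toy dot product. This is a pure-Python sketch under simplifying assumptions (symmetric per-tensor scales, no real kernels); it is not how NNCF or OpenVINO implement either scheme, and all function names are hypothetical.

```python
def quantize_sym(values, bits=8):
    """Symmetric per-tensor quantization: v is approximated by q * scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dot_weight_only(q_weights, w_scale, activations):
    """W8A16-style: weights are dequantized inside the dot product and
    activations stay in floating point, so there is no activation Q/DQ step."""
    return sum((q * w_scale) * a for q, a in zip(q_weights, activations))


def dot_w8a8(q_weights, w_scale, activations):
    """W8A8-style: activations are quantized too (the extra Q/DQ step),
    products accumulate as integers, and the result is rescaled once."""
    q_acts, a_scale = quantize_sym(activations)
    acc = sum(qw * qa for qw, qa in zip(q_weights, q_acts))
    return acc * w_scale * a_scale


weights = [0.12, -0.50, 0.33, 0.07]
activations = [1.5, -0.25, 0.75, 2.0]
reference = sum(w * a for w, a in zip(weights, activations))

q_w, s_w = quantize_sym(weights)
print(reference, dot_weight_only(q_w, s_w, activations), dot_w8a8(q_w, s_w, activations))
```

Both variants stay close to the floating-point reference; the W8A8 path simply carries one more quantization step per input, which is the step that shows up as separate Q/DQ nodes in the PTQ graphs.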
When to Use Each Variant
| Target hardware | Recommended variant | Reason |
|---|---|---|
| Intel Arc GPU (Battlemage B-series) | openvino-fp16 | Fastest; native FP16 DPAS |
| Intel Arc GPU (memory-constrained) | openvino-int8_w8a16 | 2× smaller, only 4% slower |
| CPU (any Intel) | openvino-int8_w8a16 | 1.68× faster than FP16, lossless |
| CPU (ultra-low memory) | openvino-int4_sym | Smallest, but F1 may drop on some datasets |
| Intel Arc A-series / older iGPU | openvino-int8_sym | Older arch without optimized FP16 |
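The recommendations above can be encoded as a small lookup if variant selection needs to be automated. This is only a sketch: `pick_variant` and its hardware labels (`"arc-b-series"`, `"arc-a-series-or-older-igpu"`, `"cpu"`) are hypothetical names invented for illustration, and the returned strings are the variant names from the table.

```python
def pick_variant(hardware: str, memory_constrained: bool = False) -> str:
    """Return the recommended variant name for a hardware label.

    The labels are illustrative; the mapping mirrors the table above.
    """
    if hardware == "arc-b-series":
        return "openvino-int8_w8a16" if memory_constrained else "openvino-fp16"
    if hardware == "cpu":
        return "openvino-int4_sym" if memory_constrained else "openvino-int8_w8a16"
    if hardware == "arc-a-series-or-older-igpu":
        return "openvino-int8_sym"
    raise ValueError(f"no recommendation for hardware label: {hardware!r}")


print(pick_variant("arc-b-series"))                  # openvino-fp16
print(pick_variant("cpu", memory_constrained=True))  # openvino-int4_sym
```

Note that the INT4 choice trades accuracy for size, so it is worth validating F1 on your own data before adopting it.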
Hardware & Software
- GPU benchmark: Intel Arc B60 (XPU), OpenVINO 2025.x
- CPU benchmark: Intel host CPU, OpenVINO 2025.x
- NNCF version: 3.1.0 (W8A16: nncf.compress_weights, PTQ: nncf.quantize)
- Datasets: COCO (Potatoes 10 imgs / Candies 12 imgs / Nuts 21 imgs / LED 15 imgs)
Usage
```python
from instantlearn.models.sam3 import SAM3OV
from instantlearn.models.sam3.sam3_openvino import SAM3OVVariant

# Intel Arc GPU: FP16 is the fastest variant
model = SAM3OV(variant=SAM3OVVariant.FP16, device="GPU")

# CPU: W8A16 weight-only INT8
model = SAM3OV(variant=SAM3OVVariant.INT8_W8A16, device="CPU")

# Memory-constrained GPU: W8A16 weight-only INT8
model = SAM3OV(variant=SAM3OVVariant.INT8_W8A16, device="GPU")
```