SAM3 OpenVINO Models

Pre-exported OpenVINO variants of SAM3.1 for efficient CPU and GPU inference.

Available Variants

| Variant | Description | Size |
|---|---|---|
| `openvino-fp16` | FP16; recommended for GPU (Arc, Intel iGPU) | 1.63 GB |
| `openvino-fp32` | FP32; reference precision for CPU | 3.25 GB |
| `openvino-int8_sym` | INT8 symmetric; CPU-optimized via VNNI | ~0.83 GB |
| `openvino-int8_asym` | INT8 asymmetric; CPU-optimized via VNNI | ~0.83 GB |
| `openvino-int4_sym` | INT4 symmetric; smallest footprint, for memory-limited CPUs | ~0.45 GB |
| `openvino-int4_asym` | INT4 asymmetric; smallest footprint, for memory-limited CPUs | ~0.45 GB |
| `openvino-int8_w8a16` | W8A16 weight-only (8-bit weights, 16-bit activations); recommended for CPU and memory-constrained GPU | 0.84 GB |
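The `_sym` and `_asym` suffixes describe how floating-point weights are mapped to integers. The NumPy sketch below illustrates both schemes for INT8. It assumes one scale per output channel and NNCF-style conventions: signed INT8 for symmetric, unsigned UINT8 plus a zero point for asymmetric. The exact granularity and rounding used to produce these exports are not documented here. Symmetric quantization stores only a scale and maps 0.0 exactly to 0. Asymmetric quantization also stores a zero point, so the 256 levels cover the channel's actual [min, max] range, which helps when weights are skewed to one side.

```python
import numpy as np

def quantize_sym_int8(w: np.ndarray):
    """Symmetric per-row INT8: q = round(w / scale); 0.0 maps to 0."""
    # One scale per output channel (row), so that max |w| maps to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_asym_uint8(w: np.ndarray):
    """Asymmetric per-row UINT8: q = round(w / scale) + zero_point."""
    # The range always includes 0.0 so that zero is exactly representable.
    w_min = np.minimum(w.min(axis=1, keepdims=True), 0.0)
    w_max = np.maximum(w.max(axis=1, keepdims=True), 0.0)
    scale = (w_max - w_min) / 255.0
    scale = np.where(scale == 0, 1.0, scale)
    zero_point = np.clip(np.round(-w_min / scale), 0, 255)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_sym(q, scale):
    return q.astype(np.float32) * scale

def dequantize_asym(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Skewed weights (mean 0.2): symmetric wastes half its levels on negatives.
rng = np.random.default_rng(0)
w = rng.normal(0.2, 0.05, size=(4, 64)).astype(np.float32)

q_s, s_s = quantize_sym_int8(w)
q_a, s_a, zp = quantize_asym_uint8(w)
err_sym = np.abs(dequantize_sym(q_s, s_s) - w).max()
err_asym = np.abs(dequantize_asym(q_a, s_a, zp) - w).max()
print(f"max abs error: sym={err_sym:.5f} asym={err_asym:.5f}")
```

INT8 uses one byte per weight plus a small per-channel scale, so storage is roughly a quarter of FP32. This matches the 3.25 GB to ~0.83 GB sizes in the table.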

Benchmark Results

Intel Arc B60 GPU (XPU) — text mode, ms/image ↓

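| Variant | Potatoes | Candies | Nuts | LED | Mean | Speedup vs FP16 |
|---|---:|---:|---:|---:|---:|---:|
| FP16 | 473 | 450 | 488 | 398 | 452 | 1.00× |
| W8A16 | 495 | 474 | 512 | 419 | 475 | 0.96× |
| PTQ W8A8 (GPU) | ~645 | ~620 | ~670 | ~550 | ~621 | 0.73× |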

Intel Arc B60 GPU (XPU) — canvas mode, ms/image ↓

| Variant | Potatoes | Candies | Nuts | LED | Mean | Speedup vs FP16 |
|---|---:|---:|---:|---:|---:|---:|
| FP16 | 638 | 752 | 449 | 535 | 593 | 1.00× |
| W8A16 | 657 | 773 | 472 | 556 | 614 | 0.96× |

CPU (Intel host) — text mode, ms/image ↓

| Variant | Potatoes | Candies | Mean | Speedup vs FP16 | F1 Potatoes | F1 Candies |
|---|---:|---:|---:|---:|---:|---:|
| FP32 | 8,525 | 8,407 | 8,466 | 1.03× | 1.000 | 0.992 |
| FP16 | 8,862 | 8,648 | 8,755 | 1.00× | 1.000 | 0.992 |
| INT8 sym | 5,270 | 5,170 | 5,220 | 1.67× | 1.000 | 0.992 ✓ |
| INT8 asym | 5,360 | 5,213 | 5,286 | 1.66× | 1.000 | 0.992 ✓ |
| W8A16 | 5,228 | 5,183 | 5,205 | 1.68× | 1.000 | 0.992 ✓ |
| INT4 sym | 5,526 | 5,450 | 5,488 | 1.59× | 0.963 | 0.792 ⚠️ |
| INT4 asym | 5,599 | 5,485 | 5,542 | 1.58× | 1.000 | 0.939 ⚠️ |
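The Mean and Speedup columns can be re-derived from the per-dataset latencies. This short sketch assumes two definitions:

- Mean is the arithmetic mean over the listed datasets.
- Speedup vs FP16 is the FP16 mean latency divided by the variant's mean latency, so values above 1.00× are faster than FP16 and values below are slower.

Recomputed ratios may differ from the table in the last digit.

```python
# Per-dataset CPU latencies (ms/image), copied from the table above.
cpu_ms = {
    "FP32": [8525, 8407],
    "FP16": [8862, 8648],
    "INT8 sym": [5270, 5170],
    "INT8 asym": [5360, 5213],
    "W8A16": [5228, 5183],
    "INT4 sym": [5526, 5450],
    "INT4 asym": [5599, 5485],
}

def mean(xs):
    return sum(xs) / len(xs)

def speedup_vs(baseline_ms: float, variant_ms: float) -> float:
    """A result above 1.0 means the variant is faster than the baseline."""
    return baseline_ms / variant_ms

fp16_mean = mean(cpu_ms["FP16"])
for name, times in cpu_ms.items():
    m = mean(times)
    print(f"{name:10s} mean={m:8.1f} ms  speedup={speedup_vs(fp16_mean, m):.2f}x")
```

The same definition applies to the GPU tables, where values below 1.00× mean slower than FP16.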

Quality (F1@0.5) — GPU

| Dataset | FP16 (text) | W8A16 (text) | FP16 (canvas) | W8A16 (canvas) |
|---|---:|---:|---:|---:|
| Potatoes | 1.000 | 1.000 | 1.000 | 1.000 |
| Candies | 0.992 | 0.992 | 0.991 | 0.991 |
| Nuts | 0.698 | 0.703 | 0.874 | 0.881 |
| LED | 0.000* | 0.000* | 0.720 | 0.724 |

* The LED dataset uses visual-only prompts, so F1 = 0.000 in text mode is expected (no text labels are provided).
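F1@0.5 is conventionally computed in three steps:

1. Match each prediction to at most one ground-truth instance with IoU ≥ 0.5.
2. Count the matches as true positives.
3. Combine precision and recall as F1 = 2TP / (2TP + FP + FN).

This minimal sketch assumes greedy, score-ordered matching on axis-aligned boxes, with counts summed over the dataset. The benchmark's actual matching rule, and whether it scores boxes or masks, is not documented here. The sketch also shows that F1 is 0 whenever nothing is matched, as in the LED text-mode rows.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_counts(preds: List[Box], gts: List[Box], thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; preds are assumed sorted by descending score.

    Returns (true positives, false positives, false negatives).
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Dataset-level F1: sum the counts over all images, then compute F1 once.
images = [
    ([(1, 1, 10, 10), (50, 50, 60, 60)], [(0, 0, 10, 10), (20, 20, 30, 30)]),
    ([], [(0, 0, 5, 5)]),  # an image with no predictions
]
totals = [sum(c) for c in zip(*(match_counts(p, g) for p, g in images))]
dataset_f1 = f1_score(*totals)
print(totals, round(dataset_f1, 3))
```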

Quantization Summary — Intel Arc B60 GPU

| Approach | Avg text (ms) | Avg canvas (ms) | Speedup vs FP16 | Notes |
|---|---:|---:|---:|---|
| FP16 (baseline) | 452 | 593 | 1.00× | Native GPU precision |
| PTQ W8A8 (all layers) | ~640 | ~780 | 0.73× | Q/DQ ops dominate on Battlemage |
| PTQ W8A8 (VE only) | ~650 | ~760 | 0.72× | Mixed-precision boundary overhead |
| W8A16 (weight-only) | 475 | 614 | 0.96× | No activation Q/DQ; near-lossless |

Why PTQ is slower on Arc B60: the DPAS (Dot Product Accumulate Systolic) units on Intel Battlemage are highly optimized for FP16. NNCF post-training quantization (PTQ) inserts explicit quantize/dequantize (Q/DQ) nodes on activations at every layer boundary. The cost of those extra GPU kernel launches outweighs the savings from 2× smaller weights. W8A16 (weight-only compression) adds no such nodes; weights are dequantized on the fly inside a fused kernel, so the overhead is only about 4%.
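The two GPU approaches differ in where (de)quantization happens. This NumPy simulation of a single linear layer is for illustration only. It assumes symmetric per-channel INT8 weights and per-tensor symmetric activation Q/DQ. It models numerics, not the fused OpenVINO GPU kernels or their launch costs. In the W8A16 path only the stored weights are INT8. The W8A8 path adds a quantize/dequantize pair on the activation input, which is the per-layer extra work described above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 256)).astype(np.float16)             # FP16 activations
w = (rng.standard_normal((128, 256)) * 0.05).astype(np.float16)  # FP16 weights

# Offline step shared by both schemes: INT8 weights, one scale per output channel.
w_scale = np.abs(w).astype(np.float32).max(axis=1, keepdims=True) / 127.0
w_q = np.clip(np.round(w.astype(np.float32) / w_scale), -127, 127).astype(np.int8)

def linear_w8a16(x, w_q, w_scale):
    # Weight-only: dequantize the weights back to FP16 (fused into the matmul
    # on the GPU); activations stay FP16, so no activation Q/DQ nodes exist.
    w_deq = (w_q.astype(np.float32) * w_scale).astype(np.float16)
    # FP16 operands, FP32 accumulation here for a stable comparison.
    return x.astype(np.float32) @ w_deq.astype(np.float32).T

def act_qdq(x, num_bits=8):
    # The extra quantize -> dequantize pair that PTQ places on activations
    # (per-tensor, symmetric in this sketch).
    qmax = 2 ** (num_bits - 1) - 1
    s = float(np.abs(x).max()) / qmax
    q = np.clip(np.round(x.astype(np.float32) / s), -qmax, qmax)
    return (q * s).astype(np.float16)

def linear_w8a8(x, w_q, w_scale):
    # Simulated W8A8: same INT8 weights, plus Q/DQ on the activations.
    return linear_w8a16(act_qdq(x), w_q, w_scale)

ref = x.astype(np.float32) @ w.astype(np.float32).T
err_w8a16 = np.abs(linear_w8a16(x, w_q, w_scale) - ref).mean()
err_w8a8 = np.abs(linear_w8a8(x, w_q, w_scale) - ref).mean()
print(f"INT8 weights: {w_q.nbytes} B vs FP16 weights: {w.nbytes} B")
print(f"mean abs error  W8A16={err_w8a16:.4f}  W8A8={err_w8a8:.4f}")
```

The first printed line reflects the 2× weight-storage reduction of INT8 versus FP16. Latency, in contrast, depends on the fused GPU kernels and is what the benchmark tables measure.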

Why INT8 is faster on CPU: VNNI instructions such as VPDPBUSD (intrinsic `_mm512_dpbusd_epi32`) natively accelerate INT8 dot products without a separate Q/DQ pass. The OpenVINO CPU plugin fuses dequantization into the matmul kernel. This gives a 1.66–1.68× speedup over FP16 (INT8 sym, INT8 asym, W8A16) with no F1 loss on the Potatoes and Candies benchmarks.
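For reference, `_mm512_dpbusd_epi32` (the AVX-512 VNNI `VPDPBUSD` instruction) works on groups of four byte pairs. It multiplies four unsigned 8-bit values by four signed 8-bit values and adds each group's sum to a 32-bit accumulator lane, all in one instruction. This pure-Python emulation of those semantics is for illustration only; the real instruction processes 16 lanes per 512-bit register.

```python
from typing import List

def wrap_int32(v: int) -> int:
    """Two's-complement wrap to 32 bits (VPDPBUSD does not saturate)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

def dpbusd(src: List[int], a_u8: List[int], b_s8: List[int]) -> List[int]:
    """Emulate _mm512_dpbusd_epi32 over len(src) 32-bit lanes.

    a_u8: unsigned bytes in [0, 255]; b_s8: signed bytes in [-128, 127].
    Each lane adds the dot product of its 4 byte pairs to src[lane].
    """
    assert len(a_u8) == len(b_s8) == 4 * len(src)
    out = []
    for lane, acc in enumerate(src):
        dot = sum(a_u8[4 * lane + k] * b_s8[4 * lane + k] for k in range(4))
        out.append(wrap_int32(acc + dot))
    return out

# A length-8 INT8 dot product = two lanes, followed by a horizontal sum.
acts = [10, 20, 30, 40, 1, 2, 3, 4]    # unsigned 8-bit activations
wts = [1, -1, 2, -2, 127, -128, 0, 5]  # signed 8-bit weights
lanes = dpbusd([0, 0], acts, wts)
print(lanes, sum(lanes))
```

In a quantized matmul, the INT32 result is then multiplied by the activation and weight scales to return to floating point.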

When to Use Each Variant

| Target hardware | Recommended variant | Reason |
|---|---|---|
| Intel Arc GPU (Battlemage B-series) | `openvino-fp16` | Fastest; native FP16 DPAS |
| Intel Arc GPU (memory-constrained) | `openvino-int8_w8a16` | ~2× smaller than FP16, ~4% slower |
| CPU (any Intel) | `openvino-int8_w8a16` | 1.68× faster than FP16, no F1 loss on the tested datasets |
| CPU (ultra-low memory) | `openvino-int4_sym` | Smallest, but F1 drops on some datasets (Candies: 0.992 → 0.792) |
| Intel Arc A-series / older iGPU | `openvino-int8_sym` | Older architectures without optimized FP16 (not benchmarked here) |
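The table can be encoded as a small lookup, for example when a deployment script has to choose a variant automatically. In this minimal sketch the function name `pick_variant` and its parameters are hypothetical and not part of the `instantlearn` API. It returns the variant names listed under Available Variants.

```python
def pick_variant(device: str, arc_b_series: bool = False,
                 memory_constrained: bool = False,
                 ultra_low_memory: bool = False) -> str:
    """Return a variant name following the 'When to Use Each Variant' table.

    device: OpenVINO device string such as "CPU", "GPU" or "GPU.0".
    arc_b_series: True for Arc Battlemage (B-series) GPUs; False means
        Arc A-series or an older iGPU.
    """
    dev = device.upper()
    if dev.startswith("GPU"):
        if not arc_b_series:
            return "openvino-int8_sym"    # Arc A-series / older iGPU
        if memory_constrained:
            return "openvino-int8_w8a16"  # ~2x smaller, ~4% slower than FP16
        return "openvino-fp16"            # fastest on Battlemage (native FP16 DPAS)
    if dev == "CPU":
        if ultra_low_memory:
            return "openvino-int4_sym"    # smallest; F1 can drop (see CPU table)
        return "openvino-int8_w8a16"      # ~1.68x faster than FP16 on the tested CPU
    raise ValueError(f"unsupported device: {device!r}")
```

On a real machine, the `device` string could come from `openvino.Core().available_devices`, which lists names such as `CPU` and `GPU`. The GPU model name, needed to decide `arc_b_series`, can be read with `core.get_property("GPU", "FULL_DEVICE_NAME")`.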

Hardware & Software

GPU benchmark: Intel Arc B60 (XPU), OpenVINO 2025.x
CPU benchmark: Intel host CPU, OpenVINO 2025.x
NNCF version: 3.1.0 (W8A16: nncf.compress_weights, PTQ: nncf.quantize)
Datasets: COCO (Potatoes: 10 images, Candies: 12, Nuts: 21, LED: 15)

Usage

```python
from instantlearn.models.sam3 import SAM3OV
from instantlearn.models.sam3.sam3_openvino import SAM3OVVariant

# Choose one of the following configurations.

# FP16: fastest on Arc GPU
model = SAM3OV(variant=SAM3OVVariant.FP16, device="GPU")

# W8A16: recommended for CPU or memory-constrained GPU
model = SAM3OV(variant=SAM3OVVariant.INT8_W8A16, device="CPU")
model = SAM3OV(variant=SAM3OVVariant.INT8_W8A16, device="GPU")
```
