
SAM3 OpenVINO Models

Pre-exported OpenVINO variants of SAM3.1 for efficient CPU and GPU inference.

Available Variants

| Variant | Description | Size |
|---|---|---|
| openvino-fp16 | FP16; recommended for GPU (Arc, Intel iGPU) | 1.63 GB |
| openvino-fp32 | FP32; reference precision for CPU | 3.25 GB |
| openvino-int8_sym | INT8 symmetric; CPU-optimized via VNNI | ~0.83 GB |
| openvino-int8_asym | INT8 asymmetric; CPU-optimized via VNNI | ~0.83 GB |
| openvino-int4_sym | INT4 symmetric; ultra-low-memory CPU | ~0.45 GB |
| openvino-int4_asym | INT4 asymmetric; ultra-low-memory CPU | ~0.45 GB |
| openvino-int8_w8a16 | W8A16 weight-only; recommended for both CPU and GPU | 0.84 GB |
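
To fetch a single variant rather than the whole repository, huggingface_hub's snapshot_download with allow_patterns works. The sketch below assumes each variant sits in a folder named after it and uses a placeholder repo id; adjust both to the actual file listing of this repository.

```python
# Illustrative sketch: download only one OpenVINO variant.
# The repo id and the per-variant folder layout (e.g. "openvino-fp16/") are
# assumptions; check the repository's file listing for the real paths.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-org/sam3-openvino",      # placeholder repo id
    allow_patterns=["openvino-fp16/*"],    # assumed per-variant folder
)
print(local_dir)
```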

Benchmark Results

Intel Arc B60 GPU (XPU) – text mode, ms/image ↓

| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 473 | 450 | 488 | 398 | 452 | 1.00× |
| W8A16 | 495 | 474 | 512 | 419 | 475 | 0.96× |
| PTQ W8A8 (GPU) | ~645 | ~620 | ~670 | ~550 | ~621 | 0.73× |

Intel Arc B60 GPU (XPU) – canvas mode, ms/image ↓

| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 638 | 752 | 449 | 535 | 593 | 1.00× |
| W8A16 | 657 | 773 | 472 | 556 | 614 | 0.96× |

CPU (Intel host) – text mode, ms/image ↓

| Variant | Potatoes | Candies | Mean | vs FP16 | F1 Potatoes | F1 Candies |
|---|---|---|---|---|---|---|
| FP32 | 8,525 | 8,407 | 8,466 | 1.03× | 1.000 | 0.992 |
| FP16 | 8,862 | 8,648 | 8,755 | 1.00× | 1.000 | 0.992 |
| INT8 sym | 5,270 | 5,170 | 5,220 | 1.67× | 1.000 | 0.992 ✓ |
| INT8 asym | 5,360 | 5,213 | 5,286 | 1.66× | 1.000 | 0.992 ✓ |
| W8A16 | 5,228 | 5,183 | 5,205 | 1.68× | 1.000 | 0.992 ✓ |
| INT4 sym | 5,526 | 5,450 | 5,488 | 1.59× | 0.963 | 0.792 ⚠️ |
| INT4 asym | 5,599 | 5,485 | 5,542 | 1.58× | 1.000 | 0.939 ⚠️ |

Quality (F1@0.5) – GPU

| Dataset | FP16 (text) | W8A16 (text) | FP16 (canvas) | W8A16 (canvas) |
|---|---|---|---|---|
| Potatoes | 1.000 | 1.000 | 1.000 | 1.000 |
| Candies | 0.992 | 0.992 | 0.991 | 0.991 |
| Nuts | 0.698 | 0.703 | 0.874 | 0.881 |
| LED | 0.000* | 0.000* | 0.720 | 0.724 |

* LED dataset uses visual-only prompts; F1=0.000 in text mode is expected (no text labels provided).

Quantization Summary – Intel Arc B60 GPU

| Approach | Avg text (ms) | Avg canvas (ms) | vs FP16 | Notes |
|---|---|---|---|---|
| FP16 (baseline) | 452 | 593 | 1.00× | Native GPU precision |
| PTQ W8A8 (all layers) | ~640 | ~780 | 0.73× | Q/DQ ops dominate on Battlemage |
| PTQ W8A8 (VE only) | ~650 | ~760 | 0.72× | Mixed-precision boundary overhead |
| W8A16 (weight-only) | 475 | 614 | 0.96× | No activation Q/DQ; near-lossless |

Why PTQ is slower on Arc B60: The DPAS (Dot Product Accumulate Systolic) units on Intel Battlemage are heavily optimized for FP16. NNCF PTQ inserts explicit Q/DQ activation nodes at every layer boundary, and the cost of those extra GPU kernel launches outweighs the bandwidth savings from 2× smaller weights. W8A16 (weight-only) has no such nodes; weights are dequantized on the fly inside a fused kernel, so it carries only ~4% overhead.
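
The two paths differ only in which NNCF entry point produces the IR (the Hardware & Software section below lists nncf.compress_weights for W8A16 and nncf.quantize for PTQ). A minimal sketch, with a placeholder model path and calibration data rather than the exact settings used for these exports:

```python
# Sketch of the two quantization paths; file names and calibration data are placeholders.
import openvino as ov
import nncf

core = ov.Core()
fp16_model = core.read_model("sam3_fp16.xml")  # assumed path to the FP16 IR

# W8A16: weight-only compression, no activation Q/DQ nodes -> near-FP16 GPU speed.
# compress_weights defaults to INT8 weight-only; the exact mode used for this repo's
# w8a16 variant is not documented here.
w8a16_model = nncf.compress_weights(fp16_model)

# PTQ W8A8: full quantization inserts Q/DQ nodes on activations at layer boundaries,
# which is what makes it slower than FP16 on Battlemage GPUs.
def transform_fn(item):
    # Placeholder: convert a raw sample into the model's input dict.
    return item

calibration_items = []  # placeholder: a few hundred representative inputs
w8a8_model = nncf.quantize(fp16_model, nncf.Dataset(calibration_items, transform_fn))

ov.save_model(w8a16_model, "sam3_w8a16.xml")
```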

Why INT8 is faster on CPU: VNNI instructions (_mm512_dpbusd_epi32) natively accelerate INT8 dot products, with no separate Q/DQ overhead. The OpenVINO CPU plugin fuses dequantization into the matmul kernel, giving a 1.67× speedup with zero accuracy loss.
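
Nothing special is needed on the user side to benefit from this: the INT8 IR is compiled like any other model and the CPU plugin applies the fused kernels internally. A minimal sketch, with a placeholder IR path:

```python
# Minimal sketch: compile an INT8 IR on CPU. VNNI use and dequantize fusion happen
# inside the CPU plugin, not in user code. The IR path is a placeholder.
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU']

compiled = core.compile_model(
    "openvino-int8_sym/model.xml",           # assumed local path to the INT8 IR
    device_name="CPU",
    config={"PERFORMANCE_HINT": "LATENCY"},  # favor per-image latency
)
print(compiled.inputs, compiled.outputs)
```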

When to Use Each Variant

| Target hardware | Recommended variant | Reason |
|---|---|---|
| Intel Arc GPU (Battlemage B-series) | openvino-fp16 | Fastest; native FP16 DPAS |
| Intel Arc GPU (memory-constrained) | openvino-int8_w8a16 | 2× smaller, only ~4% slower |
| CPU (any Intel) | openvino-int8_w8a16 | 1.68× faster than FP16, lossless |
| CPU (ultra-low memory) | openvino-int4_sym | Smallest, but F1 may drop on some datasets |
| Intel Arc A-series / older iGPU | openvino-int8_sym | Older architecture without optimized FP16 |
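
The table above can be reduced to a small selection helper around SAM3OVVariant from the Usage section below. This is an illustrative sketch, not part of the package; only the FP16 and INT8_W8A16 enum members shown in the Usage section are used.

```python
# Illustrative helper mapping the recommendation table to a variant choice.
from instantlearn.models.sam3.sam3_openvino import SAM3OVVariant

def pick_variant(device: str, low_memory: bool = False) -> SAM3OVVariant:
    """Pick a variant per the recommendation table (sketch, not exhaustive)."""
    if device == "GPU":
        # Arc B-series: FP16 is fastest; W8A16 when VRAM is tight.
        return SAM3OVVariant.INT8_W8A16 if low_memory else SAM3OVVariant.FP16
    # CPU: W8A16 is the lossless fast path; INT4 variants trade accuracy for size.
    return SAM3OVVariant.INT8_W8A16

print(pick_variant("GPU"))        # FP16 on Arc B-series
print(pick_variant("GPU", True))  # W8A16 when memory-constrained
```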

Hardware & Software

  • GPU benchmark: Intel Arc B60 (XPU), OpenVINO 2025.x
  • CPU benchmark: Intel host CPU, OpenVINO 2025.x
  • NNCF version: 3.1.0 (W8A16: nncf.compress_weights, PTQ: nncf.quantize)
  • Datasets: COCO (Potatoes 10 imgs / Candies 12 imgs / Nuts 21 imgs / LED 15 imgs)

Usage

```python
from instantlearn.models.sam3 import SAM3OV
from instantlearn.models.sam3.sam3_openvino import SAM3OVVariant

# FP16 - fastest on Arc GPU
model = SAM3OV(variant=SAM3OVVariant.FP16, device="GPU")

# W8A16 - recommended for CPU or memory-constrained GPU
model = SAM3OV(variant=SAM3OVVariant.INT8_W8A16, device="CPU")
model = SAM3OV(variant=SAM3OVVariant.INT8_W8A16, device="GPU")
```