SAM3 OpenVINO Models

Pre-exported OpenVINO IR variants of SAM 3.1 for efficient CPU and GPU inference.

Origin & License

These models are derived from Meta's SAM 3.1 (facebook/sam3) and are distributed under the same SAM License. The OpenVINO IR exports and quantized variants in this repository are derivative works of the original SAM 3.1 model weights and are subject to the same license terms. The model architecture was not modified: only the format was converted (PyTorch → ONNX → OpenVINO IR), and weight compression was optionally applied.

Available Variants

| Variant | Description | Size |
|---|---|---|
| openvino-fp16 | FP16: recommended for GPU | 1.63 GB |
| openvino-fp32 | FP32: reference precision for CPU | 3.25 GB |
| openvino-int8_sym | INT8 symmetric: CPU-optimized | ~0.83 GB |
| openvino-int8_asym | INT8 asymmetric: CPU-optimized | ~0.83 GB |
| openvino-int4_sym | INT4 symmetric: ultra-low memory CPU | ~0.45 GB |
| openvino-int4_asym | INT4 asymmetric: ultra-low memory CPU | ~0.45 GB |
| openvino-int8_w8a16 | W8A16 weight-only: recommended for both CPU and GPU | 0.84 GB |
| openvino-int8_ptq_gpu | W8A8 post-training quantization: best CPU throughput | ~0.83 GB |
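As a quick sanity check on the sizes above, the file sizes can be turned into an approximate average number of bits per weight. This is only a sketch: it assumes the FP32 export stores 32 bits per weight and that file size is dominated by weights. The helper name `effective_bits` is hypothetical.

```python
# Sizes in GB, copied from the variants table above.
SIZES_GB = {
    "openvino-fp32": 3.25,
    "openvino-fp16": 1.63,
    "openvino-int8_sym": 0.83,
    "openvino-int8_w8a16": 0.84,
    "openvino-int4_sym": 0.45,
}


def effective_bits(variant: str, reference: str = "openvino-fp32",
                   reference_bits: int = 32) -> float:
    """Average bits per weight implied by file size.

    Assumption: the reference export stores `reference_bits` per weight
    and non-weight data is negligible.
    """
    return reference_bits * SIZES_GB[variant] / SIZES_GB[reference]


if __name__ == "__main__":
    for name in SIZES_GB:
        print(f"{name}: {effective_bits(name):.1f} bits/weight")
```

The INT8 and INT4 variants come out slightly above 8 and 4 bits. That would be consistent with quantization metadata or some layers kept at higher precision, although the source does not break the sizes down.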

Benchmark Results

GPU (XPU), text mode, ms/image ↓

| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 473 | 450 | 488 | 398 | 452 | 1.00× |
| W8A16 | 495 | 474 | 512 | 419 | 475 | 0.96× |
| PTQ W8A8 | ~645 | ~620 | ~670 | ~550 | ~621 | 0.73× |

GPU (XPU), canvas mode, ms/image ↓

| Variant | Potatoes | Candies | Nuts | LED | Mean | vs FP16 |
|---|---|---|---|---|---|---|
| FP16 | 638 | 752 | 449 | 535 | 593 | 1.00× |
| W8A16 | 657 | 773 | 472 | 556 | 614 | 0.96× |

CPU, text mode, ms/image ↓

| Variant | Potatoes | Candies | Mean | vs FP16 | F1 Potatoes | F1 Candies | F1 status |
|---|---|---|---|---|---|---|---|
| FP32 | 8,525 | 8,407 | 8,466 | 1.03× | 1.000 | 0.992 | |
| FP16 | 8,862 | 8,648 | 8,755 | 1.00× | 1.000 | 0.992 | |
| INT8 sym | 5,270 | 5,170 | 5,220 | 1.67× | 1.000 | 0.992 | ✓ |
| INT8 asym | 5,360 | 5,213 | 5,286 | 1.66× | 1.000 | 0.992 | ✓ |
| W8A16 | 5,228 | 5,183 | 5,205 | 1.68× | 1.000 | 0.992 | ✓ |
| INT4 sym | 5,526 | 5,450 | 5,488 | 1.59× | 0.963 | 0.792 | ⚠️ |
| INT4 asym | 5,599 | 5,485 | 5,542 | 1.58× | 1.000 | 0.939 | ⚠️ |
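The "vs FP16" column reads as a speedup factor: the FP16 mean latency divided by the variant's mean latency, so values above 1.00× are faster than FP16. A minimal sketch of that arithmetic, using a few rows from the CPU table (the function name `speedup_vs` is hypothetical):

```python
from statistics import mean

# Per-dataset CPU latencies in ms/image (Potatoes, Candies), from the table above.
CPU_LATENCY_MS = {
    "FP32": (8525, 8407),
    "FP16": (8862, 8648),
    "W8A16": (5228, 5183),
    "INT4 asym": (5599, 5485),
}


def speedup_vs(baseline: str, variant: str, table=CPU_LATENCY_MS) -> float:
    """Speedup of `variant` over `baseline`: baseline mean / variant mean."""
    return mean(table[baseline]) / mean(table[variant])


if __name__ == "__main__":
    for name in CPU_LATENCY_MS:
        print(f"{name}: {speedup_vs('FP16', name):.2f}x")
```

Some table entries can differ from this calculation by one unit in the last digit, depending on where rounding is applied.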

Quality (F1@0.5), GPU

| Dataset | FP16 (text) | W8A16 (text) | FP16 (canvas) | W8A16 (canvas) |
|---|---|---|---|---|
| Potatoes | 1.000 | 1.000 | 1.000 | 1.000 |
| Candies | 0.992 | 0.992 | 0.991 | 0.991 |
| Nuts | 0.698 | 0.703 | 0.874 | 0.881 |
| LED | 0.000* | 0.000* | 0.720 | 0.724 |

* LED dataset uses visual-only prompts; F1=0.000 in text mode is expected (no text labels provided).
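F1@0.5 is usually computed by counting a prediction as a true positive when it overlaps a ground-truth object with IoU of at least 0.5. The sketch below illustrates that idea under stated assumptions: axis-aligned boxes `(x1, y1, x2, y2)` and greedy one-to-one matching. The source does not describe its matching protocol, and the benchmark may score masks rather than boxes, so treat this as an illustration and not the evaluation code behind the table.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds, gts, threshold=0.5):
    """F1 with greedy matching: each ground truth matches at most one prediction."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(preds) - tp
    fn = len(unmatched)
    denom = 2 * tp + fp + fn
    # Assumption: an image with no predictions and no ground truth scores 1.0.
    return 2 * tp / denom if denom else 1.0


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (50, 50, 60, 60)]
    preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(f1_at_iou(preds, gts))  # one hit, one false positive, one miss
    print(f1_at_iou([], gts))     # no predictions at all
```

With this definition, a run that produces no predictions while ground truth exists scores exactly 0.0.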

Quantization Summary, GPU

| Approach | Avg text (ms) | Avg canvas (ms) | vs FP16 | Notes |
|---|---|---|---|---|
| FP16 (baseline) | 452 | 593 | 1.00× | Native GPU precision |
| PTQ W8A8 (all layers) | ~640 | ~780 | 0.73× | Q/DQ overhead on FP16-optimized GPU |
| PTQ W8A8 (VE only) | ~650 | ~760 | 0.72× | Mixed-precision boundary overhead |
| W8A16 (weight-only) | 475 | 614 | 0.96× | No activation Q/DQ; near-lossless |

Why PTQ is slower on GPU: Modern GPU compute units (e.g., FP16 DPAS) are highly optimized for FP16 math. NNCF PTQ inserts explicit quantize/dequantize (Q/DQ) activation nodes at every layer boundary, and the cost of those extra kernel launches outweighs the memory savings from 2× smaller weights. W8A16 (weight-only) has no such nodes: weights are dequantized on the fly in a fused kernel, which gives only ~4% overhead.

Why INT8 is faster on CPU: VNNI instructions natively accelerate INT8 dot products with no separate Q/DQ overhead. The OpenVINO CPU plugin fuses dequantization into the matmul kernel, giving a 1.67× speedup over FP16 with no F1 loss on the benchmarked datasets.
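To make the Q/DQ terminology concrete, the sketch below shows the arithmetic of symmetric and asymmetric integer quantization on a plain list of floats. It is illustrative only: the function names are hypothetical, per-tensor scaling is assumed, and it does not reproduce how NNCF or the OpenVINO plugins implement these steps. In a W8A8 graph, this quantize/dequantize pair is applied to activations at runtime; in a weight-only (W8A16) graph, only the stored weights go through the dequantize step.

```python
def quantize_sym(values, bits=8):
    """Symmetric: zero maps to zero, range is [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for INT8
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_sym(q, scale):
    return [x * scale for x in q]


def quantize_asym(values, bits=8):
    """Asymmetric: a zero point shifts the grid to cover [min, max]."""
    qmax = 2 ** bits - 1                            # 255 for INT8
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_asym(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-0.5, 0.0, 0.25, 1.0]                # skewed toward positive values
    q_s, s_s = quantize_sym(weights)
    q_a, s_a, zp = quantize_asym(weights)
    print("sym step:", s_s, "asym step:", s_a)
    print("sym round trip:", dequantize_sym(q_s, s_s))
    print("asym round trip:", dequantize_asym(q_a, s_a, zp))
```

For a skewed value range like the one in the example, the asymmetric grid has a smaller step than the symmetric one, at the cost of storing and applying a zero point.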

When to Use Each Variant

| Target hardware | Recommended variant | Reason |
|---|---|---|
| GPU (XPU / discrete) | openvino-fp16 | Fastest; native FP16 compute |
| GPU (memory-constrained) | openvino-int8_w8a16 | About 2× smaller than FP16, only ~4% slower |
| CPU | openvino-int8_w8a16 | 1.68× faster than FP16, lossless |
| CPU (ultra-low memory) | openvino-int4_sym | Smallest, but F1 may drop on some datasets |
| Older iGPU | openvino-int8_sym | Better for architectures without optimized FP16 |
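The recommendations above can be encoded as a small lookup for deployment scripts. This is a sketch only: the target keys (`"gpu"`, `"cpu-low-memory"`, and so on) and the function `recommend_variant` are hypothetical names introduced here, and are not part of this repository or of the instantlearn package.

```python
# Hypothetical target keys mapped to the variants recommended in the table above.
RECOMMENDED = {
    "gpu": "openvino-fp16",
    "gpu-low-memory": "openvino-int8_w8a16",
    "cpu": "openvino-int8_w8a16",
    "cpu-low-memory": "openvino-int4_sym",
    "older-igpu": "openvino-int8_sym",
}


def recommend_variant(target: str) -> str:
    """Return the recommended variant directory name for a hardware target."""
    key = target.strip().lower()
    if key not in RECOMMENDED:
        raise ValueError(
            f"unknown target {target!r}; expected one of {sorted(RECOMMENDED)}"
        )
    return RECOMMENDED[key]


if __name__ == "__main__":
    print(recommend_variant("CPU"))
    print(recommend_variant("gpu-low-memory"))
```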

Hardware & Software

  • GPU benchmark: Intel Arc discrete GPU (XPU), OpenVINO 2025.x
  • CPU benchmark: Intel CPU, OpenVINO 2025.x
  • NNCF version: 3.1.0 (W8A16: nncf.compress_weights, PTQ: nncf.quantize)
  • Datasets: COCO subsets (Potatoes 10 imgs / Candies 12 imgs / Nuts 21 imgs / LED 15 imgs)
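The two NNCF entry points named above can be sketched as follows. This is not the export script used for this repository: the mapping from variant suffix to `nncf.CompressWeightsMode` member is an assumption, the file paths and calibration items are placeholders supplied by the caller, and all NNCF parameters other than `mode` are left at their defaults. The imports are placed inside the functions so the mapping can be inspected without OpenVINO or NNCF installed.

```python
# Assumed mapping from variant suffix to an nncf.CompressWeightsMode member name.
MODE_BY_VARIANT = {
    "int8_sym": "INT8_SYM",
    "int8_asym": "INT8_ASYM",
    "int4_sym": "INT4_SYM",
    "int4_asym": "INT4_ASYM",
}


def compress_ir(src_xml: str, dst_xml: str, variant: str) -> None:
    """Weight-only compression of an existing IR file (activations stay in float)."""
    import nncf
    import openvino as ov

    mode = getattr(nncf.CompressWeightsMode, MODE_BY_VARIANT[variant])
    model = ov.Core().read_model(src_xml)
    compressed = nncf.compress_weights(model, mode=mode)
    ov.save_model(compressed, dst_xml, compress_to_fp16=False)


def quantize_ir(src_xml: str, dst_xml: str, calibration_items, transform_fn=None) -> None:
    """Post-training quantization (weights and activations) with calibration data.

    `calibration_items` is any iterable of samples; `transform_fn`, if given,
    converts one sample into the model's input format.
    """
    import nncf
    import openvino as ov

    model = ov.Core().read_model(src_xml)
    dataset = nncf.Dataset(calibration_items, transform_fn)
    quantized = nncf.quantize(model, dataset)
    ov.save_model(quantized, dst_xml, compress_to_fp16=False)
```

A call such as `compress_ir("fp32/model.xml", "int8_sym/model.xml", "int8_sym")` would then write a compressed copy next to the original; both paths here are placeholders.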

Usage

from instantlearn.models.sam3 import SAM3OpenVINO
from instantlearn.models.sam3.sam3_openvino import SAM3OVVariant

# FP16: fastest on GPU
model = SAM3OpenVINO(variant=SAM3OVVariant.FP16, device="GPU")

# W8A16: recommended for CPU or memory-constrained GPU
model = SAM3OpenVINO(variant=SAM3OVVariant.INT8_W8A16, device="CPU")
model = SAM3OpenVINO(variant=SAM3OVVariant.INT8_W8A16, device="GPU")