SAM 2.1 Hiera-Tiny β€” INT8 Quantized

Meta's Segment Anything Model 2.1 (Hiera-Tiny backbone) quantized to INT8 for real-time robotic segmentation. At 2.0x smaller (298 MB down to 152 MB), this is the smallest SAM2 variant, built for maximum speed on edge hardware.

This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform: a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.

Why This Model Exists

When every millisecond counts (grasping a moving object, dodging an obstacle, responding to a human), you need the fastest possible segmentation. SAM2 Hiera-Tiny is the lightest SAM2 backbone, and at 152 MB after INT8 quantization it fits comfortably alongside multiple other perception models on devices like the Jetson Nano or Orin NX.

Model Details

| Property | Value |
| --- | --- |
| Architecture | Hiera-Tiny vision backbone + SAM2 decoder |
| Input Resolution | 1024 × 1024 |
| Capabilities | Image segmentation, video object tracking |
| Backbone Stages | 4 stages: [1, 2, 7, 2] blocks (12 total) |
| Embed Dims | [96, 192, 384, 768] per stage |
| Attention Heads | [1, 2, 4, 8] per stage |
| Global Attention Blocks | 5, 7, 9 |
| Mask Decoder | 256-dim hidden, 8 attention heads, 3 multi-mask outputs |
| Memory Attention | 4 layers, 2048-dim FFN, RoPE positional encoding |
| Memory Bank | 7 frames of temporal context |
| Original Model | facebook/sam2.1-hiera-tiny |
| License | Apache-2.0 |

Compression Results

Quantized on an NVIDIA L4 24GB GPU using INT8 dynamic quantization with SafeTensors export.

| Metric | Original | INT8 Quantized | Change |
| --- | --- | --- | --- |
| Total Size | 298 MB | 152 MB | 2.0x smaller |
| INT8 Weights | — | 32 MB | Quantized linear layers |
| SafeTensors | — | 120 MB | Full model weights |
| Quantization | FP32 | INT8 Dynamic | Per-tensor symmetric |
| Format | PyTorch | SafeTensors + INT8 .pt | Dual format |
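The per-tensor symmetric scheme noted in the table uses a single scale per weight tensor with the zero-point fixed at 0. A minimal pure-Python sketch of the arithmetic (illustrative only, not the torchao implementation):

```python
# Per-tensor symmetric INT8 quantization: one scale per tensor,
# zero-point fixed at 0, integer values clamped to [-127, 127].

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)       # q == [42, -127, 0, 90]
restored = dequantize_int8(q, scale)

# Each restored value is within half a quantization step of the original
assert all(abs(a - b) <= scale / 2 for a, b in zip(weights, restored))
```

torchao applies the same idea per linear layer, with activations quantized dynamically at inference time rather than ahead of time.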

Why SafeTensors instead of ONNX? SAM2 uses custom CUDA operations (roi_align, deformable attention) that aren't supported by the ONNX standard. SafeTensors provides fast, safe loading directly into PyTorch with zero-copy memory mapping.
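The zero-copy property comes from the file layout: per the published safetensors format, a file starts with an 8-byte little-endian header length, then a JSON header mapping tensor names to dtype, shape, and byte offsets into the raw data section that follows, so a loader can mmap the file and address tensors in place. A stdlib-only sketch that builds and parses a minimal file (the tensor name and values are illustrative):

```python
import json
import struct

# Build a minimal safetensors file in memory: one FP32 tensor "w" of shape [2, 2].
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# Parsing: read the 8-byte length, decode the JSON header, then slice raw bytes.
(n,) = struct.unpack_from("<Q", blob, 0)
meta = json.loads(blob[8:8 + n])
start, end = meta["w"]["data_offsets"]
values = struct.unpack("<4f", blob[8 + n + start:8 + n + end])
print(values)  # (1.0, 2.0, 3.0, 4.0)
```

The full `model.safetensors` follows this same layout, which is what lets PyTorch map the weights without a deserialization pass.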

Included Files

sam2.1-hiera-tiny-int8/
├── model_int8.pt              # 32 MB — INT8 quantized state dict
├── model.safetensors          # 120 MB — Full model in SafeTensors format
├── config.json                # Model configuration
├── preprocessor_config.json   # Image preprocessing config
└── README.md                  # This file

Quick Start

PyTorch (SafeTensors)

from transformers import Sam2Model, Sam2Processor
from PIL import Image
import torch

# Load with SafeTensors (automatic)
model = Sam2Model.from_pretrained("robotflowlabs/sam2.1-hiera-tiny-int8")
processor = Sam2Processor.from_pretrained("facebook/sam2.1-hiera-tiny")

model.to("cuda").eval()

image = Image.open("example.jpg")  # any RGB input image

# Segment with a point prompt
inputs = processor(
    images=image,
    input_points=[[[500, 375]]],  # (x, y) point prompt
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

masks = processor.post_process_masks(
    outputs.pred_masks,
    inputs["original_sizes"],
    inputs["reshaped_input_sizes"]
)

INT8 Weights (Maximum Compression)

import torch
from transformers import Sam2Model

# Load the FP32 architecture, then overlay the INT8 weights
model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-tiny")
int8_state = torch.load("model_int8.pt", map_location="cuda", weights_only=True)
model.load_state_dict(int8_state, strict=False)  # strict=False: quantized modules carry extra keys
model.eval()

With FORGE (ANIMA Integration)

from forge.vision import VisionEncoderRegistry

# FORGE handles optimal loading and batching
segmenter = VisionEncoderRegistry.load("sam2.1-hiera-tiny-int8")
masks = segmenter.segment(image, points=[[500, 375]])

Use Cases in ANIMA

SAM2-Tiny is optimized for latency-critical deployments:

  • Real-Time Grasping β€” Fastest segmentation for time-critical manipulation
  • Mobile Robots β€” Lightweight enough for Jetson Nano-class devices
  • Multi-Model Stacking β€” Leaves maximum VRAM for other perception models
  • Video Tracking β€” Track objects across frames with 7-frame temporal memory
  • High-Frequency Control β€” Segmentation at camera framerate for reactive behavior

SAM2 Model Family

We provide all three SAM2.1 variants, optimized for different deployment scenarios:

| Model | Size | Speed | Best For |
| --- | --- | --- | --- |
| sam2.1-hiera-large-int8 | 1.0 GB | Highest quality | Research, high-accuracy tasks |
| sam2.1-hiera-small-int8 | 186 MB | Balanced | Production robotics |
| sam2.1-hiera-tiny-int8 | 152 MB | Fastest | Real-time edge, Jetson Nano |

Intended Use

Designed For

  • Lowest-latency segmentation in robotic control loops
  • Edge devices with limited VRAM (Jetson Nano, Orin NX)
  • Multi-model inference stacks where VRAM is shared
  • Real-time video object tracking

Limitations

  • Smaller backbone means lower accuracy on complex scenes vs Large/Small variants
  • INT8 quantization may slightly reduce mask boundary precision
  • Requires a prompt (point, box, or mask) β€” not a panoptic segmenter
  • Inherits biases from SA-V dataset

Out of Scope

  • Medical image segmentation without domain-specific validation
  • Autonomous driving perception
  • Surveillance or tracking of individuals

Technical Details

Compression Pipeline

Original SAM2.1 Hiera-Tiny (FP32, 298 MB)
    │
    ├─→ torchao INT8 dynamic quantization (GPU-native)
    │   └─→ model_int8.pt (32 MB)
    │
    └─→ SafeTensors export (roi_align not ONNX-compatible)
        └─→ model.safetensors (120 MB)

  • Quantization: INT8 dynamic activation + INT8 weight via torchao on an NVIDIA L4 GPU
  • Export: SafeTensors format — zero-copy memory mapping, fast loading
  • Why not ONNX: SAM2's roi_align and deformable attention are custom CUDA ops
  • Hardware: NVIDIA L4 24GB, CUDA 13.0, PyTorch 2.10, Python 3.14

Attribution

Citation

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}

@misc{robotflowlabs2026anima,
  title={ANIMA: Agentic Networked Intelligence for Modular Autonomy},
  author={RobotFlowLabs},
  year={2026},
  url={https://huggingface.co/robotflowlabs}
}

Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.
