SAM 2.1 Hiera-Tiny (INT8 Quantized)
Meta's Segment Anything Model 2.1 (Hiera-Tiny backbone) quantized to INT8 for real-time robotic segmentation. 2.0x smaller: from 298 MB to 152 MB, the smallest SAM2 variant for maximum speed on edge hardware.
This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform, a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.
Why This Model Exists
When every millisecond counts (grasping a moving object, dodging an obstacle, responding to a human), you need the fastest possible segmentation. SAM2 Hiera-Tiny is the lightest SAM2 backbone, and at 152 MB after INT8 quantization, it fits comfortably alongside multiple other perception models on devices like Jetson Nano or Orin NX.
Model Details
| Property | Value |
|---|---|
| Architecture | Hiera-Tiny vision backbone + SAM2 decoder |
| Input Resolution | 1024 × 1024 |
| Capabilities | Image segmentation, video object tracking |
| Backbone Stages | 4 stages: [1, 2, 7, 2] blocks (12 total) |
| Embed Dims | [96, 192, 384, 768] per stage |
| Attention Heads | [1, 2, 4, 8] per stage |
| Global Attention | Blocks 5, 7, 9 |
| Mask Decoder | 256-dim hidden, 8 attention heads, 3 multi-mask outputs |
| Memory Attention | 4 layers, 2048-dim FFN, RoPE positional encoding |
| Memory Bank | 7 frames temporal context |
| Original Model | facebook/sam2.1-hiera-tiny |
| License | Apache-2.0 |
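The 7-frame memory bank in the table above behaves like a fixed-capacity queue: each new frame's features push out the oldest. A minimal Python sketch of that behavior (illustrative only, not the actual SAM2 implementation, which stores encoded feature maps consumed by the memory-attention layers):

```python
from collections import deque

class MemoryBank:
    """Fixed-size temporal context: keeps the most recent N frames."""

    def __init__(self, max_frames=7):
        self.frames = deque(maxlen=max_frames)  # oldest evicted automatically

    def add(self, frame_features):
        self.frames.append(frame_features)

    def context(self):
        # Oldest-to-newest features available to memory attention.
        return list(self.frames)

bank = MemoryBank()
for t in range(10):
    bank.add(f"features_t{t}")

print(len(bank.context()))  # 7: frames t0-t2 were evicted
print(bank.context()[0])    # features_t3
```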
Compression Results
Quantized on an NVIDIA L4 24GB GPU using INT8 dynamic quantization with SafeTensors export.
| Metric | Original | INT8 Quantized | Change |
|---|---|---|---|
| Total Size | 298 MB | 152 MB | 2.0x smaller |
| INT8 Weights | n/a | 32 MB | Quantized linear layers |
| SafeTensors | n/a | 120 MB | Full model weights |
| Quantization | FP32 | INT8 Dynamic | Per-tensor symmetric |
| Format | PyTorch | SafeTensors + INT8 .pt | Dual format |
Why SafeTensors instead of ONNX? SAM2 uses custom CUDA operations (roi_align, deformable attention) that aren't supported by the ONNX standard. SafeTensors provides fast, safe loading directly into PyTorch with zero-copy memory mapping.
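The "per-tensor symmetric" scheme named in the table can be sketched in plain Python: one scale per tensor maps floats to signed 8-bit codes, with the zero-point fixed at 0. The real pipeline does this with torchao on GPU tensors; this shows only the arithmetic:

```python
def quantize_int8(weights):
    # Per-tensor symmetric: a single scale, zero-point fixed at 0.
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.27, 0.06, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))

print(q)  # [42, -127, 6, 90]
# Rounding error is bounded by half a quantization step:
print(max_err <= scale / 2 + 1e-12)  # True
```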
Included Files
```
sam2.1-hiera-tiny-int8/
├── model_int8.pt             # 32 MB  - INT8 quantized state dict
├── model.safetensors         # 120 MB - Full model in SafeTensors format
├── config.json               # Model configuration
├── preprocessor_config.json  # Image preprocessing config
└── README.md                 # This file
```
Quick Start
PyTorch (SafeTensors)
```python
from transformers import Sam2Model, Sam2Processor
import torch

# Load with SafeTensors (automatic)
model = Sam2Model.from_pretrained("robotflowlabs/sam2.1-hiera-tiny-int8")
processor = Sam2Processor.from_pretrained("facebook/sam2.1-hiera-tiny")
model.to("cuda").eval()

# Segment with a point prompt; `image` is a PIL.Image or NumPy array
inputs = processor(
    images=image,
    input_points=[[[500, 375]]],  # (x, y) point prompt
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

masks = processor.post_process_masks(
    outputs.pred_masks,
    inputs["original_sizes"],
    inputs["reshaped_input_sizes"],
)
```
INT8 Weights (Maximum Compression)
```python
import torch
from transformers import Sam2Model

# Load the architecture, then overlay the INT8 weights.
# strict=False: the INT8 state dict covers only the quantized linear layers.
model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-tiny")
int8_state = torch.load("model_int8.pt", map_location="cuda", weights_only=True)
model.load_state_dict(int8_state, strict=False)
```
With FORGE (ANIMA Integration)
```python
from forge.vision import VisionEncoderRegistry

# FORGE handles optimal loading and batching
segmenter = VisionEncoderRegistry.load("sam2.1-hiera-tiny-int8")
masks = segmenter.segment(image, points=[[500, 375]])
```
Use Cases in ANIMA
SAM2-Tiny is optimized for latency-critical deployments:
- Real-Time Grasping: Fastest segmentation for time-critical manipulation
- Mobile Robots: Lightweight enough for Jetson Nano-class devices
- Multi-Model Stacking: Leaves maximum VRAM for other perception models
- Video Tracking: Track objects across frames with 7-frame temporal memory
- High-Frequency Control: Segmentation at camera framerate for reactive behavior
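A quick way to reason about "segmentation at camera framerate": the whole perception step (preprocess, segment, postprocess) must fit inside one frame period. Illustrative arithmetic only, not a benchmark of this model:

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given camera rate."""
    return 1000.0 / fps

for fps in (15, 30, 60):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
# 15 fps -> 66.7 ms, 30 fps -> 33.3 ms, 60 fps -> 16.7 ms
```

At 60 fps the budget is under 17 ms for the entire loop, which is why the smallest backbone matters for reactive control.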
SAM2 Model Family
We provide all three SAM2.1 variants, optimized for different deployment scenarios:
| Model | Size | Speed | Best For |
|---|---|---|---|
| sam2.1-hiera-large-int8 | 1.0 GB | Slowest (highest quality) | Research, high-accuracy tasks |
| sam2.1-hiera-small-int8 | 186 MB | Balanced | Production robotics |
| sam2.1-hiera-tiny-int8 | 152 MB | Fastest | Real-time edge, Jetson Nano |
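Choosing between the variants often comes down to a size budget. A hypothetical helper (not part of any released package) that picks the largest variant fitting a disk/VRAM budget, using the sizes from the table above:

```python
# (name, size in MB), largest first; 1.0 GB taken as 1024 MB.
VARIANTS = [
    ("sam2.1-hiera-large-int8", 1024),
    ("sam2.1-hiera-small-int8", 186),
    ("sam2.1-hiera-tiny-int8", 152),
]

def pick_variant(budget_mb):
    """Return the largest variant that fits the budget, or None."""
    for name, size in VARIANTS:
        if size <= budget_mb:
            return name
    return None

print(pick_variant(200))   # sam2.1-hiera-small-int8
print(pick_variant(160))   # sam2.1-hiera-tiny-int8
print(pick_variant(100))   # None: nothing fits
```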
Intended Use
Designed For
- Lowest-latency segmentation in robotic control loops
- Edge devices with limited VRAM (Jetson Nano, Orin NX)
- Multi-model inference stacks where VRAM is shared
- Real-time video object tracking
Limitations
- Smaller backbone means lower accuracy on complex scenes vs Large/Small variants
- INT8 quantization may slightly reduce mask boundary precision
- Requires a prompt (point, box, or mask); it is not a panoptic segmenter
- Inherits biases from SA-V dataset
Out of Scope
- Medical image segmentation without domain-specific validation
- Autonomous driving perception
- Surveillance or tracking of individuals
Technical Details
Compression Pipeline
```
Original SAM2.1 Hiera-Tiny (FP32, 298 MB)
│
├── torchao INT8 dynamic quantization (GPU-native)
│   └── model_int8.pt (32 MB)
│
└── SafeTensors export (roi_align not ONNX-compatible)
    └── model.safetensors (120 MB)
```
- Quantization: INT8 dynamic activation + INT8 weight via torchao on an NVIDIA L4 GPU
- Export: SafeTensors format for zero-copy memory mapping and fast loading
- Why not ONNX: SAM2's roi_align and deformable attention are custom CUDA ops
- Hardware: NVIDIA L4 24GB, CUDA 13.0, PyTorch 2.10, Python 3.14
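The sizes reported in this card are internally consistent, which is easy to verify: the two exported artifacts sum to the total, and the ratio against the FP32 original rounds to the advertised 2.0x:

```python
# Numbers taken from the Compression Results table above.
int8_pt_mb = 32      # model_int8.pt
safetensors_mb = 120 # model.safetensors
original_mb = 298    # FP32 original

total_mb = int8_pt_mb + safetensors_mb
ratio = original_mb / total_mb

print(total_mb)         # 152
print(round(ratio, 1))  # 2.0
```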
Attribution
- Original Model: facebook/sam2.1-hiera-tiny by Meta AI (FAIR)
- License: Apache-2.0, free for commercial and research use
- Paper: SAM 2: Segment Anything in Images and Videos, Ravi et al., 2024
- Dataset: SA-V (50.9K videos, 642.6K masklets)
- Compressed by: RobotFlowLabs using FORGE
Citation
```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}

@misc{robotflowlabs2026anima,
  title={ANIMA: Agentic Networked Intelligence for Modular Autonomy},
  author={RobotFlowLabs},
  year={2026},
  url={https://huggingface.co/robotflowlabs}
}
```
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.