# SAM 2.1 Hiera-Small: INT8 Quantized

Meta's Segment Anything Model 2.1 (Hiera-Small backbone) quantized to INT8 for real-time robotic segmentation. 1.9x smaller (352 MB down to 186 MB), with both image and video segmentation capabilities preserved.
This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform β a modular ROS2-native AI system that brings foundation model intelligence to real robots operating in the real world.
## Why This Model Exists

SAM2 is the state of the art for promptable segmentation: given a point, box, or mask prompt, it segments any object in images or tracks it through video. The Hiera-Small variant is the production sweet spot: fast enough for real-time robotics, accurate enough for manipulation tasks, and at 186 MB it leaves room for depth estimation, feature extraction, and action models on the same edge GPU.
## Model Details
| Property | Value |
|---|---|
| Architecture | Hiera-Small vision backbone + SAM2 decoder |
| Input Resolution | 1024 × 1024 |
| Capabilities | Image segmentation, video object tracking |
| Backbone Stages | 4 stages: [1, 2, 11, 2] blocks |
| Embed Dims | [96, 192, 384, 768] per stage |
| Attention Heads | [1, 2, 4, 8] per stage |
| Global Attention | Blocks 7, 10, 13 |
| Mask Decoder | 256-dim hidden, 8 attention heads, 3 multi-mask outputs |
| Memory Attention | 4 layers, 2048-dim FFN, RoPE positional encoding |
| Memory Bank | 7 frames temporal context |
| Original Model | facebook/sam2.1-hiera-small |
| License | Apache-2.0 |
## Compression Results

Quantized on an NVIDIA L4 24GB GPU using INT8 dynamic quantization with SafeTensors export.

| Metric | Original | INT8 Quantized | Change |
|---|---|---|---|
| Total Size | 352 MB | 186 MB | 1.9x smaller |
| INT8 Weights | n/a | 39 MB | Quantized linear layers |
| SafeTensors | n/a | 148 MB | Full model weights |
| Quantization | FP32 | INT8 Dynamic | Per-tensor symmetric |
| Format | PyTorch | SafeTensors + INT8 .pt | Dual format |
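"Per-tensor symmetric" in the table above means each tensor gets a single scale (and no zero point), chosen so the largest weight magnitude maps to ±127. A minimal illustrative sketch of that scheme, not torchao's actual kernel:

```python
import torch

def quantize_per_tensor_symmetric(w: torch.Tensor):
    """Quantize a float tensor to INT8 with one symmetric scale per tensor."""
    scale = w.abs().max() / 127.0  # map the largest magnitude to +/-127
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate float tensor from INT8 values and the scale."""
    return q.float() * scale

w = torch.randn(256, 256)
q, scale = quantize_per_tensor_symmetric(w)
w_hat = dequantize(q, scale)

# INT8 storage is 4x smaller than FP32, at the cost of a small rounding error
print(q.dtype, (w - w_hat).abs().max().item())
```

The rounding error per element is bounded by half the scale, which is why quantization mainly hurts tensors with a few large outlier values.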
Why SafeTensors instead of ONNX? SAM2 uses custom CUDA operations (roi_align, deformable attention) that aren't supported by the ONNX standard. SafeTensors provides fast, safe loading directly into PyTorch with zero-copy memory mapping.
## Included Files

```
sam2.1-hiera-small-int8/
├── model_int8.pt             # 39 MB, INT8 quantized state dict
├── model.safetensors         # 148 MB, full model in SafeTensors format
├── config.json               # Model configuration
├── preprocessor_config.json  # Image preprocessing config
└── README.md                 # This file
```
## Quick Start

### PyTorch (SafeTensors)

```python
from transformers import Sam2Model, Sam2Processor
import torch

# Load with SafeTensors (automatic)
model = Sam2Model.from_pretrained("robotflowlabs/sam2.1-hiera-small-int8")
processor = Sam2Processor.from_pretrained("facebook/sam2.1-hiera-small")
model.to("cuda").eval()

# Segment with a point prompt ("image" is a PIL image or numpy array)
inputs = processor(
    images=image,
    input_points=[[[500, 375]]],  # (x, y) point prompt
    return_tensors="pt",
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

masks = processor.post_process_masks(
    outputs.pred_masks,
    inputs["original_sizes"],
    inputs["reshaped_input_sizes"],
)
```
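Downstream robotics code often reduces a predicted mask to a bounding box, e.g. for grasp or motion planning. A small helper sketch, assuming a thresholded boolean H×W mask like one derived from `post_process_masks` output:

```python
import torch

def mask_to_bbox(mask: torch.Tensor):
    """Convert a boolean HxW mask to an (x_min, y_min, x_max, y_max) box."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if ys.numel() == 0:
        return None  # empty mask: the prompt missed the object
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic example: a 100x100 object in a 480x640 frame
mask = torch.zeros(480, 640, dtype=torch.bool)
mask[100:200, 300:400] = True
print(mask_to_bbox(mask))  # (300, 100, 399, 199)
```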
### INT8 Weights (Maximum Compression)

```python
import torch
from transformers import Sam2Model

# Load the architecture, then apply the INT8 weights
model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-small")
int8_state = torch.load("model_int8.pt", map_location="cuda", weights_only=True)
model.load_state_dict(int8_state, strict=False)
```
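The snippet above loads pre-quantized weights. To see the same class of transform (dynamic INT8 quantization of linear layers) with stock PyTorch on a toy model; note the published weights were produced with torchao, not this API:

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer block's linear layers
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamic quantization: weights stored as INT8, activations quantized
# on the fly at inference time
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(qmodel(x).shape)  # torch.Size([1, 10])
```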
### With FORGE (ANIMA Integration)

```python
from forge.vision import VisionEncoderRegistry

# FORGE handles optimal loading and batching
segmenter = VisionEncoderRegistry.load("sam2.1-hiera-small-int8")
masks = segmenter.segment(image, points=[[500, 375]])
```
## Use Cases in ANIMA

SAM2-Small is the default segmentation backbone for production ANIMA deployments:

- **Object Isolation**: Segment graspable objects from cluttered scenes for manipulation planning
- **Workspace Mapping**: Identify free space, obstacles, and surfaces for navigation
- **Video Tracking**: Track objects across frames during manipulation sequences (7-frame temporal memory)
- **Safety Zones**: Segment human body parts and keep-out regions for safe human-robot collaboration
- **Bin Picking**: Segment individual parts from a bin for industrial pick-and-place
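The 7-frame temporal memory above can be pictured as a FIFO of per-frame features. This sketch is purely conceptual (SAM2's real memory bank feeds a memory-attention module, not a Python deque):

```python
from collections import deque

class MemoryBank:
    """Illustrative FIFO of per-frame features, mimicking a 7-frame window."""

    def __init__(self, capacity: int = 7):
        self.frames = deque(maxlen=capacity)

    def add(self, frame_features):
        # At capacity, the oldest frame is evicted automatically
        self.frames.append(frame_features)

    def context(self):
        return list(self.frames)

bank = MemoryBank(capacity=7)
for t in range(10):
    bank.add(f"frame-{t}-features")

print(len(bank.context()), bank.context()[0])  # 7 frame-3-features
```

Because the memory is conditioned on the previous frames' outputs, tracking is inherently sequential, which is the limitation noted below.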
## SAM2 Model Family

We provide all three SAM2.1 variants, optimized for different deployment scenarios:

| Model | Size | Speed | Best For |
|---|---|---|---|
| sam2.1-hiera-large-int8 | 1.0 GB | Slowest | Research, highest-quality masks |
| sam2.1-hiera-small-int8 | 186 MB | Balanced | Production robotics |
| sam2.1-hiera-tiny-int8 | 152 MB | Fastest | Real-time edge, Jetson Nano |
## Intended Use

### Designed For
- Promptable segmentation in robotic manipulation pipelines
- Video object tracking during multi-step tasks
- Instance segmentation for bin picking and object isolation
- Real-time scene parsing on edge GPUs (Jetson Orin, L4)
### Limitations

- INT8 quantization may slightly reduce mask boundary precision on very fine structures
- Video tracking requires sequential frame processing (not parallelizable across frames)
- Requires a prompt (point, box, or mask); it is not a panoptic segmenter
- Inherits biases from the SA-V dataset (primarily indoor/outdoor natural scenes)
### Out of Scope
- Medical image segmentation without domain-specific validation
- Autonomous driving perception (not trained on driving data)
- Surveillance or tracking of individuals
## Technical Details

### Compression Pipeline

```
Original SAM2.1 Hiera-Small (FP32, 352 MB)
│
├── torchao INT8 dynamic quantization (GPU-native)
│   └── model_int8.pt (39 MB)
│
└── SafeTensors export (roi_align not ONNX-compatible)
    └── model.safetensors (148 MB)
```
- **Quantization**: INT8 dynamic activation + INT8 weight via `torchao` on an NVIDIA L4 GPU
- **Export**: SafeTensors format for zero-copy memory mapping, fast loading, and framework-agnostic weights
- **Why not ONNX**: SAM2's roi_align and deformable attention are custom CUDA ops that ONNX opset 18 cannot represent
- **Hardware**: NVIDIA L4 24GB, CUDA 13.0, PyTorch 2.10, Python 3.14
## Attribution

- **Original Model**: `facebook/sam2.1-hiera-small` by Meta AI (FAIR)
- **License**: Apache-2.0, free for commercial and research use
- **Paper**: SAM 2: Segment Anything in Images and Videos (Ravi et al., 2024)
- **Dataset**: SA-V (50.9K videos, 642.6K masklets)
- **Compressed by**: RobotFlowLabs using FORGE
## Citation

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}

@misc{robotflowlabs2026anima,
  title={ANIMA: Agentic Networked Intelligence for Modular Autonomy},
  author={RobotFlowLabs},
  year={2026},
  url={https://huggingface.co/robotflowlabs}
}
```
Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.