SAM 2.1 Hiera-Small β€” INT8 Quantized

Meta's Segment Anything Model 2.1 (Hiera-Small backbone), quantized to INT8 for real-time robotic segmentation. 1.9x smaller (352 MB → 186 MB), with both image and video segmentation capabilities preserved.

This model is part of the RobotFlowLabs model library, built for the ANIMA agentic robotics platform: a modular ROS2-native AI system that brings foundation-model intelligence to robots operating in the real world.

Why This Model Exists

SAM2 is the state of the art for promptable segmentation: given a point, box, or mask prompt, it segments any object in images or tracks it through video. The Hiera-Small variant is the production sweet spot: fast enough for real-time robotics, accurate enough for manipulation tasks, and at 186 MB it leaves room for depth estimation, feature extraction, and action models on the same edge GPU.

Model Details

| Property | Value |
|---|---|
| Architecture | Hiera-Small vision backbone + SAM2 decoder |
| Input Resolution | 1024 × 1024 |
| Capabilities | Image segmentation, video object tracking |
| Backbone Stages | 4 stages: [1, 2, 11, 2] blocks |
| Embed Dims | [96, 192, 384, 768] per stage |
| Attention Heads | [1, 2, 4, 8] per stage |
| Global Attention | Blocks 7, 10, 13 |
| Mask Decoder | 256-dim hidden, 8 attention heads, 3 multi-mask outputs |
| Memory Attention | 4 layers, 2048-dim FFN, RoPE positional encoding |
| Memory Bank | 7 frames of temporal context |
| Original Model | facebook/sam2.1-hiera-small |
| License | Apache-2.0 |

Compression Results

Quantized on an NVIDIA L4 24GB GPU using INT8 dynamic quantization with SafeTensors export.

| Metric | Original | INT8 Quantized | Change |
|---|---|---|---|
| Total Size | 352 MB | 186 MB | 1.9x smaller |
| INT8 Weights | N/A | 39 MB | Quantized linear layers |
| SafeTensors | N/A | 148 MB | Full model weights |
| Quantization | FP32 | INT8 dynamic | Per-tensor symmetric |
| Format | PyTorch | SafeTensors + INT8 .pt | Dual format |
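Per-tensor symmetric INT8 means each weight tensor shares a single scale derived from its largest magnitude. A minimal sketch of the round trip (illustrative only, not the torchao kernel used for this model):

```python
import torch

def quantize_per_tensor_symmetric(w: torch.Tensor):
    """Map an FP32 tensor to INT8 with one symmetric scale per tensor."""
    scale = w.abs().max() / 127.0          # symmetric range: [-127*scale, 127*scale]
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(384, 384)                  # toy stand-in for one projection matrix
q, scale = quantize_per_tensor_symmetric(w)
w_hat = dequantize(q, scale)

print(q.element_size())                    # 1 byte per weight, vs 4 for FP32
print((w - w_hat).abs().max() <= scale / 2)  # rounding error bounded by scale/2
```

The 4x-per-weight saving applies only to the quantized linear layers, which is why the full checkpoint shrinks 1.9x rather than 4x.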

Why SafeTensors instead of ONNX? SAM2 uses custom CUDA operations (roi_align, deformable attention) that aren't supported by the ONNX standard. SafeTensors provides fast, safe loading directly into PyTorch with zero-copy memory mapping.

Included Files

```
sam2.1-hiera-small-int8/
├── model_int8.pt              # 39 MB: INT8 quantized state dict
├── model.safetensors          # 148 MB: full model in SafeTensors format
├── config.json                # Model configuration
├── preprocessor_config.json   # Image preprocessing config
└── README.md                  # This file
```

Quick Start

PyTorch (SafeTensors)

```python
from transformers import Sam2Model, Sam2Processor
import torch

# Load with SafeTensors (automatic)
model = Sam2Model.from_pretrained("robotflowlabs/sam2.1-hiera-small-int8")
processor = Sam2Processor.from_pretrained("facebook/sam2.1-hiera-small")

model.to("cuda").eval()

# Segment with a point prompt
inputs = processor(
    images=image,
    input_points=[[[500, 375]]],  # (x, y) point prompt
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    outputs = model(**inputs)

masks = processor.post_process_masks(
    outputs.pred_masks,
    inputs["original_sizes"],
    inputs["reshaped_input_sizes"]
)
```
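Downstream, a predicted mask is often reduced to a pixel region or bounding box for grasp planning. A self-contained sketch using a synthetic boolean mask in place of the model output:

```python
import torch

def mask_to_box(mask: torch.Tensor):
    """Bounding box (x0, y0, x1, y1) of a boolean HxW mask, inclusive."""
    ys, xs = torch.where(mask)
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()

mask = torch.zeros(64, 64, dtype=torch.bool)
mask[10:20, 30:50] = True                  # synthetic segmented object
print(mask_to_box(mask))                   # (30, 10, 49, 19)
```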

INT8 Weights (Maximum Compression)

```python
import torch
from transformers import Sam2Model

# Load the architecture, then apply the INT8 weights
model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-small")
int8_state = torch.load("model_int8.pt", map_location="cuda", weights_only=True)
model.load_state_dict(int8_state, strict=False)
```
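To sanity-check the compression numbers, you can sum tensor byte sizes in a state dict. A sketch with toy tensors (one million FP32 parameters vs the same count in INT8):

```python
import torch

def state_dict_mb(sd):
    """Total size of a state dict's tensors in megabytes."""
    return sum(t.numel() * t.element_size() for t in sd.values()) / 1e6

fp32 = {"w": torch.randn(1000, 1000)}                              # 4 bytes/param
int8 = {"w": torch.randint(-127, 128, (1000, 1000), dtype=torch.int8)}  # 1 byte/param

print(state_dict_mb(fp32))  # 4.0
print(state_dict_mb(int8))  # 1.0
```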

With FORGE (ANIMA Integration)

```python
from forge.vision import VisionEncoderRegistry

# FORGE handles optimal loading and batching
segmenter = VisionEncoderRegistry.load("sam2.1-hiera-small-int8")
masks = segmenter.segment(image, points=[[500, 375]])
```

Use Cases in ANIMA

SAM2-Small is the default segmentation backbone for production ANIMA deployments:

- Object Isolation: segment graspable objects from cluttered scenes for manipulation planning
- Workspace Mapping: identify free space, obstacles, and surfaces for navigation
- Video Tracking: track objects across frames during manipulation sequences (7-frame temporal memory)
- Safety Zones: segment human body parts and keep-out regions for safe human-robot collaboration
- Bin Picking: segment individual parts from a bin for industrial pick-and-place
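The 7-frame memory bank referenced above behaves like a fixed-size buffer of past frame features: each new frame evicts the oldest. A minimal sketch of that bookkeeping, not SAM2's actual memory-attention implementation:

```python
from collections import deque

class MemoryBank:
    """Keep features for the most recent N frames (SAM2-style temporal context)."""
    def __init__(self, num_frames: int = 7):
        self.frames = deque(maxlen=num_frames)

    def add(self, frame_feature):
        self.frames.append(frame_feature)  # oldest entry evicted automatically

    def context(self):
        return list(self.frames)

bank = MemoryBank(num_frames=7)
for t in range(10):
    bank.add(f"feat_frame_{t}")

print(bank.context())  # only the 7 most recent frames: feat_frame_3 .. feat_frame_9
```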

SAM2 Model Family

We provide all three SAM2.1 variants, optimized for different deployment scenarios:

| Model | Size | Speed/Quality | Best For |
|---|---|---|---|
| sam2.1-hiera-large-int8 | 1.0 GB | Highest quality | Research, high-accuracy tasks |
| sam2.1-hiera-small-int8 | 186 MB | Balanced | Production robotics |
| sam2.1-hiera-tiny-int8 | 152 MB | Fastest | Real-time edge, Jetson Nano |

Intended Use

Designed For

- Promptable segmentation in robotic manipulation pipelines
- Video object tracking during multi-step tasks
- Instance segmentation for bin picking and object isolation
- Real-time scene parsing on edge GPUs (Jetson Orin, L4)

Limitations

- INT8 quantization may slightly reduce mask boundary precision on very fine structures
- Video tracking requires sequential frame processing (not parallelizable across frames)
- Requires a prompt (point, box, or mask); this is not a panoptic segmenter
- Inherits biases from the SA-V dataset (primarily indoor/outdoor natural scenes)

Out of Scope

- Medical image segmentation without domain-specific validation
- Autonomous driving perception (not trained on driving data)
- Surveillance or tracking of individuals

Technical Details

Compression Pipeline

```
Original SAM2.1 Hiera-Small (FP32, 352 MB)
    │
    ├─→ torchao INT8 dynamic quantization (GPU-native)
    │   └─→ model_int8.pt (39 MB)
    │
    └─→ SafeTensors export (roi_align not ONNX-compatible)
        └─→ model.safetensors (148 MB)
```

- Quantization: INT8 dynamic activation + INT8 weight via torchao on an NVIDIA L4 GPU
- Export: SafeTensors format for zero-copy memory mapping, fast loading, framework-agnostic storage
- Why not ONNX: SAM2's roi_align and deformable attention are custom CUDA ops that ONNX opset 18 cannot represent
- Hardware: NVIDIA L4 24GB, CUDA 13.0, PyTorch 2.10, Python 3.14
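As an illustration of the quantization step, PyTorch's built-in dynamic quantization applies the same idea to linear layers: weights stored as INT8, activations quantized on the fly per batch. This is a stand-in sketch, not the exact torchao call used for this model:

```python
import torch
import torch.nn as nn

# Toy model standing in for SAM2's linear-heavy decoder layers.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 256))

# Replace every nn.Linear with a dynamically quantized equivalent.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    y = qmodel(torch.randn(1, 768))        # inference runs on INT8 weights

print(qmodel[0].weight().dtype)            # torch.qint8
print(tuple(y.shape))                      # (1, 256)
```

"Dynamic" here means activation scales are computed at runtime per batch, so no calibration dataset is needed, which suits a drop-in compression pipeline.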

Attribution

Citation

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and others},
  journal={arXiv preprint arXiv:2408.00714},
  year={2024}
}

@misc{robotflowlabs2026anima,
  title={ANIMA: Agentic Networked Intelligence for Modular Autonomy},
  author={RobotFlowLabs},
  year={2026},
  url={https://huggingface.co/robotflowlabs}
}
```

Built with FORGE by RobotFlowLabs
Optimizing foundation models for real robots.
