SAM 3.1

SAM 3 (Segment Anything with Concepts) is a unified foundation model from Meta for promptable segmentation in images and videos. It detects, segments, and tracks objects using text or visual prompts such as points, boxes, and masks. SAM 3 introduces the ability to exhaustively segment all instances of an open-vocabulary concept specified by a short text phrase, handling over 50x more unique concepts than existing benchmarks. SAM 3.1 builds on this with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers ~7x faster inference at 128 objects on a single H100 GPU without sacrificing accuracy, along with improved VOS performance on 6 out of 7 benchmarks.

This repository hosts only the SAM 3.1 model checkpoints — there is no Hugging Face Transformers integration. For installation, code, usage examples, and full documentation, please visit the SAM 3 GitHub repository.

Downloads last month: 8

Inference Providers NEW

Mask Generation

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support