SAM 3 OOM Behaviour on High-Density Scenes
Point for Discussion: SAM 3 OOM Behavior on High-Density Scenes
We are evaluating SAM 3 on aerial/drone imagery where a single frame can contain a very large number of small objects (example attached: hundreds of birds over water).
Even on NVIDIA A10G GPUs (24 GB VRAM), inference can fail with CUDA out-of-memory (OOM) errors when the image contains a large number of candidate detections or mask proposals. In our testing, object density appears to contribute more to memory consumption than image resolution alone, especially in aerial frames containing hundreds of small objects.
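As a rough, back-of-envelope illustration of why object count can matter as much as resolution, the sketch below assumes the pipeline holds one dense, full-frame float32 mask per candidate in GPU memory at the same time. That is an assumption made for illustration, not a description of SAM 3's internals, which may keep lower-resolution logits, use other dtypes, or allocate further intermediates. Under this assumption mask memory is N × H × W × bytes-per-pixel: linear in the object count N and linear in the pixel count, so going from tens to hundreds of objects costs as much as a tenfold increase in pixels. The helper names are hypothetical.

```python
def mask_memory_bytes(num_objects: int, height: int, width: int,
                      bytes_per_pixel: int = 4) -> int:
    """Bytes to hold one dense H x W mask per object (float32 = 4 bytes by default).

    Assumption for illustration: every candidate mask is kept at full frame
    resolution simultaneously; real pipelines may differ.
    """
    return num_objects * height * width * bytes_per_pixel


def max_objects_for_budget(budget_bytes: int, height: int, width: int,
                           bytes_per_pixel: int = 4, reserved_bytes: int = 0) -> int:
    """Largest object count whose masks fit in the budget left after `reserved_bytes`.

    `reserved_bytes` is a hypothetical stand-in for weights, activations and
    allocator overhead, which would have to be measured separately.
    """
    per_mask = mask_memory_bytes(1, height, width, bytes_per_pixel)
    usable = max(0, budget_bytes - reserved_bytes)
    return usable // per_mask


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 4000 x 3000 frame: mask memory grows linearly with the object count.
    for n in (10, 100, 500):
        print(f"{n:4d} objects -> {gib(mask_memory_bytes(n, 3000, 4000)):.2f} GiB")
```

For that hypothetical frame this gives about 0.45 GiB for 10 masks but about 22.4 GiB for 500, close to an A10G's entire memory before weights and activations are counted. The `reserved_bytes` value would need to come from measurement, for example with `torch.cuda.max_memory_allocated()` after a run on a sparse image.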
A few questions for the community:
- Is there any built-in mechanism in SAM 3 to gracefully handle extremely dense scenes?
- Are there recommended settings to limit mask proposals or memory growth during inference?
- Has anyone implemented dynamic fallback strategies such as:
  - reducing image resolution,
  - tiled inference,
  - limiting masks per frame,
  - chunked/streamed mask generation,
  - CPU offloading for intermediate tensors?
- Are there known memory optimizations specifically for dense-object aerial imagery?
- Is there a way to estimate expected memory usage before mask generation begins, so that inference can adapt proactively?
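Of the fallback strategies listed, tiled inference is the easiest to sketch independently of the model: split the frame into overlapping windows, run the model on each window so that peak memory depends on the objects in one window rather than the whole frame (provided each tile's outputs are moved off the GPU before the next tile runs), shift each tile's boxes back into frame coordinates, and de-duplicate objects seen by two tiles. The sketch below covers only the geometry and a plain greedy box NMS; running SAM 3 on each crop is not shown, and the function names, the (x0, y0, x1, y1) box convention and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive


def tile_boxes(width: int, height: int, tile: int, overlap: int) -> List[Box]:
    """Cover a width x height frame with tile x tile windows overlapping by `overlap` px.

    The last window in each row/column is shifted back to end flush with the
    border; if the frame is smaller than a tile, the window is clipped instead.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(length: int) -> List[int]:
        if length <= tile:
            return [0]
        return list(range(0, length - tile, stride)) + [length - tile]

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tile_detections(
    per_tile: Sequence[Tuple[Box, Sequence[Tuple[Box, float]]]],
    iou_thresh: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Shift tile-local boxes into frame coordinates, then apply greedy NMS.

    `per_tile` holds (tile_box, [(local_box, score), ...]) pairs, e.g. produced by
    running a segmentation model on each crop (not shown here).
    """
    dets = [((x0 + tx, y0 + ty, x1 + tx, y1 + ty), score)
            for (tx, ty, _, _), items in per_tile
            for (x0, y0, x1, y1), score in items]
    dets.sort(key=lambda d: d[1], reverse=True)  # highest score first
    kept: List[Tuple[Box, float]] = []
    for box, score in dets:
        # Keep a detection only if it does not heavily overlap an already-kept one.
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```

Masks would be offset by the same (tx, ty) as their boxes. The overlap should be at least the size of the largest object so every object appears whole in at least one tile. IoU-based NMS can miss a duplicate when one tile only sees a sliver of an object; an alternative worth testing on the bird imagery is to drop detections that touch an inner tile border.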
Our goal is to avoid CUDA OOMs on images containing hundreds of objects while maintaining acceptable recall.
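When the required memory cannot be known in advance, one reactive option is to catch the out-of-memory error and retry with a smaller tile size (or a lower resolution). The helper below is a generic sketch, not a SAM 3 feature: `infer` is a hypothetical callable that processes the whole frame at a given tile size, and the exception types treated as OOM are passed in. The recovery hook runs after the `except` block rather than inside it, because a caught exception's traceback keeps the failed call's frames, and any tensors they reference, alive.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_tile_fallback(
    infer: Callable[[int], T],
    tile_sizes: Sequence[int],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    on_oom: Callable[[int], None] = lambda size: None,
) -> Tuple[int, T]:
    """Call `infer(size)` for each tile size in order until one succeeds.

    Returns (tile_size_used, result). Raises RuntimeError if every size fails.
    `infer` is a hypothetical callable; `oom_errors` are the exception types
    treated as out-of-memory.
    """
    last_err: Optional[BaseException] = None
    for size in tile_sizes:
        try:
            return size, infer(size)
        except oom_errors as err:
            # Drop the traceback so the failed call's frames (and any tensors
            # they reference) can be garbage-collected before retrying.
            last_err = err.with_traceback(None)
        # Runs outside the except block, e.g. to release cached GPU memory.
        on_oom(size)
    raise RuntimeError(f"out of memory at every tile size in {list(tile_sizes)}") from last_err
```

With PyTorch one would typically pass `oom_errors=(torch.cuda.OutOfMemoryError,)` (available in recent PyTorch releases) and `on_oom=lambda size: torch.cuda.empty_cache()`. A proactive memory estimate could choose the starting tile size, with this loop acting as a safety net; note that smaller tiles trade extra passes and more border duplicates for lower peak memory, which may affect recall.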
Any guidance, benchmarks, or recommended inference configurations would be appreciated.