SAM 3 OOM Behaviour on High-Density Scenes

#301
by Dp1412 - opened

Point for Discussion: SAM 3 OOM Behavior on High-Density Scenes

We are evaluating SAM 3 on aerial/drone imagery where a single frame can contain a very large number of small objects (example attached: hundreds of birds over water).

Even on NVIDIA A10G GPUs (24 GB VRAM), inference can hit CUDA OOM when the image contains a large number of candidate detections or mask proposals. In our testing, object density appears to be a stronger contributor to memory consumption than image resolution alone, particularly in aerial imagery containing hundreds of small objects.
[Attached image: pexels-sujin-appu-2149064933-30597728]
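To illustrate why object count might dominate, here is a back-of-envelope sketch of the memory a stack of dense per-object masks would need. It assumes one full-resolution mask per object is held in memory at the same time, which we have not confirmed is what SAM 3 does internally. The helper names and the 4000 x 3000 frame size are hypothetical.

```python
def mask_stack_bytes(num_objects: int, height: int, width: int,
                     bytes_per_value: int = 4) -> int:
    """Bytes needed to hold one dense H x W mask per object (float32 by default)."""
    return num_objects * height * width * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 4000 x 3000 drone frame with float32 mask values.
    for n in (10, 100, 500):
        print(n, "objects:", round(to_gib(mask_stack_bytes(n, 3000, 4000)), 2), "GiB")
```

Under these assumptions the cost grows linearly with the number of objects: 500 objects at this frame size come to 24,000,000,000 bytes for the masks alone, which is roughly the whole capacity of a 24 GB card before counting model weights or activations.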

A few questions for the community:

1. Is there any built-in mechanism in SAM 3 to gracefully handle extremely dense scenes?
2. Are there recommended settings to limit mask proposals or memory growth during inference?
3. Has anyone implemented dynamic fallback strategies such as:
   - reducing image resolution,
   - tiled inference,
   - limiting masks per frame,
   - chunked/streamed mask generation,
   - CPU offloading for intermediate tensors?
4. Are there known memory optimizations for dense-object aerial imagery specifically?
5. Is there a way to estimate expected memory usage before mask generation begins so inference can adapt proactively?
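For concreteness, the tiled-inference option we have in mind looks roughly like the sketch below. It only computes overlapping tile coordinates and does not call SAM 3 at all. `tile_boxes`, the 1024-pixel tile size, and the 128-pixel overlap are our own placeholder choices, not recommended values.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int,
               tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Split a width x height frame into overlapping tiles, row by row."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]
```

Each tile would then be cropped and run through the model separately, so each forward pass sees only a fraction of the objects. The resulting masks still have to be shifted back into full-frame coordinates, and detections duplicated in the overlap regions have to be merged. That merging step is the part we are least sure how to do without hurting recall.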

Our goal is to avoid CUDA OOMs on images containing hundreds of objects while maintaining acceptable recall.
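The kind of dynamic fallback we are considering is a simple retry ladder: try the preferred settings first, and step down to cheaper ones whenever an out-of-memory error is raised. The sketch below is framework-agnostic and makes no assumptions about the SAM 3 API. `run_with_fallback`, `infer`, and the config keys are hypothetical names.

```python
from typing import Callable, Iterable, Mapping, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_fallback(
    infer: Callable[..., T],
    configs: Iterable[Mapping[str, object]],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    on_oom: Callable[[], None] = lambda: None,
) -> Tuple[T, Mapping[str, object]]:
    """Try each config in order; return the first result that fits in memory."""
    tried = 0
    for cfg in configs:
        try:
            return infer(**cfg), cfg
        except oom_errors:
            # Do the cleanup outside the except block, so the exception and
            # the frames it references are released before on_oom runs.
            pass
        tried += 1
        on_oom()
    raise RuntimeError(f"all {tried} fallback configurations ran out of memory")
```

With PyTorch, we would expect to pass `oom_errors=(torch.cuda.OutOfMemoryError,)` and `on_oom=torch.cuda.empty_cache`, with a ladder such as full resolution, then half resolution, then half resolution with a cap on masks per frame. What we do not know is which of these steps costs the least recall on dense scenes.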

Any guidance, benchmarks, or recommended inference configurations would be appreciated.
