Zhen Ye, Claude Opus 4.6 committed
Commit: b17bd6d · Parent(s): 5aec47c
perf: enable torch.compile for SAM2 via vos_optimized flag
Uses Facebook's official SAM2VideoPredictorVOS, which compiles all five
model components (image_encoder, memory_encoder, memory_attention,
sam_prompt_encoder, sam_mask_decoder) with torch.compile max-autotune.
First inference has ~30s warmup cost; subsequent frames benefit from
fused kernels and reduced memory round-trips.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
models/segmenters/grounded_sam2.py
CHANGED
@@ -362,8 +362,11 @@ class GroundedSAM2Segmenter(Segmenter):
         from sam2.sam2_image_predictor import SAM2ImagePredictor

         # Video predictor (for process_video)
+        # vos_optimized=True enables SAM2VideoPredictorVOS which compiles
+        # image_encoder, memory_encoder, memory_attention, sam_prompt_encoder,
+        # and sam_mask_decoder with torch.compile(mode="max-autotune").
         self._video_predictor = build_sam2_video_predictor_hf(
-            hf_id, device=self.device
+            hf_id, device=self.device, vos_optimized=True,
         )

         # Image predictor (for single-frame predict)
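If this code must also run against sam2 releases that predate the `vos_optimized` flag, one option (not part of this commit; the builders below are stubs standing in for sam2's `build_sam2_video_predictor_hf`) is to inspect the builder's signature and pass the flag only when it is supported:

```python
import inspect

def build_predictor(builder, hf_id, device):
    """Pass vos_optimized=True only when the installed sam2 accepts it."""
    kwargs = {"device": device}
    if "vos_optimized" in inspect.signature(builder).parameters:
        # Opt in to SAM2VideoPredictorVOS (torch.compile max-autotune path).
        kwargs["vos_optimized"] = True
    return builder(hf_id, **kwargs)

# Hypothetical stubs illustrating old vs. new sam2 builder signatures:
def old_builder(hf_id, device):
    return ("old", hf_id, device)

def new_builder(hf_id, device, vos_optimized=False):
    return ("new", hf_id, device, vos_optimized)
```

With the old signature the flag is silently skipped; with the new one it is enabled, so a single code path covers both installations.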