Zhen Ye committed
Commit b17bd6d · 1 Parent(s): 5aec47c

perf: enable torch.compile for SAM2 via vos_optimized flag


Uses Facebook's official SAM2VideoPredictorVOS which compiles all five
model components (image_encoder, memory_encoder, memory_attention,
sam_prompt_encoder, sam_mask_decoder) with torch.compile max-autotune.
First inference has ~30s warmup cost; subsequent frames benefit from
fused kernels and reduced memory round-trips.
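The ~30s warmup only pays off over a long enough video. A rough back-of-envelope sketch (pure Python, no SAM2 required): the warmup figure comes from this commit message, but the per-frame timings below are hypothetical placeholders, not measurements.

```python
# Back-of-envelope: when does torch.compile's one-time warmup pay off?
# Only the ~30s warmup figure comes from the commit message; the
# per-frame timings used below are HYPOTHETICAL placeholders.

def break_even_frames(warmup_ms: int, eager_per_frame_ms: int,
                      compiled_per_frame_ms: int) -> int:
    """Smallest frame count n at which total compiled time
    (warmup + n * per-frame) drops below total eager time."""
    saving = eager_per_frame_ms - compiled_per_frame_ms
    if saving <= 0:
        raise ValueError("compiled path must be faster per frame")
    # warmup + n * compiled < n * eager  =>  n > warmup / saving
    return warmup_ms // saving + 1

if __name__ == "__main__":
    # Hypothetical: 50 ms/frame eager vs 35 ms/frame compiled,
    # with a 30,000 ms (~30 s) one-time compile warmup.
    n = break_even_frames(warmup_ms=30_000,
                          eager_per_frame_ms=50,
                          compiled_per_frame_ms=35)
    print(f"compile pays off after ~{n} frames")  # → ~2001 frames
```

Under these placeholder numbers, compilation only wins on clips longer than roughly a minute of 30 fps video, which is why it is gated behind a flag rather than enabled unconditionally.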

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. models/segmenters/grounded_sam2.py +4 -1

models/segmenters/grounded_sam2.py CHANGED
@@ -362,8 +362,11 @@ class GroundedSAM2Segmenter(Segmenter):
         from sam2.sam2_image_predictor import SAM2ImagePredictor
 
         # Video predictor (for process_video)
+        # vos_optimized=True enables SAM2VideoPredictorVOS which compiles
+        # image_encoder, memory_encoder, memory_attention, sam_prompt_encoder,
+        # and sam_mask_decoder with torch.compile(mode="max-autotune").
         self._video_predictor = build_sam2_video_predictor_hf(
-            hf_id, device=self.device
+            hf_id, device=self.device, vos_optimized=True,
         )
 
         # Image predictor (for single-frame predict)