ISR

Zhen Ye Claude Opus 4.6 (1M context) committed 20 days ago

Commit 8d09cca

1 Parent(s): ae40f9a

fix: add mask-level NMS in GSAM2/YSAM2 to deduplicate overlapping masks

Within a single keyframe, YOLO can detect the same object with
slightly different bounding boxes (e.g., cab vs full truck body)
that survive box-level NMS but produce overlapping SAM2 masks.
These all get unique IDs and render as stacked labels.

Added _mask_nms() to MaskDictionary.add_new_frame_annotation():
- Computes pairwise mask IoU for same-label detections
- Suppresses smaller masks when IoU > 0.5 with a larger one
- Runs before masks enter the SAM2 video predictor pipeline

Fixes duplicate "truck truck truck" labels on single objects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
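
As a rough illustration of the IoU > 0.5 rule described above, the sketch below builds toy rectangular masks and computes their mask IoU. It is plain Python, with sets of pixel coordinates standing in for the boolean mask tensors; `rect_mask` and `mask_iou` are hypothetical helpers for this example, not functions from the repository.

```python
def rect_mask(top, left, bottom, right):
    """Return the set of (row, col) pixels inside a half-open rectangle."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


def mask_iou(a, b):
    """Intersection over union of two pixel sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


full_truck = rect_mask(0, 0, 10, 20)  # 200 pixels
near_dup = rect_mask(0, 2, 10, 20)    # 180 pixels, almost the same region
cab_only = rect_mask(0, 0, 10, 6)     # 60 pixels, a small part of the truck

print(mask_iou(full_truck, near_dup))  # 180 / 200 = 0.9, above the threshold
print(mask_iou(full_truck, cab_only))  # 60 / 200 = 0.3, below the threshold
```

Under these assumptions the near-duplicate mask would be suppressed in favor of the larger one, while a mask covering only a small part of the object would survive a 0.5 threshold.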

Files changed (1)

models/segmenters/grounded_sam2.py +52 -0

models/segmenters/grounded_sam2.py

@@ -50,7 +50,16 @@ class MaskDictionary:
         mask_list: torch.Tensor,
         box_list: torch.Tensor,
         label_list: list,
+        mask_iou_threshold: float = 0.5,
     ):
+        # Deduplicate overlapping masks within the same keyframe.
+        # YOLO can detect the same object with slightly different boxes
+        # (e.g., cab vs full truck), producing multiple masks for one object.
+        keep = self._mask_nms(mask_list, box_list, label_list, mask_iou_threshold)
+        mask_list = mask_list[keep]
+        box_list = box_list[keep]
+        label_list = [label_list[i] for i in keep]
         mask_img = torch.zeros(mask_list.shape[-2:])
         anno = {}
         for idx, (mask, box, label) in enumerate(zip(mask_list, box_list, label_list)):
@@ -69,6 +78,49 @@ class MaskDictionary:
         self.mask_width = mask_img.shape[1]
         self.labels = anno
+    @staticmethod
+    def _mask_nms(
+        masks: torch.Tensor,
+        boxes: torch.Tensor,
+        labels: list,
+        iou_threshold: float = 0.5,
+    ) -> list:
+        """Remove duplicate masks within a keyframe using mask IoU.
+        For each pair of masks with the same label, if their mask IoU
+        exceeds the threshold, keep the one with the larger area.
+        Returns indices to keep.
+        """
+        n = len(masks)
+        if n <= 1:
+            return list(range(n))
+        # Compute mask areas
+        areas = [int(masks[i].sum()) for i in range(n)]
+        suppressed = [False] * n
+        # Sort by area descending (keep larger masks)
+        order = sorted(range(n), key=lambda i: areas[i], reverse=True)
+        keep = []
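+        for pos, i in enumerate(order):
+            if suppressed[i]:
+                continue
+            keep.append(i)
+            # Compare only against masks later in the area ordering,
+            # by position in `order` rather than by raw index.
+            for j in order[pos + 1:]:
+                if suppressed[j]:
+                    continue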
+                # Only suppress same-label masks
+                if labels[i] != labels[j]:
+                    continue
+                # Compute mask IoU
+                inter = int((masks[i] & masks[j]).sum())
+                union = areas[i] + areas[j] - inter
+                if union > 0 and inter / union > iou_threshold:
+                    suppressed[j] = True
+        return sorted(keep)
     def update_masks(
         self,
         tracking_dict: "MaskDictionary",
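
One property worth checking for this kind of greedy suppression is that the result does not depend on the order in which detections arrive. The following is a standalone sketch, not the committed code: masks are modelled as Python sets of pixel coordinates rather than `torch.Tensor`, and `mask_nms_sketch` and `rect` are hypothetical names. It walks the area-sorted order by position, so each kept mask is compared only against masks that come after it in that ordering.

```python
def mask_nms_sketch(masks, labels, iou_threshold=0.5):
    """Greedy same-label mask NMS; returns the sorted indices to keep."""
    areas = [len(m) for m in masks]
    # Largest masks first, so a larger mask can suppress a smaller one.
    order = sorted(range(len(masks)), key=lambda i: areas[i], reverse=True)
    suppressed = [False] * len(masks)
    keep = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        # Only masks later in the ordering are candidates for suppression.
        for j in order[pos + 1:]:
            if suppressed[j] or labels[i] != labels[j]:
                continue
            inter = len(masks[i] & masks[j])
            union = areas[i] + areas[j] - inter
            if union > 0 and inter / union > iou_threshold:
                suppressed[j] = True
    return sorted(keep)


def rect(top, left, bottom, right):
    """Set of (row, col) pixels inside a half-open rectangle."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


big = rect(0, 0, 10, 20)   # 200 pixels
dup = rect(0, 2, 10, 20)   # 180 pixels, IoU 0.9 with big
car = rect(0, 0, 10, 20)   # same pixels as big, but a different label

print(mask_nms_sketch([big, dup, car], ["truck", "truck", "car"]))  # [0, 2]
print(mask_nms_sketch([dup, big, car], ["truck", "truck", "car"]))  # [1, 2]
```

In both input orders the larger truck mask is kept and its near-duplicate is dropped, while the identically shaped mask with a different label is left alone.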