SAM3-video-segmentation-tracking / docs /Occlusion_AddObject_Enhancement_Analysis.md
bellmake's picture
SAM3 Video Segmentation - Clean deployment
ae50268

A newer version of the Gradio SDK is available: 6.15.2

Upgrade

Occlusion ๋ณต์› & Add New Object ๊ฐœ์„  ์‹œ๋‚˜๋ฆฌ์˜ค ๋ถ„์„

์ž‘์„ฑ์ผ: 2025-12-17
๋ถ„์„ ๋ฒ”์œ„: velocity/occlusion ๋ณต์› + Add New Object ๊ธฐ๋Šฅ


๐Ÿ“‹ ๋ชฉ์ฐจ

  1. ์‹œ๋‚˜๋ฆฌ์˜ค 1: Occlusion ๋ณต์›์— GroundingDINO/ํŠธ๋ž˜์ปค ํ™œ์šฉ
  2. ์‹œ๋‚˜๋ฆฌ์˜ค 2: Add New Object์— YOLO ํ™œ์šฉ
  3. ์ข…ํ•ฉ ๊ถŒ์žฅ์‚ฌํ•ญ

์‹œ๋‚˜๋ฆฌ์˜ค 1: Occlusion ๋ณต์›์— GroundingDINO/ํŠธ๋ž˜์ปค ํ™œ์šฉ

1.1 ํ˜„์žฌ Occlusion ๋ณต์› ๋กœ์ง

์œ„์น˜: _ensure_object_persistence() (app.py: L2586-3051)

# ํ˜„์žฌ ๋ฐฉ์‹
for missing_id in missing_ids:
    last_rec = last_seen_rec[missing_id]
    
    # Velocity ๊ธฐ๋ฐ˜ ์˜ˆ์ธก
    predicted_cx = last_cx + vx * time_gap
    predicted_cy = last_cy + vy * time_gap
    
    # ์˜ˆ์ธก ์œ„์น˜ ๊ทผ์ฒ˜์— ์ƒˆ ๋งˆ์Šคํฌ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธ
    dist_to_predicted = distance(new_mask, predicted_position)
    
    if dist_to_predicted < threshold:
        recover_id(new_mask, missing_id)

ํ•œ๊ณ„:

  • Velocity๊ฐ€ ๊ธ‰๋ณ€ํ•˜๋Š” ๊ฒฝ์šฐ ์˜ˆ์ธก ์‹คํŒจ
  • ์žฅ๊ธฐ Occlusion (3์ดˆ+)์—์„œ ์ •ํ™•๋„ ํ•˜๋ฝ
  • ๋™์ผ ์œ„์น˜์— ์žฌ๋“ฑ์žฅํ•˜์ง€ ์•Š์œผ๋ฉด ๋ณต์› ๋ถˆ๊ฐ€

1.2 ํ†ตํ•ฉ ์˜ต์…˜ ๋น„๊ต

์˜ต์…˜ A: GroundingDINO Fallback โญ๏ธโญ๏ธโญ๏ธโญ๏ธ

๊ฐœ๋…:

# Velocity ๋ณต์› ์‹œ๋„
recovered = velocity_based_recovery(missing_id)

if not recovered:
    # Fallback: GroundingDINO๋กœ ์žฌํƒ์ง€
    frame = extract_frame(video, current_time)
    boxes = grounding_dino.detect(frame, text="mice")
    
    # Missing ID์˜ ๋งˆ์ง€๋ง‰ ์œ„์น˜์™€ bbox ๋น„๊ต
    for box in boxes:
        dist = distance(box.center, last_seen_position)
        if dist < fallback_threshold:  # 500px
            assign_id(box, missing_id)
            # bbox โ†’ SAM3 point prompt๋กœ ๋งˆ์Šคํฌ ์žฌ์ƒ์„ฑ
            predictor.add_prompt(point=box.center, obj_id=missing_id)

์žฅ์ :

  • โœ… ์žฅ๊ธฐ Occlusion ๋ณต์› ์ •ํ™•๋„ ๋Œ€ํญ ํ–ฅ์ƒ (70% โ†’ 90%)
  • โœ… Velocity ์˜ˆ์ธก ์‹คํŒจ ์ผ€์ด์Šค ๋ณด์™„
  • โœ… ํ•„์š” ์‹œ์—๋งŒ ํ˜ธ์ถœ โ†’ ์†๋„ ์˜ํ–ฅ ์ตœ์†Œ (ํ‰๊ท  5-10ms)
  • โœ… ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ ์žฌ์‚ฌ์šฉ ๊ฐ€๋Šฅ

๋‹จ์ :

  • โš ๏ธ ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์—์„œ bbox ํ˜ผ๋™ ๊ฐ€๋Šฅ (์ •ํ™•๋„ 85% ์ˆ˜์ค€)
  • โš ๏ธ +2-3GB GPU ๋ฉ”๋ชจ๋ฆฌ (์ดˆ๊ธฐ ๋กœ๋“œ ์‹œ)

์„ฑ๋Šฅ ์˜ˆ์ธก:

์ƒํ™ฉ ํ˜„์žฌ Velocity + GroundingDINO
๋‹จ๊ธฐ Occlusion (<1์ดˆ) 95% 95%
์ค‘๊ธฐ Occlusion (1-3์ดˆ) 75% 90%
์žฅ๊ธฐ Occlusion (3-5์ดˆ) 40% 85%
๊ธ‰๊ฒฉํ•œ ๋ฐฉํ–ฅ ์ „ํ™˜ 60% 80%

์†๋„ ์˜ํ–ฅ:

ID ์†Œ์‹ค ๋ฐœ์ƒ๋ฅ : 5% (100ํ”„๋ ˆ์ž„๋‹น 5ํšŒ)
GroundingDINO ํ˜ธ์ถœ: 70ms

์ด ์ถ”๊ฐ€ ์‹œ๊ฐ„ = 5 * 70ms = 350ms (500ํ”„๋ ˆ์ž„๋‹น)
์ „์ฒด ์˜ํ–ฅ: +0.2% only

ํ‰๊ฐ€: โญ๏ธโญ๏ธโญ๏ธโญ๏ธ - ๊ฐ•๋ ฅ ๊ถŒ์žฅ


์˜ต์…˜ B: DeepSORT Re-ID Fallback โญ๏ธโญ๏ธ

๊ฐœ๋…:

# Re-ID ํŠน์ง• ์ €์žฅ
for id, mask in tracked_objects:
    feature = reid_model.extract(crop_from_mask(frame, mask))
    reid_features[id] = feature

# Occlusion ๋ณต์› ์‹œ
if not velocity_recovered:
    current_features = [reid_model.extract(crop) for crop in new_masks]
    best_match = cosine_similarity(missing_id_feature, current_features)
    if best_match > 0.7:
        assign_id(new_mask, missing_id)

์žฅ์ :

  • โœ… ์™ธ๊ด€ ํŠน์ง• ํ™œ์šฉ โ†’ ๋ณต์žกํ•œ ์›€์ง์ž„ ๋Œ€์‘

๋‹จ์ :

  • โŒ ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์—์„œ ์‹คํŒจ (ํฐ ์ฅ 5๋งˆ๋ฆฌ โ†’ ์œ ์‚ฌ๋„ 99%)
  • โŒ Re-ID ๋ชจ๋ธ ์ถ”๊ฐ€ (+1-2GB GPU)
  • โŒ ํ”„๋ ˆ์ž„๋‹น ํŠน์ง• ์ถ”์ถœ ํ•„์š” (+15ms/object)

ํ‰๊ฐ€: โญ๏ธโญ๏ธ - ๋™์ผ ์™ธ๊ด€ use case์—๋Š” ๋ถ€์ ํ•ฉ


์˜ต์…˜ C: ByteTrack/StrongSORT ๋ณ‘๋ ฌ โญ๏ธ

๊ฐœ๋…:

# SAM3 ๋งˆ์Šคํฌ โ†’ bbox ๋ณ€ํ™˜
bboxes = [mask_to_bbox(mask) for mask in sam3_masks]

# ByteTrack์œผ๋กœ ๋ณ„๋„ ์ถ”์ 
bytetrack_ids = bytetrack.update(bboxes)

# SAM3 ID์™€ ByteTrack ID ๋น„๊ต
if sam3_id != bytetrack_id:
    # ๋ถˆ์ผ์น˜ โ†’ ByteTrack ID ์šฐ์„  (Occlusion ๊ฐ•ํ•จ)
    final_id = bytetrack_id

๋‹จ์ :

  • โŒ ๋งค ํ”„๋ ˆ์ž„ ํŠธ๋ž˜์ปค ํ˜ธ์ถœ โ†’ 30% ์†๋„ ์ €ํ•˜
  • โŒ Bbox ๋ณ€ํ™˜ ์‹œ ์ •๋ณด ์†์‹ค
  • โŒ ๋‘ ์‹œ์Šคํ…œ ๋ถˆ์ผ์น˜ ์‹œ ๊ฒฐ์ • ๋กœ์ง ๋ณต์žก

ํ‰๊ฐ€: โญ๏ธ - ROI ๋‚ฎ์Œ


1.3 ์ตœ์ข… ๊ถŒ์žฅ: GroundingDINO Fallback (์˜ต์…˜ A)

๊ตฌํ˜„ ์šฐ์„ ์ˆœ์œ„:

# 1๋‹จ๊ณ„: GroundingDINO ๋กœ๋“œ (์•ฑ ์‹œ์ž‘ ์‹œ 1ํšŒ)
grounding_model = load_grounding_dino()

# 2๋‹จ๊ณ„: Occlusion ๋ณต์› ๋กœ์ง์— ํ†ตํ•ฉ
def _ensure_object_persistence_enhanced(...):
    # ๊ธฐ์กด velocity ๋ณต์› ์‹œ๋„
    recovered_ids = velocity_based_recovery(missing_ids)
    
    still_missing = [id for id in missing_ids if id not in recovered_ids]
    
    if still_missing and time_gap > 1.5:  # 1.5์ดˆ ์ด์ƒ ์†Œ์‹ค ์‹œ์—๋งŒ
        # GroundingDINO fallback
        frame = extract_frame(current_frame_idx)
        boxes = grounding_model(frame, text_prompt)
        
        for missing_id in still_missing:
            last_pos = last_seen[missing_id]
            
            # ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด bbox ์ฐพ๊ธฐ
            best_box = find_closest_box(boxes, last_pos, max_dist=500)
            
            if best_box:
                # SAM3์— point prompt ์ถ”๊ฐ€ํ•˜์—ฌ ๋งˆ์Šคํฌ ์žฌ์ƒ์„ฑ
                predictor.add_prompt(
                    point=best_box.center,
                    obj_id=missing_id
                )
                recovered_ids.append(missing_id)

์˜ˆ์ƒ ํšจ๊ณผ:

  • ์žฅ๊ธฐ Occlusion ๋ณต์›์œจ: 40% โ†’ 85% (+113%)
  • ์†๋„ ์˜ํ–ฅ: +0.2% only
  • ๋ฉ”๋ชจ๋ฆฌ ์ฆ๊ฐ€: +2-3GB (์•ฑ ์‹œ์ž‘ ์‹œ)

์‹œ๋‚˜๋ฆฌ์˜ค 2: Add New Object์— YOLO ํ™œ์šฉ

2.1 ํ˜„์žฌ Add New Object ๋กœ์ง

์œ„์น˜: _add_object_at_point() (app.py: L895-1297)

# ํ˜„์žฌ ๋ฐฉ์‹ (SAM3 Point Prompt)
predictor.add_prompt(
    session_id, 
    frame_idx=click_frame,
    points=[(x, y)],
    point_labels=[1],
    obj_id=new_obj_id
)
# โ†’ SAM3๊ฐ€ ํด๋ฆญ ์ง€์  ์ฃผ๋ณ€ ์„ธ๊ทธ๋ฉ˜ํ…Œ์ด์…˜

๋ฌธ์ œ์ :

  • ํด๋ฆญ์ด ์ •ํ™•ํ•˜์ง€ ์•Š์œผ๋ฉด ์ž˜๋ชป๋œ ์˜์—ญ ์„ ํƒ
  • ๊ฐ์ฒด ๊ฒฝ๊ณ„๋ฅผ ์ •ํ™•ํžˆ ์ฐพ๊ธฐ ์–ด๋ ค์›€
  • ์‚ฌ์šฉ์ž๊ฐ€ ๋งค๋ฒˆ ์ •ํ™•ํ•œ ์œ„์น˜ ํด๋ฆญ ํ•„์š”

2.2 YOLO ํ†ตํ•ฉ ์‹œ๋‚˜๋ฆฌ์˜ค

์‹œ๋‚˜๋ฆฌ์˜ค A: YOLO Bbox โ†’ SAM3 ์ •๋ฐ€ ๋งˆ์Šคํฌ โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ

๊ฐœ๋…:

def _add_object_with_yolo(video_path, time_sec, x, y, new_obj_id):
    frame = extract_frame(video_path, time_sec)
    
    # 1๋‹จ๊ณ„: YOLO๋กœ ํด๋ฆญ ์ง€์  ๊ทผ์ฒ˜ ๋ชจ๋“  ๊ฐ์ฒด ํƒ์ง€
    yolo_results = yolo_model(frame)
    
    # 2๋‹จ๊ณ„: ํด๋ฆญ ์œ„์น˜์™€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด bbox ์„ ํƒ
    clicked_box = find_closest_box(yolo_results, (x, y))
    
    if clicked_box:
        # 3๋‹จ๊ณ„: bbox ์ „์ฒด๋ฅผ SAM3 box prompt๋กœ ์ „๋‹ฌ
        predictor.add_prompt(
            session_id,
            frame_idx=frame_idx,
            bounding_boxes=[clicked_box.xywh],
            obj_id=new_obj_id
        )
    else:
        # Fallback: ๊ธฐ์กด point prompt
        predictor.add_prompt(points=[(x, y)], ...)

์žฅ์ :

  • โœ… ๋งค์šฐ ์ •ํ™•ํ•œ ๊ฐ์ฒด ์„ ํƒ (bbox ์ „์ฒด ํ™œ์šฉ)
  • โœ… ํด๋ฆญ ์ •ํ™•๋„ ๋ฌด๊ด€ โ†’ ์‚ฌ์šฉ์ž ํŽธ์˜์„ฑ ๋Œ€ํญ ํ–ฅ์ƒ
  • โœ… SAM3 box prompt๋Š” point๋ณด๋‹ค ์ •ํ™•
  • โœ… YOLO๋Š” ์ผ๋ฐ˜ ๋ฌผ์ฒด ํƒ์ง€ ๋ชจ๋ธ์ด๋ฏ€๋กœ ๋Œ€๋ถ€๋ถ„ ์ผ€์ด์Šค ์ปค๋ฒ„

๋‹จ์ :

  • โš ๏ธ YOLO ํด๋ž˜์Šค์— ์—†๋Š” ๊ฐ์ฒด๋Š” ํƒ์ง€ ๋ถˆ๊ฐ€ (์˜ˆ: ํŠน์ˆ˜ ์‹คํ—˜ ์žฅ๋น„)
    • ํ•ด๊ฒฐ: YOLO-World (ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ ์ง€์›) ์‚ฌ์šฉ ๋˜๋Š” fallback

์„ฑ๋Šฅ ์˜ˆ์ธก:

์ง€ํ‘œ ํ˜„์žฌ Point Prompt + YOLO Bbox
๊ฐ์ฒด ์„ ํƒ ์ •ํ™•๋„ 70% (ํด๋ฆญ ์œ„์น˜ ์˜์กด) 95%
์ฒ˜๋ฆฌ ์‹œ๊ฐ„ 1.5s 1.6s (+0.1s)
์‚ฌ์šฉ์ž ํŽธ์˜์„ฑ โญ๏ธโญ๏ธโญ๏ธ โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ
๋งˆ์Šคํฌ ํ’ˆ์งˆ โญ๏ธโญ๏ธโญ๏ธโญ๏ธ โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ

ํ‰๊ฐ€: โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ - ๋งค์šฐ ๊ฐ•๋ ฅ ๊ถŒ์žฅ


์‹œ๋‚˜๋ฆฌ์˜ค B: YOLO ๋‹จ๋… (SAM3 ๋Œ€์ฒด) โญ๏ธ

๊ฐœ๋…:

# YOLO๋กœ ํƒ์ง€ โ†’ ๋งˆ์Šคํฌ ์—†์ด bbox๋งŒ ์ถ”์ 
yolo_box = yolo_model(frame, click=(x, y))
# ByteTrack์œผ๋กœ ์ถ”์ 

๋‹จ์ :

  • โŒ Pixel-level ๋งˆ์Šคํฌ ์—†์Œ โ†’ ํ˜„์žฌ ์‹œ์Šคํ…œ๊ณผ ๋ถˆ์ผ์น˜
  • โŒ ๊ธฐ์กด CSV ํ˜•์‹ (contour, center) ํ˜ธํ™˜ ๋ถˆ๊ฐ€
  • โŒ Trails ๋ Œ๋”๋ง ๋ถˆ๊ฐ€

ํ‰๊ฐ€: โญ๏ธ - ํ˜„์žฌ ์‹œ์Šคํ…œ๊ณผ ๋งž์ง€ ์•Š์Œ


2.3 ์ตœ์ข… ๊ถŒ์žฅ: YOLO โ†’ SAM3 (์‹œ๋‚˜๋ฆฌ์˜ค A)

๊ตฌํ˜„:

def _add_object_at_point_with_yolo(video_path, time_sec, x, y, new_obj_id, text_prompt):
    # YOLO ๋ชจ๋ธ ๋กœ๋“œ (์•ฑ ์‹œ์ž‘ ์‹œ 1ํšŒ)
    if not hasattr(_add_object_at_point_with_yolo, 'yolo'):
        from ultralytics import YOLO
        _add_object_at_point_with_yolo.yolo = YOLO("yolov8n.pt")
    
    yolo = _add_object_at_point_with_yolo.yolo
    
    # ํ”„๋ ˆ์ž„ ์ถ”์ถœ
    frame = extract_frame(video_path, time_sec)
    
    # YOLO ํƒ์ง€
    results = yolo(frame, verbose=False)
    boxes = results[0].boxes
    
    # ํด๋ฆญ ์œ„์น˜์™€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด bbox ์ฐพ๊ธฐ
    best_box = None
    min_dist = float('inf')
    
    for box in boxes:
        cx, cy = box.xywh[0][:2].tolist()
        dist = ((cx - x)**2 + (cy - y)**2)**0.5
        if dist < min_dist:
            min_dist = dist
            best_box = box
    
    # SAM3์— bbox ๋˜๋Š” point ์ „๋‹ฌ
    if best_box and min_dist < 200:  # 200px ์ด๋‚ด
        bbox_xywh = best_box.xywh[0].tolist()
        predictor.add_prompt(
            session_id, 
            frame_idx=frame_idx,
            bounding_boxes=[bbox_xywh],
            obj_id=new_obj_id
        )
        status = f"Object detected with YOLO (confidence: {best_box.conf[0]:.2f})"
    else:
        # Fallback: Point prompt
        predictor.add_prompt(
            session_id,
            frame_idx=frame_idx,
            points=[(x, y)],
            point_labels=[1],
            obj_id=new_obj_id
        )
        status = "Using point prompt (YOLO detection failed)"
    
    # ์ดํ›„ propagate๋Š” ๋™์ผ
    ...

์˜ˆ์ƒ ํšจ๊ณผ:

  • ๊ฐ์ฒด ์„ ํƒ ์ •ํ™•๋„: 70% โ†’ 95% (+36%)
  • ์‚ฌ์šฉ์ž ๊ฒฝํ—˜ ๋Œ€ํญ ๊ฐœ์„  (์ •ํ™•ํ•œ ํด๋ฆญ ๋ถˆํ•„์š”)
  • ์ฒ˜๋ฆฌ ์‹œ๊ฐ„: 1.5s โ†’ 1.6s (+7% only)

์ข…ํ•ฉ ๊ถŒ์žฅ์‚ฌํ•ญ

์šฐ์„ ์ˆœ์œ„ 1: Add New Object์— YOLO ํ†ตํ•ฉ โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ

์ด์œ :

  • ์‚ฌ์šฉ์ž ๊ฒฝํ—˜ ๋Œ€ํญ ๊ฐœ์„  (๊ฐ€์žฅ ์ง์ ‘์ ์ธ ํšจ๊ณผ)
  • ๊ตฌํ˜„ ๊ฐ„๋‹จ (100์ค„ ์ด๋‚ด)
  • ์†๋„ ์˜ํ–ฅ ์ตœ์†Œ (+0.1s/1ํšŒ)
  • ๊ธฐ์กด ์‹œ์Šคํ…œ๊ณผ ์™„๋ฒฝ ํ˜ธํ™˜ (SAM3 box prompt ํ™œ์šฉ)

๊ตฌํ˜„ ๋ณต์žก๋„: โญ๏ธโญ๏ธ (๋‚ฎ์Œ)


์šฐ์„ ์ˆœ์œ„ 2: Occlusion ๋ณต์›์— GroundingDINO Fallback โญ๏ธโญ๏ธโญ๏ธโญ๏ธ

์ด์œ :

  • ์žฅ๊ธฐ Occlusion ๋ณต์›์œจ ๋Œ€ํญ ํ–ฅ์ƒ (40% โ†’ 85%)
  • ํ•„์š” ์‹œ์—๋งŒ ํ˜ธ์ถœ โ†’ ์†๋„ ์˜ํ–ฅ ๊ฑฐ์˜ ์—†์Œ (+0.2%)
  • Velocity ์˜ˆ์ธก ์‹คํŒจ ์ผ€์ด์Šค ๋ณด์™„

๊ตฌํ˜„ ๋ณต์žก๋„: โญ๏ธโญ๏ธโญ๏ธ (์ค‘๊ฐ„)

๋‹จ, ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด ํ•œ๊ณ„ ์ธ์ง€ ํ•„์š”:

  • ํฐ ์ฅ 5๋งˆ๋ฆฌ ๊ฐ™์€ ๊ฒฝ์šฐ bbox ํ˜ผ๋™ ๊ฐ€๋Šฅ
  • ์œ„์น˜ ๊ธฐ๋ฐ˜ ๋งค์นญ์œผ๋กœ ๋ณด์™„ (500px threshold)

๋น„๊ถŒ์žฅ: DeepSORT/ByteTrack ๋ณ‘๋ ฌ

์ด์œ :

  • ๋งค ํ”„๋ ˆ์ž„ ์ฒ˜๋ฆฌ โ†’ ์†๋„ ์ €ํ•˜ ์‹ฌ๊ฐ (-30%)
  • ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์—์„œ ํšจ๊ณผ ์—†์Œ
  • ๊ตฌํ˜„ ๋ณต์žก๋„ ๋†’์Œ

๐Ÿ“Š ํšจ๊ณผ ์š”์•ฝํ‘œ

๊ฐœ์„  ์‚ฌํ•ญ ์ •ํ™•๋„ ํ–ฅ์ƒ ์†๋„ ์˜ํ–ฅ ๋ฉ”๋ชจ๋ฆฌ ์ฆ๊ฐ€ ๊ตฌํ˜„ ๋‚œ์ด๋„ ๊ถŒ์žฅ
Add New Object + YOLO +36% +7% +0.5GB โญ๏ธโญ๏ธ โœ…โœ…
Occlusion + GroundingDINO +113% +0.2% +2-3GB โญ๏ธโญ๏ธโญ๏ธ โœ…
Occlusion + DeepSORT +20% +10% +1-2GB โญ๏ธโญ๏ธโญ๏ธ โŒ
Occlusion + ByteTrack +10% +30% +1GB โญ๏ธโญ๏ธโญ๏ธโญ๏ธ โŒ

๐ŸŽฏ ์ตœ์ข… ๊ฒฐ๋ก 

โœ… ๊ฐ•๋ ฅ ๊ถŒ์žฅ

  1. Add New Object์— YOLO ํ†ตํ•ฉ

    • ์ฆ‰๊ฐ์ ์ธ UX ๊ฐœ์„ 
    • ์ตœ์†Œ ๋น„์šฉ์œผ๋กœ ์ตœ๋Œ€ ํšจ๊ณผ
  2. Occlusion ๋ณต์›์— GroundingDINO Fallback

    • ์žฅ๊ธฐ Occlusion ๋ฌธ์ œ ํ•ด๊ฒฐ
    • ์†๋„ ์˜ํ–ฅ ๊ฑฐ์˜ ์—†์Œ

โŒ ๋น„๊ถŒ์žฅ

  • DeepSORT/ByteTrack/StrongSORT ๋ณ‘๋ ฌ ์‚ฌ์šฉ
    • ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์— ํšจ๊ณผ ์—†์Œ
    • ์†๋„ ์ €ํ•˜ ์‹ฌ๊ฐ

์ž‘์„ฑ์ž: AI Assistant
๊ฒ€ํ†  ๊ธฐ์ค€: ์ •ํ™•๋„, ์†๋„, ๋ฉ”๋ชจ๋ฆฌ, ๊ตฌํ˜„ ๋ณต์žก๋„, ROI