SAM3-video-segmentation-tracking / docs /Occlusion_AddObject_Enhancement_Analysis.md
bellmake's picture
SAM3 Video Segmentation - Clean deployment
ae50268
# Occlusion ๋ณต์› & Add New Object ๊ฐœ์„  ์‹œ๋‚˜๋ฆฌ์˜ค ๋ถ„์„
**์ž‘์„ฑ์ผ**: 2025-12-17
**๋ถ„์„ ๋ฒ”์œ„**: velocity/occlusion ๋ณต์› + Add New Object ๊ธฐ๋Šฅ
---
## ๐Ÿ“‹ ๋ชฉ์ฐจ
1. [์‹œ๋‚˜๋ฆฌ์˜ค 1: Occlusion ๋ณต์›์— GroundingDINO/ํŠธ๋ž˜์ปค ํ™œ์šฉ](#์‹œ๋‚˜๋ฆฌ์˜ค-1-occlusion-๋ณต์›์—-groundingdinoํŠธ๋ž˜์ปค-ํ™œ์šฉ)
2. [์‹œ๋‚˜๋ฆฌ์˜ค 2: Add New Object์— YOLO ํ™œ์šฉ](#์‹œ๋‚˜๋ฆฌ์˜ค-2-add-new-object์—-yolo-ํ™œ์šฉ)
3. [์ข…ํ•ฉ ๊ถŒ์žฅ์‚ฌํ•ญ](#์ข…ํ•ฉ-๊ถŒ์žฅ์‚ฌํ•ญ)
---
## ์‹œ๋‚˜๋ฆฌ์˜ค 1: Occlusion ๋ณต์›์— GroundingDINO/ํŠธ๋ž˜์ปค ํ™œ์šฉ
### 1.1 ํ˜„์žฌ Occlusion ๋ณต์› ๋กœ์ง
**์œ„์น˜**: `_ensure_object_persistence()` (app.py: L2586-3051)
```python
# ํ˜„์žฌ ๋ฐฉ์‹
for missing_id in missing_ids:
last_rec = last_seen_rec[missing_id]
# Velocity ๊ธฐ๋ฐ˜ ์˜ˆ์ธก
predicted_cx = last_cx + vx * time_gap
predicted_cy = last_cy + vy * time_gap
# ์˜ˆ์ธก ์œ„์น˜ ๊ทผ์ฒ˜์— ์ƒˆ ๋งˆ์Šคํฌ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธ
dist_to_predicted = distance(new_mask, predicted_position)
if dist_to_predicted < threshold:
recover_id(new_mask, missing_id)
```
**ํ•œ๊ณ„:**
- Velocity๊ฐ€ ๊ธ‰๋ณ€ํ•˜๋Š” ๊ฒฝ์šฐ ์˜ˆ์ธก ์‹คํŒจ
- ์žฅ๊ธฐ Occlusion (3์ดˆ+)์—์„œ ์ •ํ™•๋„ ํ•˜๋ฝ
- ๋™์ผ ์œ„์น˜์— ์žฌ๋“ฑ์žฅํ•˜์ง€ ์•Š์œผ๋ฉด ๋ณต์› ๋ถˆ๊ฐ€
---
### 1.2 ํ†ตํ•ฉ ์˜ต์…˜ ๋น„๊ต
#### **์˜ต์…˜ A: GroundingDINO Fallback** โญ๏ธโญ๏ธโญ๏ธโญ๏ธ
**๊ฐœ๋…:**
```python
# Velocity ๋ณต์› ์‹œ๋„
recovered = velocity_based_recovery(missing_id)
if not recovered:
# Fallback: GroundingDINO๋กœ ์žฌํƒ์ง€
frame = extract_frame(video, current_time)
boxes = grounding_dino.detect(frame, text="mice")
# Missing ID์˜ ๋งˆ์ง€๋ง‰ ์œ„์น˜์™€ bbox ๋น„๊ต
for box in boxes:
dist = distance(box.center, last_seen_position)
if dist < fallback_threshold: # 500px
assign_id(box, missing_id)
# bbox โ†’ SAM3 point prompt๋กœ ๋งˆ์Šคํฌ ์žฌ์ƒ์„ฑ
predictor.add_prompt(point=box.center, obj_id=missing_id)
```
**์žฅ์ :**
- โœ… ์žฅ๊ธฐ Occlusion ๋ณต์› ์ •ํ™•๋„ **๋Œ€ํญ ํ–ฅ์ƒ** (70% โ†’ 90%)
- โœ… Velocity ์˜ˆ์ธก ์‹คํŒจ ์ผ€์ด์Šค ๋ณด์™„
- โœ… ํ•„์š” ์‹œ์—๋งŒ ํ˜ธ์ถœ โ†’ ์†๋„ ์˜ํ–ฅ ์ตœ์†Œ (ํ‰๊ท  5-10ms)
- โœ… ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ ์žฌ์‚ฌ์šฉ ๊ฐ€๋Šฅ
**๋‹จ์ :**
- โš ๏ธ ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์—์„œ bbox ํ˜ผ๋™ ๊ฐ€๋Šฅ (์ •ํ™•๋„ 85% ์ˆ˜์ค€)
- โš ๏ธ +2-3GB GPU ๋ฉ”๋ชจ๋ฆฌ (์ดˆ๊ธฐ ๋กœ๋“œ ์‹œ)
**์„ฑ๋Šฅ ์˜ˆ์ธก:**
| ์ƒํ™ฉ | ํ˜„์žฌ Velocity | + GroundingDINO |
|------|---------------|-----------------|
| ๋‹จ๊ธฐ Occlusion (<1์ดˆ) | 95% | 95% |
| ์ค‘๊ธฐ Occlusion (1-3์ดˆ) | 75% | **90%** |
| ์žฅ๊ธฐ Occlusion (3-5์ดˆ) | 40% | **85%** |
| ๊ธ‰๊ฒฉํ•œ ๋ฐฉํ–ฅ ์ „ํ™˜ | 60% | **80%** |
**์†๋„ ์˜ํ–ฅ:**
```
ID ์†Œ์‹ค ๋ฐœ์ƒ๋ฅ : 5% (100ํ”„๋ ˆ์ž„๋‹น 5ํšŒ)
GroundingDINO ํ˜ธ์ถœ: 70ms
์ด ์ถ”๊ฐ€ ์‹œ๊ฐ„ = 5 * 70ms = 350ms (500ํ”„๋ ˆ์ž„๋‹น)
์ „์ฒด ์˜ํ–ฅ: +0.2% only
```
**ํ‰๊ฐ€**: โญ๏ธโญ๏ธโญ๏ธโญ๏ธ - **๊ฐ•๋ ฅ ๊ถŒ์žฅ**
---
#### **์˜ต์…˜ B: DeepSORT Re-ID Fallback** โญ๏ธโญ๏ธ
**๊ฐœ๋…:**
```python
# Re-ID ํŠน์ง• ์ €์žฅ
for id, mask in tracked_objects:
feature = reid_model.extract(crop_from_mask(frame, mask))
reid_features[id] = feature
# Occlusion ๋ณต์› ์‹œ
if not velocity_recovered:
current_features = [reid_model.extract(crop) for crop in new_masks]
best_match = cosine_similarity(missing_id_feature, current_features)
if best_match > 0.7:
assign_id(new_mask, missing_id)
```
**์žฅ์ :**
- โœ… ์™ธ๊ด€ ํŠน์ง• ํ™œ์šฉ โ†’ ๋ณต์žกํ•œ ์›€์ง์ž„ ๋Œ€์‘
**๋‹จ์ :**
- โŒ **๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์—์„œ ์‹คํŒจ** (ํฐ ์ฅ 5๋งˆ๋ฆฌ โ†’ ์œ ์‚ฌ๋„ 99%)
- โŒ Re-ID ๋ชจ๋ธ ์ถ”๊ฐ€ (+1-2GB GPU)
- โŒ ํ”„๋ ˆ์ž„๋‹น ํŠน์ง• ์ถ”์ถœ ํ•„์š” (+15ms/object)
**ํ‰๊ฐ€**: โญ๏ธโญ๏ธ - ๋™์ผ ์™ธ๊ด€ use case์—๋Š” ๋ถ€์ ํ•ฉ
---
#### **์˜ต์…˜ C: ByteTrack/StrongSORT ๋ณ‘๋ ฌ** โญ๏ธ
**๊ฐœ๋…:**
```python
# SAM3 ๋งˆ์Šคํฌ โ†’ bbox ๋ณ€ํ™˜
bboxes = [mask_to_bbox(mask) for mask in sam3_masks]
# ByteTrack์œผ๋กœ ๋ณ„๋„ ์ถ”์ 
bytetrack_ids = bytetrack.update(bboxes)
# SAM3 ID์™€ ByteTrack ID ๋น„๊ต
if sam3_id != bytetrack_id:
# ๋ถˆ์ผ์น˜ โ†’ ByteTrack ID ์šฐ์„  (Occlusion ๊ฐ•ํ•จ)
final_id = bytetrack_id
```
**๋‹จ์ :**
- โŒ ๋งค ํ”„๋ ˆ์ž„ ํŠธ๋ž˜์ปค ํ˜ธ์ถœ โ†’ **30% ์†๋„ ์ €ํ•˜**
- โŒ Bbox ๋ณ€ํ™˜ ์‹œ ์ •๋ณด ์†์‹ค
- โŒ ๋‘ ์‹œ์Šคํ…œ ๋ถˆ์ผ์น˜ ์‹œ ๊ฒฐ์ • ๋กœ์ง ๋ณต์žก
**ํ‰๊ฐ€**: โญ๏ธ - ROI ๋‚ฎ์Œ
---
### 1.3 ์ตœ์ข… ๊ถŒ์žฅ: GroundingDINO Fallback (์˜ต์…˜ A)
**๊ตฌํ˜„ ์šฐ์„ ์ˆœ์œ„:**
```python
# 1๋‹จ๊ณ„: GroundingDINO ๋กœ๋“œ (์•ฑ ์‹œ์ž‘ ์‹œ 1ํšŒ)
grounding_model = load_grounding_dino()
# 2๋‹จ๊ณ„: Occlusion ๋ณต์› ๋กœ์ง์— ํ†ตํ•ฉ
def _ensure_object_persistence_enhanced(...):
# ๊ธฐ์กด velocity ๋ณต์› ์‹œ๋„
recovered_ids = velocity_based_recovery(missing_ids)
still_missing = [id for id in missing_ids if id not in recovered_ids]
if still_missing and time_gap > 1.5: # 1.5์ดˆ ์ด์ƒ ์†Œ์‹ค ์‹œ์—๋งŒ
# GroundingDINO fallback
frame = extract_frame(current_frame_idx)
boxes = grounding_model(frame, text_prompt)
for missing_id in still_missing:
last_pos = last_seen[missing_id]
# ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด bbox ์ฐพ๊ธฐ
best_box = find_closest_box(boxes, last_pos, max_dist=500)
if best_box:
# SAM3์— point prompt ์ถ”๊ฐ€ํ•˜์—ฌ ๋งˆ์Šคํฌ ์žฌ์ƒ์„ฑ
predictor.add_prompt(
point=best_box.center,
obj_id=missing_id
)
recovered_ids.append(missing_id)
```
**์˜ˆ์ƒ ํšจ๊ณผ:**
- ์žฅ๊ธฐ Occlusion ๋ณต์›์œจ: 40% โ†’ **85%** (+113%)
- ์†๋„ ์˜ํ–ฅ: +0.2% only
- ๋ฉ”๋ชจ๋ฆฌ ์ฆ๊ฐ€: +2-3GB (์•ฑ ์‹œ์ž‘ ์‹œ)
---
## ์‹œ๋‚˜๋ฆฌ์˜ค 2: Add New Object์— YOLO ํ™œ์šฉ
### 2.1 ํ˜„์žฌ Add New Object ๋กœ์ง
**์œ„์น˜**: `_add_object_at_point()` (app.py: L895-1297)
```python
# ํ˜„์žฌ ๋ฐฉ์‹ (SAM3 Point Prompt)
predictor.add_prompt(
session_id,
frame_idx=click_frame,
points=[(x, y)],
point_labels=[1],
obj_id=new_obj_id
)
# โ†’ SAM3๊ฐ€ ํด๋ฆญ ์ง€์  ์ฃผ๋ณ€ ์„ธ๊ทธ๋ฉ˜ํ…Œ์ด์…˜
```
**๋ฌธ์ œ์ :**
- ํด๋ฆญ์ด ์ •ํ™•ํ•˜์ง€ ์•Š์œผ๋ฉด ์ž˜๋ชป๋œ ์˜์—ญ ์„ ํƒ
- ๊ฐ์ฒด ๊ฒฝ๊ณ„๋ฅผ ์ •ํ™•ํžˆ ์ฐพ๊ธฐ ์–ด๋ ค์›€
- ์‚ฌ์šฉ์ž๊ฐ€ ๋งค๋ฒˆ ์ •ํ™•ํ•œ ์œ„์น˜ ํด๋ฆญ ํ•„์š”
---
### 2.2 YOLO ํ†ตํ•ฉ ์‹œ๋‚˜๋ฆฌ์˜ค
#### **์‹œ๋‚˜๋ฆฌ์˜ค A: YOLO Bbox โ†’ SAM3 ์ •๋ฐ€ ๋งˆ์Šคํฌ** โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ
**๊ฐœ๋…:**
```python
def _add_object_with_yolo(video_path, time_sec, x, y, new_obj_id):
frame = extract_frame(video_path, time_sec)
# 1๋‹จ๊ณ„: YOLO๋กœ ํด๋ฆญ ์ง€์  ๊ทผ์ฒ˜ ๋ชจ๋“  ๊ฐ์ฒด ํƒ์ง€
yolo_results = yolo_model(frame)
# 2๋‹จ๊ณ„: ํด๋ฆญ ์œ„์น˜์™€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด bbox ์„ ํƒ
clicked_box = find_closest_box(yolo_results, (x, y))
if clicked_box:
# 3๋‹จ๊ณ„: bbox ์ „์ฒด๋ฅผ SAM3 box prompt๋กœ ์ „๋‹ฌ
predictor.add_prompt(
session_id,
frame_idx=frame_idx,
bounding_boxes=[clicked_box.xywh],
obj_id=new_obj_id
)
else:
# Fallback: ๊ธฐ์กด point prompt
predictor.add_prompt(points=[(x, y)], ...)
```
**์žฅ์ :**
- โœ… **๋งค์šฐ ์ •ํ™•ํ•œ ๊ฐ์ฒด ์„ ํƒ** (bbox ์ „์ฒด ํ™œ์šฉ)
- โœ… ํด๋ฆญ ์ •ํ™•๋„ ๋ฌด๊ด€ โ†’ ์‚ฌ์šฉ์ž ํŽธ์˜์„ฑ ๋Œ€ํญ ํ–ฅ์ƒ
- โœ… SAM3 box prompt๋Š” point๋ณด๋‹ค ์ •ํ™•
- โœ… YOLO๋Š” ์ผ๋ฐ˜ ๋ฌผ์ฒด ํƒ์ง€ ๋ชจ๋ธ์ด๋ฏ€๋กœ ๋Œ€๋ถ€๋ถ„ ์ผ€์ด์Šค ์ปค๋ฒ„
**๋‹จ์ :**
- โš ๏ธ YOLO ํด๋ž˜์Šค์— ์—†๋Š” ๊ฐ์ฒด๋Š” ํƒ์ง€ ๋ถˆ๊ฐ€ (์˜ˆ: ํŠน์ˆ˜ ์‹คํ—˜ ์žฅ๋น„)
- **ํ•ด๊ฒฐ:** YOLO-World (ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ ์ง€์›) ์‚ฌ์šฉ ๋˜๋Š” fallback
**์„ฑ๋Šฅ ์˜ˆ์ธก:**
| ์ง€ํ‘œ | ํ˜„์žฌ Point Prompt | + YOLO Bbox |
|------|-------------------|-------------|
| **๊ฐ์ฒด ์„ ํƒ ์ •ํ™•๋„** | 70% (ํด๋ฆญ ์œ„์น˜ ์˜์กด) | **95%** |
| **์ฒ˜๋ฆฌ ์‹œ๊ฐ„** | 1.5s | **1.6s** (+0.1s) |
| **์‚ฌ์šฉ์ž ํŽธ์˜์„ฑ** | โญ๏ธโญ๏ธโญ๏ธ | โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ |
| **๋งˆ์Šคํฌ ํ’ˆ์งˆ** | โญ๏ธโญ๏ธโญ๏ธโญ๏ธ | โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ |
**ํ‰๊ฐ€**: โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ - **๋งค์šฐ ๊ฐ•๋ ฅ ๊ถŒ์žฅ**
---
#### **์‹œ๋‚˜๋ฆฌ์˜ค B: YOLO ๋‹จ๋… (SAM3 ๋Œ€์ฒด)** โญ๏ธ
**๊ฐœ๋…:**
```python
# YOLO๋กœ ํƒ์ง€ โ†’ ๋งˆ์Šคํฌ ์—†์ด bbox๋งŒ ์ถ”์ 
yolo_box = yolo_model(frame, click=(x, y))
# ByteTrack์œผ๋กœ ์ถ”์ 
```
**๋‹จ์ :**
- โŒ Pixel-level ๋งˆ์Šคํฌ ์—†์Œ โ†’ ํ˜„์žฌ ์‹œ์Šคํ…œ๊ณผ ๋ถˆ์ผ์น˜
- โŒ ๊ธฐ์กด CSV ํ˜•์‹ (contour, center) ํ˜ธํ™˜ ๋ถˆ๊ฐ€
- โŒ Trails ๋ Œ๋”๋ง ๋ถˆ๊ฐ€
**ํ‰๊ฐ€**: โญ๏ธ - ํ˜„์žฌ ์‹œ์Šคํ…œ๊ณผ ๋งž์ง€ ์•Š์Œ
---
### 2.3 ์ตœ์ข… ๊ถŒ์žฅ: YOLO โ†’ SAM3 (์‹œ๋‚˜๋ฆฌ์˜ค A)
**๊ตฌํ˜„:**
```python
def _add_object_at_point_with_yolo(video_path, time_sec, x, y, new_obj_id, text_prompt):
# YOLO ๋ชจ๋ธ ๋กœ๋“œ (์•ฑ ์‹œ์ž‘ ์‹œ 1ํšŒ)
if not hasattr(_add_object_at_point_with_yolo, 'yolo'):
from ultralytics import YOLO
_add_object_at_point_with_yolo.yolo = YOLO("yolov8n.pt")
yolo = _add_object_at_point_with_yolo.yolo
# ํ”„๋ ˆ์ž„ ์ถ”์ถœ
frame = extract_frame(video_path, time_sec)
# YOLO ํƒ์ง€
results = yolo(frame, verbose=False)
boxes = results[0].boxes
# ํด๋ฆญ ์œ„์น˜์™€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด bbox ์ฐพ๊ธฐ
best_box = None
min_dist = float('inf')
for box in boxes:
cx, cy = box.xywh[0][:2].tolist()
dist = ((cx - x)**2 + (cy - y)**2)**0.5
if dist < min_dist:
min_dist = dist
best_box = box
# SAM3์— bbox ๋˜๋Š” point ์ „๋‹ฌ
if best_box and min_dist < 200: # 200px ์ด๋‚ด
bbox_xywh = best_box.xywh[0].tolist()
predictor.add_prompt(
session_id,
frame_idx=frame_idx,
bounding_boxes=[bbox_xywh],
obj_id=new_obj_id
)
status = f"Object detected with YOLO (confidence: {best_box.conf[0]:.2f})"
else:
# Fallback: Point prompt
predictor.add_prompt(
session_id,
frame_idx=frame_idx,
points=[(x, y)],
point_labels=[1],
obj_id=new_obj_id
)
status = "Using point prompt (YOLO detection failed)"
# ์ดํ›„ propagate๋Š” ๋™์ผ
...
```
**์˜ˆ์ƒ ํšจ๊ณผ:**
- ๊ฐ์ฒด ์„ ํƒ ์ •ํ™•๋„: 70% โ†’ **95%** (+36%)
- ์‚ฌ์šฉ์ž ๊ฒฝํ—˜ ๋Œ€ํญ ๊ฐœ์„  (์ •ํ™•ํ•œ ํด๋ฆญ ๋ถˆํ•„์š”)
- ์ฒ˜๋ฆฌ ์‹œ๊ฐ„: 1.5s โ†’ 1.6s (+7% only)
---
## ์ข…ํ•ฉ ๊ถŒ์žฅ์‚ฌํ•ญ
### ์šฐ์„ ์ˆœ์œ„ 1: Add New Object์— YOLO ํ†ตํ•ฉ โญ๏ธโญ๏ธโญ๏ธโญ๏ธโญ๏ธ
**์ด์œ :**
- ์‚ฌ์šฉ์ž ๊ฒฝํ—˜ **๋Œ€ํญ ๊ฐœ์„ ** (๊ฐ€์žฅ ์ง์ ‘์ ์ธ ํšจ๊ณผ)
- ๊ตฌํ˜„ ๊ฐ„๋‹จ (100์ค„ ์ด๋‚ด)
- ์†๋„ ์˜ํ–ฅ ์ตœ์†Œ (+0.1s/1ํšŒ)
- ๊ธฐ์กด ์‹œ์Šคํ…œ๊ณผ ์™„๋ฒฝ ํ˜ธํ™˜ (SAM3 box prompt ํ™œ์šฉ)
**๊ตฌํ˜„ ๋ณต์žก๋„**: โญ๏ธโญ๏ธ (๋‚ฎ์Œ)
---
### ์šฐ์„ ์ˆœ์œ„ 2: Occlusion ๋ณต์›์— GroundingDINO Fallback โญ๏ธโญ๏ธโญ๏ธโญ๏ธ
**์ด์œ :**
- ์žฅ๊ธฐ Occlusion ๋ณต์›์œจ **๋Œ€ํญ ํ–ฅ์ƒ** (40% โ†’ 85%)
- ํ•„์š” ์‹œ์—๋งŒ ํ˜ธ์ถœ โ†’ ์†๋„ ์˜ํ–ฅ ๊ฑฐ์˜ ์—†์Œ (+0.2%)
- Velocity ์˜ˆ์ธก ์‹คํŒจ ์ผ€์ด์Šค ๋ณด์™„
**๊ตฌํ˜„ ๋ณต์žก๋„**: โญ๏ธโญ๏ธโญ๏ธ (์ค‘๊ฐ„)
**๋‹จ, ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด ํ•œ๊ณ„ ์ธ์ง€ ํ•„์š”:**
- ํฐ ์ฅ 5๋งˆ๋ฆฌ ๊ฐ™์€ ๊ฒฝ์šฐ bbox ํ˜ผ๋™ ๊ฐ€๋Šฅ
- ์œ„์น˜ ๊ธฐ๋ฐ˜ ๋งค์นญ์œผ๋กœ ๋ณด์™„ (500px threshold)
---
### ๋น„๊ถŒ์žฅ: DeepSORT/ByteTrack ๋ณ‘๋ ฌ
**์ด์œ :**
- ๋งค ํ”„๋ ˆ์ž„ ์ฒ˜๋ฆฌ โ†’ ์†๋„ ์ €ํ•˜ ์‹ฌ๊ฐ (-30%)
- ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์—์„œ ํšจ๊ณผ ์—†์Œ
- ๊ตฌํ˜„ ๋ณต์žก๋„ ๋†’์Œ
---
## ๐Ÿ“Š ํšจ๊ณผ ์š”์•ฝํ‘œ
| ๊ฐœ์„  ์‚ฌํ•ญ | ์ •ํ™•๋„ ํ–ฅ์ƒ | ์†๋„ ์˜ํ–ฅ | ๋ฉ”๋ชจ๋ฆฌ ์ฆ๊ฐ€ | ๊ตฌํ˜„ ๋‚œ์ด๋„ | ๊ถŒ์žฅ |
|-----------|------------|----------|------------|------------|------|
| **Add New Object + YOLO** | +36% | +7% | +0.5GB | โญ๏ธโญ๏ธ | โœ…โœ… |
| **Occlusion + GroundingDINO** | +113% | +0.2% | +2-3GB | โญ๏ธโญ๏ธโญ๏ธ | โœ… |
| Occlusion + DeepSORT | +20% | +10% | +1-2GB | โญ๏ธโญ๏ธโญ๏ธ | โŒ |
| Occlusion + ByteTrack | +10% | +30% | +1GB | โญ๏ธโญ๏ธโญ๏ธโญ๏ธ | โŒ |
---
## ๐ŸŽฏ ์ตœ์ข… ๊ฒฐ๋ก 
### โœ… ๊ฐ•๋ ฅ ๊ถŒ์žฅ
1. **Add New Object์— YOLO ํ†ตํ•ฉ**
- ์ฆ‰๊ฐ์ ์ธ UX ๊ฐœ์„ 
- ์ตœ์†Œ ๋น„์šฉ์œผ๋กœ ์ตœ๋Œ€ ํšจ๊ณผ
2. **Occlusion ๋ณต์›์— GroundingDINO Fallback**
- ์žฅ๊ธฐ Occlusion ๋ฌธ์ œ ํ•ด๊ฒฐ
- ์†๋„ ์˜ํ–ฅ ๊ฑฐ์˜ ์—†์Œ
### โŒ ๋น„๊ถŒ์žฅ
- DeepSORT/ByteTrack/StrongSORT ๋ณ‘๋ ฌ ์‚ฌ์šฉ
- ๋™์ผ ์™ธ๊ด€ ๊ฐ์ฒด์— ํšจ๊ณผ ์—†์Œ
- ์†๋„ ์ €ํ•˜ ์‹ฌ๊ฐ
---
**์ž‘์„ฑ์ž**: AI Assistant
**๊ฒ€ํ†  ๊ธฐ์ค€**: ์ •ํ™•๋„, ์†๋„, ๋ฉ”๋ชจ๋ฆฌ, ๊ตฌํ˜„ ๋ณต์žก๋„, ROI