# Auto-Mode Multi-GPU Parallel Processing: Implementation Plan
Date: 2026-04-25
Target: on HuggingFace Spaces dedicated multi-GPU hardware (e.g. 4× A100-80GB),
restructure Auto-Mode so that the N videos uploaded to its queue are segmented concurrently,
**one video per GPU**.

---
## 0. Background / Current Architecture at a Glance
| Item | Current (sequential) |
|------|-------------------|
| Auto-Mode entry point | `app.py:5033 _auto_mode_process(file_list, text_prompt)` |
| Single-video segmentation | `app.py:4896 @spaces.GPU(duration=119) def segment_video(...)` |
| Worker process | `mp.get_context("spawn").Process(target=_segment_video_worker_entry, ...)` (`app.py:4854`) |
| Core logic | `app.py:4239 _segment_video_core(...)` (chunk-wise SAM3 inference) |
| Model instantiation | `app.py:4453 predictor_cls = _get_sam3_predictor_cls(); predictor = predictor_cls(...)` |
| Result directory | `build/downloads/` (written via `_persist_for_download`) |
| Progress communication | `progress / status / result / error` messages streamed over an `mp.Queue` |
| Handling between videos | Serial `for path in paths:` loop; GPU cleanup (`_cleanup_cuda_cache()`) runs before the next video |
### HuggingFace Spaces hardware / `spaces.GPU` behavior
- The `@spaces.GPU` decorator attaches its ZeroGPU slice-allocation logic only when `Config.zero_gpu` is set (= `SPACES_ZERO_GPU=true`) (`spaces/zero/decorator.py:83`). On a dedicated GPU Space (4×A100, etc.) the decorator is a **no-op**, and an ordinary Python process sees all four CUDA devices directly (`torch.cuda.device_count() == 4`).
- ZeroGPU (MIG slice) mode allocates only one GPU slice at a time, so **this plan assumes dedicated multi-GPU hardware**. In a ZeroGPU environment the code falls back automatically to the existing serial path.
### Isolation requirements
| Resource | Possible conflict | Resolution |
|------|-------------|-----------|
| GPU memory / context | Four videos load the model onto the same device at once → OOM, context interference | Pin each worker to a single visible device with the `CUDA_VISIBLE_DEVICES=N` environment variable |
| `sam3.*` module in-process caches (`_SAM3_PREDICTOR_CLS`, `_LAST_SEG_CACHE`, `cached_frame_outputs`, etc.) | State gets tangled under 4-way concurrent calls inside one interpreter | Separate spawn-based processes → module state is isolated per process |
| `tempfile.mkdtemp()` (chunk input dir, trimmed mp4) | `mkdtemp` generates collision-free names automatically → safe | No extra action needed |
| Output filenames in `build/downloads/` | Videos started at the same moment share a timestamp → `auto_mode_results_YYYYMMDD_HHMMSS.zip` / `*_overlay.mp4` collide | Add a short uuid (`uuid.uuid4().hex[:8]`) + the video index to the filename |
| Model checkpoint download / BPE vocab | Four workers download/write the same file simultaneously → race | The parent (main process) ensures the files exist once up front; workers are read-only |
| Caches such as `.zerogpu/tensors` | The ZeroGPU cache is not used in dedicated mode | No impact |
| The `sam3/` directory itself | Python imports are independent per process → **no directory copy needed** | Do not create copies |
### Conclusion
- **Duplicating the `sam3` folder is unnecessary.** Isolation at the single level of "process" is sufficient.
- **Each video runs in its own spawned child process.** Immediately on entry (before importing torch), the child narrows `CUDA_VISIBLE_DEVICES` to a single device → from the child's point of view only `cuda:0` ever exists → any hard-coded `cuda` / `cuda:0` in the model or SAM3 code is safe.
- **The parent process does not use the GPU.** It only acts as a dispatcher that runs a 4-slot pool. All heavy imports happen inside the workers.
---
## 1. Design overview
### 1.1 Worker pool structure
```
┌─────────────────────────────────────────────────────────────┐
│ Gradio main process (no torch CUDA usage)                   │
│  ├─ _auto_mode_process() generator                          │
│  ├─ ParallelSegmentDispatcher                               │
│  │   ├─ pool of N workers (N = min(num_gpus, num_videos))   │
│  │   ├─ submit queue (video_path → free worker)             │
│  │   ├─ event queue (progress / status / result / err)      │
│  │   └─ per-video state: gpu_idx, started_at, last_pct…     │
│  └─ yields UI updates (status / per-video progress / files) │
└──────┬──────────────────────────────────────────────────────┘
       │ spawn child × N
       ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Worker GPU 0    │  │ Worker GPU 1    │  │ Worker GPU N-1  │
│ CUDA_VISIBLE=0  │  │ CUDA_VISIBLE=1  │  │ CUDA_VISIBLE=N-1│
│ runs            │  │ runs            │  │ runs            │
│ _segment_video  │  │ _segment_video  │  │ _segment_video  │
│ _core(...)      │  │ _core(...)      │  │ _core(...)      │
└─────────────────┘  └─────────────────┘  └─────────────────┘
```
- The end goal is to run each worker as an **always-alive** "persistent worker", so the model-weight loading cost is paid only once, on the first video (an optional optimization; see §7). For simplicity, the first implementation starts with a "fresh-per-video" design that spawns a new worker for every video, and switches to the reusable design once it is stable.
- If there are more videos than GPUs (N > 4), the queue + pool structure serializes the excess automatically: when a GPU finishes one video, it picks up the next.
### 1.2 Separating the worker entry module: `parallel_segment_worker.py`
**Why a separate file is needed:**
- The current worker target (`_segment_video_worker_entry`) is a function inside `app.py`.
- For a spawned child process to unpickle this target it must import `app.py`, and `import torch` at `app.py:4` runs immediately.
- At that point the child has not yet narrowed `os.environ["CUDA_VISIBLE_DEVICES"]`, so torch can end up initializing the CUDA runtime with all four devices visible → even if we only intend to use `cuda:0`, contexts for the other devices come along.
- Fix: move the worker entry into a new file that **does not import torch at top level**. The child imports only that file, sets `os.environ["CUDA_VISIBLE_DEVICES"]` on the first line of the function body, and only *then* imports torch / app.
```python
# parallel_segment_worker.py (intentionally minimal top-level imports)
import os
import sys
import traceback


def worker_main(gpu_index, args, progress_queue):
    # Must happen before torch is imported anywhere in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    os.environ["SAM3_WORKER_MODE"] = "1"  # skip Gradio launch in app.py
    os.environ.setdefault("SAM3_CACHE_FRAME_OUTPUTS", "0")
    os.environ.setdefault("SAM3_OFFLOAD_TRACKER_STATE_TO_CPU", "1")
    sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

    # NOW it is safe to import torch / app
    import torch
    if torch.cuda.is_available():
        torch.cuda.set_device(0)  # only one device visible: cuda:0
    from app import _segment_video_core  # imports torch but env is already set

    (video_path, text_prompt, duration_limit, id_corrections_text,
     id_drop_text, id_override_start_sec, show_trails, view_mode) = args

    def _progress_cb(val, desc):
        progress_queue.put({"type": "progress", "value": val, "desc": desc,
                            "gpu_index": gpu_index})

    def _status_cb(msg):
        progress_queue.put({"type": "status", "message": msg,
                            "gpu_index": gpu_index})

    try:
        progress_queue.put({"type": "progress", "value": 0.0,
                            "desc": f"GPU {gpu_index}: starting...",
                            "gpu_index": gpu_index})
        out_path, status, loc_path = _segment_video_core(
            video_path, text_prompt, duration_limit,
            id_corrections_text=id_corrections_text,
            id_drop_text=id_drop_text,
            id_override_start_sec=id_override_start_sec,
            show_trails=show_trails,
            view_mode=view_mode,
            progress_callback=_progress_cb,
            status_callback=_status_cb,
        )
        progress_queue.put({"type": "result",
                            "data": (out_path, status, loc_path),
                            "gpu_index": gpu_index})
    except Exception as exc:  # noqa: BLE001
        progress_queue.put({"type": "error",
                            "message": str(exc),
                            "traceback": traceback.format_exc(),
                            "gpu_index": gpu_index})
    finally:
        # Best-effort GPU cleanup; never let cleanup errors mask the result.
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
        except Exception:
            pass
```
### 1.3 Dispatcher class: added inside `app.py`
```python
class ParallelSegmentDispatcher:
    """Distribute one video per GPU concurrently and stream events back."""

    def __init__(self, num_gpus: int):
        self.num_gpus = num_gpus
        self.ctx = mp.get_context("spawn")
        self.event_queue = self.ctx.Queue()
        self.workers: dict[int, mp.Process] = {}    # gpu_index -> Process
        self.gpu_assignments: dict[int, dict] = {}  # gpu_index -> task meta

    def submit(self, gpu_index, video_meta, args):
        from parallel_segment_worker import worker_main
        p = self.ctx.Process(
            target=worker_main,
            args=(gpu_index, args, self.event_queue),
            daemon=False,
        )
        p.start()
        self.workers[gpu_index] = p
        self.gpu_assignments[gpu_index] = video_meta

    def free_gpu(self, gpu_index):
        proc = self.workers.pop(gpu_index, None)
        meta = self.gpu_assignments.pop(gpu_index, None)
        if proc is not None:
            proc.join(timeout=5)
            if proc.is_alive():
                # Worker did not exit on its own; force it down.
                proc.terminate()
                proc.join(timeout=5)
        return meta

    def shutdown(self):
        for gi in list(self.workers.keys()):
            self.free_gpu(gi)
```
### 1.4 Parallel variant of `_auto_mode_process`: `_auto_mode_process_parallel`
Rough algorithm:
```text
available GPUs  G = torch.cuda.device_count()
videos          N = len(paths)
slot_count = min(G, N)
dispatcher = ParallelSegmentDispatcher(slot_count)

# 1) Assign the first slot_count of the N videos, one per GPU
free_gpus = list(range(slot_count))
queue_index = 0
in_flight = 0
while queue_index < N and free_gpus:
    gi = free_gpus.pop(0)
    dispatcher.submit(gi, meta_for(queue_index), args_for(queue_index))
    queue_index += 1
    in_flight += 1
    yield UI status

# 2) Event loop
while in_flight > 0:
    msg = dispatcher.event_queue.get(timeout=...)
    gi = msg["gpu_index"]
    if msg["type"] == "progress":
        update per-GPU progress bar text; aggregate overall progress
        yield UI status
    elif msg["type"] == "status":
        append status for that GPU
        yield UI status
    elif msg["type"] == "result":
        out_path, status, loc_path = msg["data"]
        finalize: rename/persist with disambiguating suffix
        append (mp4, csv) to all_results
        yield UI status (with newly visible result)
        dispatcher.free_gpu(gi)
        in_flight -= 1
        if queue_index < N:
            dispatcher.submit(gi, meta_for(queue_index), args_for(queue_index))
            queue_index += 1
            in_flight += 1
            yield UI status
    elif msg["type"] == "error":
        record failure for that video
        yield UI status
        dispatcher.free_gpu(gi)
        in_flight -= 1
        # same re-fill logic as result

# 3) Final cleanup
dispatcher.shutdown()
yield final summary
```
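
The refill behavior in the algorithm above ("a freed GPU takes the next queued video") can be checked without any GPU by simulating it. The following is a minimal sketch, not the planned implementation: `simulate_dispatch` and the per-video durations are hypothetical, and a min-heap of finish times stands in for the `result` events arriving on the event queue.

```python
import heapq


def simulate_dispatch(durations, num_gpus):
    """Simulate 'next video goes to the first GPU that frees up'.

    durations: hypothetical per-video processing times in seconds.
    Returns (assignments, makespan); assignments[i] is the GPU index
    that processed video i, makespan is the total wall-clock time.
    """
    slot_count = min(num_gpus, len(durations))
    free_gpus = list(range(slot_count))
    busy = []  # min-heap of (finish_time, gpu_index)
    assignments = []
    now = 0.0
    for duration in durations:
        if free_gpus:
            # Initial fill: one video per GPU, all started at time 0.
            gpu = free_gpus.pop(0)
        else:
            # Wait for the earliest-finishing worker, then reuse its GPU.
            now, gpu = heapq.heappop(busy)
        heapq.heappush(busy, (now + duration, gpu))
        assignments.append(gpu)
    makespan = max((finish for finish, _ in busy), default=0.0)
    return assignments, makespan


assignments, makespan = simulate_dispatch([10, 4, 6, 3, 5], num_gpus=2)
print(assignments, makespan)
```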
---
## 2. UI changes
### 2.1 Added component: inside the `Auto-Mode (Batch Queue)` accordion
| Component | Purpose |
|---------|------|
| `auto_mode_parallel_status` (Markdown) | GPU count / active worker count / videos left in the queue / progress broken down per GPU (e.g. `GPU0: video_a.mp4 73%`, `GPU1: video_b.mp4 41%` …) |
| Per-video result accumulation reuses the existing `auto_results_files_state` / `auto_results_list` | No change |
### 2.2 Single-video preview / overlay components
In parallel mode there is no single "currently processing" video, so:
- Single-slot widgets such as `video_input` / `video_output` are updated with the result of the **most recently completed** video (a UX courtesy).
- The main progress display is handled by the multiline `auto_mode_parallel_status`.
### 2.3 fallback
- `torch.cuda.device_count() <= 1` → keep `_auto_mode_process` (the current serial behavior) as is.
- `>1` → branch to `_auto_mode_process_parallel`.
- Toggle: exposed as the environment variable `SAM3_PARALLEL_AUTO_MODE` (default `auto`; `0` disables, `1` forces).
---
## 3. Preventing filename / output collisions
When four videos drop their results into `build/downloads/` at nearly the same time:
| Function | Change |
|------|------|
| `_rename_with_rule` | Extend it to insert a short per-video instance ID into the result filename. Example: `{stem}_{video_id8}_seg_{dur}_{elapsed}s.mp4` |
| `_persist_for_download` | Append a `_{n}` suffix when the basename already exists (this is partly handled today, but make it race-safe by re-checking after `os.rename`) |
| `_build_zip_from_paths` | Already has basename de-duplication logic (`seen_names`) → use as is |

`video_id8` is generated once with `uuid.uuid4().hex[:8]` when the dispatcher enqueues the video, and is stored in `meta`.

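
As an illustration of the naming scheme and of a race-safe suffix, here is a minimal sketch. `tagged_basename` and `reserve_unique_path` are hypothetical helpers, not the existing `_rename_with_rule` / `_persist_for_download`, and the `dur` / `elapsed` values are made up. Instead of the "`os.rename`, then re-check" approach named in the table, the sketch swaps in atomic reservation with `os.O_EXCL`, which makes file creation fail if the name is already taken.

```python
import os
import tempfile
import uuid


def tagged_basename(stem, dur, elapsed, extra_tag=""):
    """Build '{stem}_{tag}_seg_{dur}_{elapsed}s.mp4'; the tag is omitted when empty."""
    tag = f"_{extra_tag}" if extra_tag else ""
    return f"{stem}{tag}_seg_{dur}_{elapsed}s.mp4"


def reserve_unique_path(directory, basename):
    """Atomically reserve a free path, appending _1, _2, ... on collision."""
    root, ext = os.path.splitext(basename)
    n = 0
    while True:
        name = basename if n == 0 else f"{root}_{n}{ext}"
        path = os.path.join(directory, name)
        try:
            # O_EXCL: creation fails if the file exists, so two workers
            # can never both claim the same name.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            n += 1
            continue
        os.close(fd)
        return path


with tempfile.TemporaryDirectory() as out_dir:
    vid8 = uuid.uuid4().hex[:8]
    name = tagged_basename("video_a", "30s", 42, extra_tag=vid8)
    first = reserve_unique_path(out_dir, name)
    second = reserve_unique_path(out_dir, name)  # simulated collision
    print(os.path.basename(first), os.path.basename(second))
```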
---
## 4. Safeguards / edge cases
1. **GPU memory pre-check**: right after entering the worker, verify that `_check_gpu_memory_safe()` returns true (before the model is first loaded). If false, report an `error` message to the dispatcher and exit.
2. **Abnormal worker exit**: the dispatcher polls with `event_queue.get(timeout=heartbeat)`; if no message arrives within the heartbeat interval and the worker reports `is_alive() == False`, treat it as an `error` and call `free_gpu`.
3. **Daemon check in the parent process**: in the same spirit as the existing branch where `segment_video` falls back to in-process execution when `mp.current_process().daemon` is set (`app.py:4918`), the dispatcher is disabled under a daemon parent → sequential fallback.
4. **Cancel (stop button)**: not included in the first implementation (the current serial mode has no stop either). Follow-up work.
5. **Log prefix**: prepend `[GPU{n}]` to the progress/status messages sent by workers so they can be told apart in the UI and on stdout.
6. **Deterministic device assignment**: so that videos do not all land on the same GPU, the dispatcher distributes round-robin (in practice, "the next video goes to whichever GPU finishes first").
---
## 5. Testing / verification
### 5.1 Local (single GPU)
- Confirm that `_parallel_dispatcher` falls back automatically to the serial path when `device_count == 1`.
- Environment variables `SAM3_PARALLEL_AUTO_MODE=1` + `CUDA_VISIBLE_DEVICES=0` → the dispatcher runs in 1-slot mode (one worker); the results must match the existing `_auto_mode_process`.
### 5.2 Local (fake multi-GPU simulation)
- With `SAM3_PARALLEL_AUTO_MODE=1` + `SAM3_FAKE_GPU_COUNT=4`, the dispatcher code builds a 4-slot pool, but in reality every slot shares the same device 0 (test only; it simply verifies the dispatcher logic).
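
One way the fake-count override might be wired in is sketched below. Both function names are hypothetical, and how `SAM3_FAKE_GPU_COUNT` is actually read in `app.py` is not specified in this plan; the sketch assumes invalid or non-positive values are ignored, and that fake slots wrap onto the real devices so that with one real GPU every slot maps to device 0.

```python
import os


def effective_gpu_count(real_count, environ=os.environ):
    """Return the slot count the dispatcher should use.

    SAM3_FAKE_GPU_COUNT (test-only) overrides the detected device count;
    invalid or non-positive values are ignored.
    """
    raw = environ.get("SAM3_FAKE_GPU_COUNT", "")
    try:
        fake = int(raw)
    except ValueError:
        return real_count
    return fake if fake > 0 else real_count


def physical_device_for(slot_index, real_count):
    """Map a dispatcher slot to a real device; fake slots wrap onto real ones."""
    return slot_index % max(real_count, 1)


slots = effective_gpu_count(1, {"SAM3_FAKE_GPU_COUNT": "4"})
print(slots, [physical_device_for(i, 1) for i in range(slots)])
```

In such a setup the worker would receive `physical_device_for(slot, real_count)` as its `CUDA_VISIBLE_DEVICES` value while the dispatcher keeps tracking the slot index.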
### 5.3 HF Space (4×A100)
- Upload 4 videos → confirm that the total batch processing time is 3.5 to 4× faster than on a single GPU (each individual video is not expected to run faster; the gain comes from running four at once).
- Use `nvidia-smi` to confirm that utilization rises on all four devices (the debug log prints `GPU memory util:`).
### 5.4 Regression
- The single-video "Run Segmentation" button is unchanged → low regression risk.
- The segmentation output in the result mp4 / csv should be bit-identical between serial and parallel modes (given the same seed). Sanity-check with frame-by-frame mask IoU between the serial and parallel result mp4s.
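
The frame-by-frame IoU check can be sketched in pure Python as follows. The helper names and the `0.999` threshold are assumptions, and masks are represented as nested lists of 0/1 purely for illustration; extracting masks from the result mp4s is out of scope for this sketch.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two same-shaped binary masks given as nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def mismatched_frames(frames_serial, frames_parallel, threshold=0.999):
    """Return indices of frames whose IoU falls below the threshold."""
    return [i for i, (a, b) in enumerate(zip(frames_serial, frames_parallel))
            if mask_iou(a, b) < threshold]


frame_ok = [[1, 1], [0, 0]]
frame_off = [[1, 0], [0, 0]]
print(mismatched_frames([frame_ok, frame_ok], [frame_ok, frame_off]))
```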
---
## 6. Step-by-step implementation checklist (in execution order)
Modify the code in the order given in this document.
### Step 1: Create the new file `parallel_segment_worker.py`
- Top-level imports: only `os, sys, traceback`.
- Write `worker_main(gpu_index, args, progress_queue)` following the code in §1.2.
### Step 2: Add the dispatcher class to `app.py`
- Define `class ParallelSegmentDispatcher:` (§1.3).
- Check whether `import uuid` already exists (`app.py:30`) → ✅ present.
### Step 3: Add the `_auto_mode_process_parallel(...)` generator to `app.py`
- The algorithm is in §1.4. Keep the output tuple shape identical to the existing `_pkg(...)` of `_auto_mode_process`, i.e. a 19-tuple (no UI wiring changes).
- Build the `auto_mode_status` message as multiline text to expose per-GPU progress.
- Result filename disambiguation: store `vid8 = uuid.uuid4().hex[:8]` in the video meta, and insert `_{vid8}` into the stem after calling `_rename_with_rule` and before `_persist_for_download`.
### Step 4: Add a router at the entry of `_auto_mode_process` in `app.py`
- At the top of the function:
```python
num_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
parallel_env = os.getenv("SAM3_PARALLEL_AUTO_MODE", "auto").lower()
use_parallel = (
    (parallel_env == "1") or
    (parallel_env == "auto" and num_gpus > 1)
) and not bool(os.getenv("SPACES_ZERO_GPU"))
if use_parallel:
    yield from _auto_mode_process_parallel(file_list, text_prompt, num_gpus, progress)
    return
```
- Disabled in ZeroGPU mode (each call is only allocated a GPU per slice → concurrency is pointless).
### Step 5: Clean up code duplication with `_segment_video_worker_entry`
- Gradually migrate the existing single-video path (`segment_video` → `_segment_video_worker_entry`) to use `parallel_segment_worker.worker_main` as well, so the logic is maintained in one place (optional). In the first implementation, **leave it untouched** (to minimize regression risk).
### Step 6: Patch output filename disambiguation
- Add an optional `extra_tag: str = ""` parameter to the `_rename_with_rule` signature (the empty-string default keeps it backward compatible).
- Pass `extra_tag=vid8` only on the parallel path.
### Step 7: Improve the UI text
- Emit multi-line output in the `auto_mode_status` Markdown (one line per GPU). If it gets too long, put it in a collapsible code block.
- `gr.Progress` is a single bar, so send only the parallel mode's "overall average progress" there and show per-GPU detail as text.
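
A minimal sketch of building that status text together with the single overall fraction follows. `format_parallel_status` and its exact line layout are assumptions that mirror the `GPU0: video_a.mp4 73%` example in §2.1; "overall" is taken here to mean completed videos plus the fractional progress of in-flight ones, divided by the total number of videos.

```python
def format_parallel_status(slots, completed, total_videos, total_gpus):
    """Build the multi-line status text and the overall progress fraction.

    slots maps gpu_index -> (video_name, fraction_done in [0, 1]) for
    in-flight videos; completed is the number of finished videos.
    """
    queued = total_videos - completed - len(slots)
    lines = [f"GPUs: {total_gpus} | active workers: {len(slots)} | queued: {queued}"]
    for gpu_index in sorted(slots):
        name, frac = slots[gpu_index]
        lines.append(f"GPU{gpu_index}: {name} {round(frac * 100)}%")
    done = completed + sum(frac for _, frac in slots.values())
    overall = done / total_videos if total_videos else 0.0
    return "\n".join(lines), overall


text, overall = format_parallel_status(
    {1: ("video_b.mp4", 0.41), 0: ("video_a.mp4", 0.73)},
    completed=1, total_videos=5, total_gpus=2,
)
print(text)
print(f"overall: {overall:.3f}")
```

The returned text would go to the Markdown status component, and the fraction to the single `gr.Progress` bar.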
### Step 8: Smoke test
- Launch locally with `python app.py`, then:
  - Upload 2 videos → runs in serial mode in a single-GPU environment (only one GPU visible).
  - `SAM3_PARALLEL_AUTO_MODE=1 CUDA_VISIBLE_DEVICES=0 python app.py` → runs as a 1-slot pool.
  - Confirm that the result mp4 / csv are generated correctly and that the status UI updates.
### Step 9: Push
- No change to `requirements.txt` (multiprocessing / uuid are in the standard library).
- On the HF Space, upgrade to `4xA100-large` (or equivalent) in the hardware tab and then push the same code; it enters parallel mode automatically.
---
## 7. Future extensions (out of scope for this PR)
- **Worker reuse (persistent)**: instead of spawning for every video, send commands to the workers over `Connection`/`Pipe`-based RPC so the model is loaded only once. If SAM3 weight loading costs 1 to 3 minutes per video, the speedup is large.
- **Cancel / pause**: stop button → the dispatcher sends SIGTERM to all workers and returns partial results.
- **GPUs with different memory sizes**: a priority queue that routes large videos to 80 GB GPUs and small videos to smaller GPUs.
- **Distributed (multi-node)**: abstract the worker so that, behind the same interface, it can be launched on nodes reached over SSH.
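
For the persistent-worker item above, the `Pipe`-based command loop might look like the sketch below. Everything here is illustrative: the `("segment", path)` / `("stop", None)` command protocol, `persistent_worker`, and `fake_load_model` are hypothetical, and a thread stands in for the spawned per-GPU process so the example runs without a GPU. With a real spawned process, `load_model` would have to be a picklable top-level function.

```python
import threading
from multiprocessing import Pipe


def persistent_worker(conn, load_model):
    """Load the model once, then serve segment requests until told to stop."""
    model = load_model()  # paid once per worker, not once per video
    while True:
        command, payload = conn.recv()
        if command == "stop":
            conn.send(("stopped", None))
            break
        if command == "segment":
            conn.send(("result", model(payload)))


load_calls = []


def fake_load_model():
    """Stand-in for SAM3 weight loading; records how often it is called."""
    load_calls.append(1)
    return lambda video_path: f"segmented:{video_path}"


parent_conn, child_conn = Pipe()
worker = threading.Thread(target=persistent_worker,
                          args=(child_conn, fake_load_model))
worker.start()

results = []
for path in ["a.mp4", "b.mp4"]:
    parent_conn.send(("segment", path))
    results.append(parent_conn.recv())

parent_conn.send(("stop", None))
stop_reply = parent_conn.recv()
worker.join()
print(results, "model loads:", len(load_calls))
```

The point of the design is visible in `load_calls`: two videos are processed, but the model is loaded a single time.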