# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Multi-GPU video analysis platform with three fully functional modes:
- Object Detection: Bounding boxes via YOLO11, DETR, or Grounding DINO
- Segmentation: Mask overlays via Grounded SAM2 (GSAM2) or YOLO+SAM2 (YSAM2)
- Drone Detection: Aerial object detection via YOLOv8 fine-tuned on VisDrone
Deployed as a HuggingFace Space (Docker SDK) at https://biaslab2025-isr.hf.space.
## Development Commands

```bash
# Setup
uv sync

# Run dev server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Verify imports (quick smoke test; no tests exist yet)
python -c "from app import app"

# Docker
docker build -t isr . && docker run -p 7860:7860 isr

# Test async detection
curl -X POST http://localhost:7860/detect/async \
  -F "video=@sample.mp4" \
  -F "mode=object_detection" \
  -F "queries=person,car" \
  -F "detector=yolo11"
```
## Core Architecture

### Async Detection Flow (primary path)

Frontend (`index.html`) → POST `/detect/async` → background task → MJPEG stream + polling

1. Frontend uploads video + mode + queries to `/detect/async`
2. Backend creates a `JobInfo`, spawns `process_video_async()` as an `asyncio.Task`
3. `inference.py` runs multi-GPU parallel inference, publishing frames to an MJPEG stream
4. Frontend consumes `/detect/stream/{job_id}` for live video and polls `/detect/status/{job_id}`
5. On completion, frontend fetches the final video from `/detect/video/{job_id}`
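The client side of this flow can be sketched as follows. This is an illustrative sketch, not frontend code from the repo: `job_urls` and `poll_until_done` are hypothetical helpers, and the lowercase terminal status names are assumptions about the status endpoint's response shape.

```python
BASE = "http://localhost:7860"

def job_urls(job_id: str) -> dict:
    """Build the per-job endpoint URLs the frontend consumes."""
    return {
        "status": f"{BASE}/detect/status/{job_id}",
        "stream": f"{BASE}/detect/stream/{job_id}",
        "video": f"{BASE}/detect/video/{job_id}",
    }

def poll_until_done(fetch_status, job_id: str, max_polls: int = 100) -> dict:
    """Poll until the job reaches a terminal state.

    `fetch_status` is injected (e.g. a urllib/requests wrapper) so the loop
    can be tested without a running server; terminal names are assumptions.
    """
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status.get("status") in ("completed", "failed", "cancelled"):
            return status
    raise TimeoutError(f"job {job_id} did not finish within {max_polls} polls")
```

In a real client, `fetch_status` would issue a GET to the status URL between MJPEG frames, then switch to the video URL once the terminal state arrives.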
### API Endpoints (app.py)

| Method | Path | Purpose |
|---|---|---|
| POST | `/detect/async` | Start async job (returns job_id + stream/status URLs) |
| GET | `/detect/status/{job_id}` | Poll job status |
| GET | `/detect/stream/{job_id}` | MJPEG live stream (event-driven, 640px wide) |
| GET | `/detect/video/{job_id}` | Download processed MP4 |
| GET | `/detect/depth-video/{job_id}` | Download depth video |
| GET | `/detect/tracks/{job_id}/summary` | Per-frame detection counts (timeline heatmap) |
| GET | `/detect/tracks/{job_id}/{frame_idx}` | Per-frame track data |
| DELETE | `/detect/job/{job_id}` | Cancel running job |
| POST | `/detect` | Synchronous detection (returns MP4 directly) |
| POST | `/benchmark` | GSAM2 latency breakdown |
| POST | `/benchmark/profile` | Per-frame timing breakdown |
| POST | `/benchmark/analysis` | Full roofline analysis |
`/detect/async` params: `video`, `mode` (`object_detection`/`segmentation`/`drone_detection`), `queries`, `detector` (default: `yolo11`), `segmenter` (default: `GSAM2-L`), `enable_depth` (default: `false`), `step` (default: 7, segmentation keyframe interval).
### Multi-GPU Inference Pipeline (inference.py)

`run_inference()` → detection and drone modes:

- `AsyncVideoReader` prefetches frames into a queue (up to 32 frames)
- Models loaded in parallel via `ThreadPoolExecutor` (one detector per GPU)
- Queue-based producer/consumer: main thread feeds `queue_in`, N GPU workers drain it
- Workers batch frames (up to `max_batch_size=32` for YOLO) under a per-model `RLock`
- Writer thread reorders frames, runs `ByteTracker` + `SpeedEstimator`, writes via `StreamingVideoWriter`, publishes to the MJPEG stream
- Cancellation: workers poll `_check_cancellation(job_id)` each cycle
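The producer/consumer-plus-reorder pattern above can be sketched with the standard `queue` and `threading` modules. This is a minimal toy, not the actual `run_inference()` implementation: the queue names mirror the description, but sizes and shutdown handling are simplified.

```python
import queue
import threading

def run_pipeline(frames, infer, num_workers=2):
    """Toy pipeline: workers drain queue_in out of order; the writer side
    reorders results by frame index so output matches input order."""
    queue_in = queue.Queue(maxsize=32)   # producer feeds this
    queue_out = queue.Queue()            # workers publish (idx, result) here

    def worker():
        while True:
            item = queue_in.get()
            if item is None:             # poison pill -> shut down worker
                break
            idx, frame = item
            queue_out.put((idx, infer(frame)))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Producer: main thread feeds frames, then one pill per worker
    for idx, frame in enumerate(frames):
        queue_in.put((idx, frame))
    for _ in workers:
        queue_in.put(None)
    for w in workers:
        w.join()

    # Writer side: reorder buffer keyed by frame index
    buffered = {}
    while not queue_out.empty():
        idx, result = queue_out.get()
        buffered[idx] = result
    return [buffered[i] for i in range(len(frames))]
```

The real writer thread streams frames as soon as the next expected index arrives (bounded 128-frame buffer) rather than waiting for all workers to finish.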
`run_grounded_sam2_tracking()` → segmentation mode:

- Extracts all frames to JPEG files on disk
- Runs detection on keyframes (every `step` frames) to seed SAM2
- SAM2 video predictor propagates masks between keyframes
- ID reconciliation via IoU matching in `MaskDictionary`
- Renders colored semi-transparent mask overlays with contours
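The keyframe scheduling can be illustrated with a small sketch. `keyframe_spans` is a hypothetical helper for illustration, not a function from the codebase: it shows which frames each detection-seeded keyframe is responsible for before the next keyframe takes over.

```python
def keyframe_spans(num_frames: int, step: int = 7):
    """Yield (keyframe_idx, propagated_frame_indices) pairs: detection seeds
    SAM2 at each keyframe, then masks propagate until the next keyframe."""
    for key in range(0, num_frames, step):
        span = list(range(key, min(key + step, num_frames)))
        yield key, span
```

Larger `step` values mean fewer detector calls (faster) but longer propagation spans, so mask drift between keyframes becomes more likely.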
### Jobs System (jobs/)

- `models.py`: `JobInfo` dataclass + `JobStatus` enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
- `storage.py`: in-memory `JobStorage` (singleton, `RLock`-protected) + disk at `/tmp/detection_jobs/{job_id}/`. Per-frame track data is stored here. Auto-cleanup every 10 min (1 hr expiry).
- `background.py`: `process_video_async()` coroutine dispatches to the right inference function
- `streaming.py`: MJPEG frame queue + `asyncio.Event` publisher; `publish_frame()` resizes to 640px
### Frontend (demo/)

Single-page command center UI served at `/` (mounted at `/demo`). No build step. Uses the `window.ISR` global namespace.
Key scripts:

- `init.js`: bootstraps `window.ISR`, wires the UI, initializes the state machine
- `state-machine.js`: explicit FSM for UI flow (idle → detecting → playing → inspect)
- `api.js`: all backend API calls (`startDetection`, `fetchTracks`, `fetchPointCloud`, etc.)
- `real-backend.js`: streaming + polling + prefetch logic for live detection jobs
- `inspect.js`: 4-quadrant inspection panel (seg, edge, depth, 3D) with Tripo3D support
- `render.js`: canvas overlays for bounding boxes and tracks
- `ui.js`: panel layout, drawer tabs, command bar
- `analysis.js`: track analysis and timeline rendering
- `helpers.js`: viridis colormap, Sobel filter, RLE decode, utility functions
The frontend infers the mode from the detector select element's `data-kind` attribute.
## Models

### Detectors (models/detectors/)

| Key | Class | Type | Batch | Notes |
|---|---|---|---|---|
| `yolo11` | `Yolo11Detector` | COCO closed-set | Yes (32) | Default. Tiling for large frames. |
| `detr_resnet50` | `DetrDetector` | COCO closed-set | No | HF transformers pipeline |
| `grounding_dino` | `GroundingDinoDetector` | Open-vocabulary | No | Text-query grounded detection |
| `yolov8_visdrone` | `YoloV8VisDroneDetector` | VisDrone aerial | Yes (32) | `ensure_weights()` for safe parallel init |
All implement `ObjectDetector.predict(frame, queries)` → `DetectionResult(boxes, scores, labels, label_names)`.
Registered in `models/model_loader.py`. Cached via `@lru_cache` for single-GPU use; `load_detector_on_device(name, device)` for multi-GPU (uncached). Call `prefetch_weights(name)` before parallel GPU init to avoid download race conditions.
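The cached/uncached split can be sketched like this. `DummyDetector` is a stand-in for the real detector classes; only the `_REGISTRY`/`load_detector`/`load_detector_on_device` names come from the description above.

```python
from functools import lru_cache

class DummyDetector:
    """Stand-in for a real detector class from models/detectors/."""
    def __init__(self, device: str = "cuda:0"):
        self.device = device

_REGISTRY = {"yolo11": DummyDetector}

@lru_cache(maxsize=None)
def load_detector(name: str):
    """Single-GPU path: cached, so repeated calls reuse one instance."""
    return _REGISTRY[name]()

def load_detector_on_device(name: str, device: str):
    """Multi-GPU path: deliberately uncached, one fresh instance per GPU."""
    return _REGISTRY[name](device=device)
```

Caching the per-device loader would be wrong here: each GPU worker needs its own model instance, so the multi-GPU path must construct a new object every call.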
### Segmenters (models/segmenters/)

| Key | Detector | SAM2 Size |
|---|---|---|
| `GSAM2-S/B/L` | Grounding DINO | small/base/large |
| `YSAM2-S/B/L` | YOLO11 | small/base/large |
Default: `GSAM2-L`. Registered in `models/segmenters/model_loader.py`.
### Depth Estimators (models/depth_estimators/)

Single entry: key `depth` → `DepthAnythingV2Estimator`. Optional, enabled via `enable_depth=True`.
### Adding New Detectors

1. Create a class in `models/detectors/` implementing `ObjectDetector.predict()` → `DetectionResult`
2. If weights need downloading, add an `ensure_weights()` classmethod for thread-safe prefetch
3. Register it in `_REGISTRY` in `models/model_loader.py`
4. Add an `<option>` to `#detectorSelect` in `demo/index.html` with the appropriate `data-kind`
## Key Patterns

- Weight downloads: use the `ensure_weights()` classmethod + `prefetch_weights()` in inference.py before the `ThreadPoolExecutor` to avoid race conditions (see `yolov8_visdrone.py`)
- Per-model locking: each detector/depth instance gets a `threading.RLock` for thread-safe `predict()` calls in multi-GPU workers
- Frame reordering: the writer thread uses a reorder buffer (128 frames) since GPU workers finish out of order
- MJPEG streaming: `publish_frame()` drops frames if the queue is full (backpressure); the consumer is event-driven at ~30 fps
- Job file layout: `/tmp/detection_jobs/{job_id}/` → `input.mp4`, `output.mp4`, `depth.mp4`
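The drop-if-full backpressure behavior can be sketched with `queue.Queue`. This is a simplification of what `streaming.py` does (the real queue holds encoded JPEG frames and wakes the consumer via `asyncio.Event`); the tiny `maxsize` is for illustration only.

```python
import queue

frame_queue = queue.Queue(maxsize=2)  # real code uses a larger bound

def publish_frame(frame) -> bool:
    """Try to enqueue a frame for the MJPEG consumer; drop it (return
    False) when the consumer has fallen behind and the queue is full."""
    try:
        frame_queue.put_nowait(frame)
        return True
    except queue.Full:
        return False
```

Dropping instead of blocking keeps inference throughput independent of viewer bandwidth: a slow stream consumer sees fewer frames, but never stalls the GPU workers.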
## Parallel Execution with Team Mode
When implementing features that touch independent subsystems, use team mode (parallel agents with worktree isolation) for maximum efficiency.
### When to Parallelize

- Backend (Python) + frontend (JS) changes: always parallelizable
- Independent API endpoints or UI components
- Any 2+ tasks that don't modify the same files
### How to Parallelize

1. Identify independent task domains (e.g., backend vs frontend)
2. Dispatch one agent per domain using `isolation: "worktree"`
3. Each agent works in its own git worktree, so there are no conflicts
4. Merge results back: `git checkout <worktree-branch> -- <files>`
Default to parallel when tasks are independent. Sequential only when one task's output is the other's input.
## Planning & Design Documents

- Plan and design docs (`docs/plans/`) are temporary working artifacts only
- Do NOT commit them to git
- Delete them after implementation is complete
- Use them during planning/brainstorming, then discard