
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Multi-GPU video analysis platform with three fully functional modes:

  • Object Detection: Bounding boxes via YOLO11, DETR, or Grounding DINO
  • Segmentation: Mask overlays via Grounded SAM2 (GSAM2) or YOLO+SAM2 (YSAM2)
  • Drone Detection: Aerial object detection via YOLOv8 fine-tuned on VisDrone

Deployed as a HuggingFace Space (Docker SDK) at https://biaslab2025-isr.hf.space.

Development Commands

# Setup
uv sync

# Run dev server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Verify imports (quick smoke test - no tests exist yet)
python -c "from app import app"

# Docker
docker build -t isr . && docker run -p 7860:7860 isr

# Test async detection
curl -X POST http://localhost:7860/detect/async \
  -F "video=@sample.mp4" \
  -F "mode=object_detection" \
  -F "queries=person,car" \
  -F "detector=yolo11"

Core Architecture

Async Detection Flow (primary path)

Frontend (index.html) → POST /detect/async → background task → MJPEG stream + polling
  1. Frontend uploads video + mode + queries to /detect/async
  2. Backend creates a JobInfo, spawns process_video_async() as an asyncio.Task
  3. inference.py runs multi-GPU parallel inference, publishing frames to an MJPEG stream
  4. Frontend consumes /detect/stream/{job_id} for live video, polls /detect/status/{job_id}
  5. On completion, frontend fetches final video from /detect/video/{job_id}
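The five steps above can be sketched as a toy asyncio job table (a hedged sketch: the real JobInfo lives in jobs/models.py and the endpoints are FastAPI routes; here both are reduced to a dict and coroutines):

```python
import asyncio
import uuid

# Simplified stand-in for the jobs system; the "inference" is a no-op.
jobs = {}

async def process_video_async(job_id):
    jobs[job_id]["status"] = "processing"
    await asyncio.sleep(0)                    # stand-in for multi-GPU inference
    jobs[job_id]["status"] = "completed"

async def start_job():
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued"}
    # Fire-and-forget, like POST /detect/async returning a job_id immediately
    asyncio.create_task(process_video_async(job_id))
    return job_id

async def main():
    job_id = await start_job()
    # Poll until done, like GET /detect/status/{job_id}
    while jobs[job_id]["status"] != "completed":
        await asyncio.sleep(0.01)
    return jobs[job_id]["status"]

print(asyncio.run(main()))  # completed
```

The key property is that start_job returns before inference finishes, so the HTTP response carries only the job_id and URLs.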

API Endpoints (app.py)

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /detect/async | Start async job (returns job_id + stream/status URLs) |
| GET | /detect/status/{job_id} | Poll job status |
| GET | /detect/stream/{job_id} | MJPEG live stream (event-driven, 640px wide) |
| GET | /detect/video/{job_id} | Download processed MP4 |
| GET | /detect/depth-video/{job_id} | Download depth video |
| GET | /detect/tracks/{job_id}/summary | Per-frame detection counts (timeline heatmap) |
| GET | /detect/tracks/{job_id}/{frame_idx} | Per-frame track data |
| DELETE | /detect/job/{job_id} | Cancel running job |
| POST | /detect | Synchronous detection (returns MP4 directly) |
| POST | /benchmark | GSAM2 latency breakdown |
| POST | /benchmark/profile | Per-frame timing breakdown |
| POST | /benchmark/analysis | Full roofline analysis |

/detect/async params: video, mode (object_detection/segmentation/drone_detection), queries, detector (default: yolo11), segmenter (default: GSAM2-L), enable_depth (default: false), step (default: 7, segmentation keyframe interval).

Multi-GPU Inference Pipeline (inference.py)

run_inference() - Detection and drone modes:

  • AsyncVideoReader prefetches frames into a queue (up to 32 frames)
  • Models loaded in parallel via ThreadPoolExecutor (one detector per GPU)
  • Queue-based producer/consumer: main thread feeds queue_in, N GPU workers drain it
  • Workers batch frames (up to max_batch_size=32 for YOLO) under per-model RLock
  • Writer thread reorders frames, runs ByteTracker + SpeedEstimator, writes via StreamingVideoWriter, publishes to MJPEG stream
  • Cancellation: workers poll _check_cancellation(job_id) each cycle
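The queue_in producer/consumer pattern above can be sketched as follows (the detector here is a stand-in; real workers run one model per GPU under its RLock):

```python
import queue
import threading

def run(frames, n_workers=2):
    """Toy producer/consumer: main thread feeds queue_in, N workers drain it."""
    queue_in, results = queue.Queue(), {}
    lock = threading.RLock()                  # per-model RLock in the real pipeline

    def worker():
        while True:
            item = queue_in.get()
            if item is None:                  # sentinel: no more frames
                return
            idx, frame = item
            with lock:
                results[idx] = frame * 2      # stand-in for detector.predict()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, frame in enumerate(frames):        # producer: main thread feeds queue_in
        queue_in.put((i, frame))
    for _ in threads:
        queue_in.put(None)                    # one sentinel per worker
    for t in threads:
        t.join()
    return [results[i] for i in range(len(frames))]

print(run([1, 2, 3]))  # [2, 4, 6]
```

Note that results come back keyed by index because workers finish out of order; the real pipeline handles this with the writer thread's reorder buffer.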

run_grounded_sam2_tracking() - Segmentation mode:

  • Extracts all frames to JPEG files on disk
  • Runs detection on keyframes (every step frames) to seed SAM2
  • SAM2 video predictor propagates masks between keyframes
  • ID reconciliation via IoU matching in MaskDictionary
  • Renders colored semi-transparent mask overlays with contours
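The IoU criterion behind the ID reconciliation step can be illustrated on boxes (a hedged sketch: MaskDictionary matches masks, and reconcile here is a hypothetical helper, not the real API):

```python
def iou(a, b):
    """Intersection-over-union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def reconcile(prev, new, thresh=0.5):
    """Keep an existing track ID when a new detection overlaps it enough;
    otherwise assign a fresh ID. `prev` maps track_id -> box."""
    assigned, next_id = {}, max(prev, default=-1) + 1
    for box in new:
        best = max(prev.items(), key=lambda kv: iou(kv[1], box), default=None)
        if best and iou(best[1], box) >= thresh:
            assigned[best[0]] = box
        else:
            assigned[next_id] = box
            next_id += 1
    return assigned

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857... (1 / 7)
```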

Jobs System (jobs/)

  • models.py - JobInfo dataclass + JobStatus enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
  • storage.py - In-memory JobStorage (singleton, RLock-protected) + disk at /tmp/detection_jobs/{job_id}/. Per-frame track data is stored here. Auto-cleanup runs every 10 min (1-hour expiry).
  • background.py - process_video_async() coroutine dispatches to the right inference function
  • streaming.py - MJPEG frame queue + asyncio.Event publisher; publish_frame() resizes frames to 640px wide
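A hedged sketch of the jobs data model (JobStatus values follow the doc; any field beyond job_id/status is an assumption):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class JobStatus(Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

@dataclass
class JobInfo:
    job_id: str
    status: JobStatus = JobStatus.PROCESSING
    error: Optional[str] = None         # assumed field, populated for FAILED jobs

job = JobInfo(job_id="abc123")
job.status = JobStatus.COMPLETED
print(job.status.value)  # completed
```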

Frontend (demo/)

Single-page command center UI served at / (mounted at /demo). No build step. Uses window.ISR global namespace.

Key scripts:

  • init.js → bootstraps window.ISR, wires UI, initializes the state machine
  • state-machine.js → explicit FSM for the UI flow (idle → detecting → playing → inspect)
  • api.js → all backend API calls (startDetection, fetchTracks, fetchPointCloud, etc.)
  • real-backend.js → streaming + polling + prefetch logic for live detection jobs
  • inspect.js → 4-quadrant inspection panel (seg, edge, depth, 3D) with Tripo3D support
  • render.js → canvas overlays for bounding boxes and tracks
  • ui.js → panel layout, drawer tabs, command bar
  • analysis.js → track analysis and timeline rendering
  • helpers.js → viridis colormap, Sobel filter, RLE decode, utility functions

The frontend infers mode from the detector select element's data-kind attribute.

Models

Detectors (models/detectors/)

| Key | Class | Type | Batch | Notes |
| --- | --- | --- | --- | --- |
| yolo11 | Yolo11Detector | COCO closed-set | Yes (32) | Default. Tiling for large frames. |
| detr_resnet50 | DetrDetector | COCO closed-set | No | HF transformers pipeline |
| grounding_dino | GroundingDinoDetector | Open-vocabulary | No | Text-query grounded detection |
| yolov8_visdrone | YoloV8VisDroneDetector | VisDrone aerial | Yes (32) | ensure_weights() for safe parallel init |

All implement ObjectDetector.predict(frame, queries) → DetectionResult(boxes, scores, labels, label_names).

Registered in models/model_loader.py. Cached via @lru_cache for single-GPU; load_detector_on_device(name, device) for multi-GPU (uncached). Call prefetch_weights(name) before parallel GPU init to avoid download race conditions.
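The cached vs. uncached split can be sketched as follows (load_detector here is a stand-in for the real model constructors, not the actual API):

```python
from functools import lru_cache

def load_detector(name, device="cuda:0"):
    # Stand-in for building and moving a model to a device
    return f"{name}@{device}"

@lru_cache(maxsize=None)
def get_detector(name):
    """Single-GPU path: one shared, cached instance per detector name."""
    return load_detector(name)

def load_detector_on_device(name, device):
    """Multi-GPU path: a fresh, uncached instance per (name, device)."""
    return load_detector(name, device)

assert get_detector("yolo11") is get_detector("yolo11")   # cache hit
print(load_detector_on_device("yolo11", "cuda:1"))        # yolo11@cuda:1
```

The uncached path matters because each GPU worker needs its own model instance; caching would hand every worker the same object on the same device.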

Segmenters (models/segmenters/)

| Key | Detector | SAM2 Size |
| --- | --- | --- |
| GSAM2-S/B/L | Grounding DINO | small/base/large |
| YSAM2-S/B/L | YOLO11 | small/base/large |

Default: GSAM2-L. Registered in models/segmenters/model_loader.py.

Depth Estimators (models/depth_estimators/)

Single entry: key depth → DepthAnythingV2Estimator. Optional; enabled via enable_depth=True.

Adding New Detectors

  1. Create a class in models/detectors/ implementing ObjectDetector.predict() → DetectionResult
  2. If weights need downloading, add ensure_weights() classmethod for thread-safe prefetch
  3. Register in models/model_loader.py _REGISTRY
  4. Add <option> to demo/index.html #detectorSelect with appropriate data-kind
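An illustrative skeleton for steps 1-3 (the exact ObjectDetector base-class signature is an assumption; DetectionResult fields follow the interface described above):

```python
from dataclasses import dataclass

@dataclass
class DetectionResult:
    # Fields per the ObjectDetector interface above
    boxes: list
    scores: list
    labels: list
    label_names: list

class MyDetector:
    """Hypothetical detector skeleton; a real one implements ObjectDetector."""
    _weights_ready = False

    @classmethod
    def ensure_weights(cls):
        # Real implementation: thread-safe weight download (step 2)
        cls._weights_ready = True

    def predict(self, frame, queries):
        # Real implementation: run the model on `frame`, filter by `queries`
        return DetectionResult([], [], [], [])

MyDetector.ensure_weights()
print(MyDetector().predict(None, ["person"]).label_names)  # []
```

Step 3 is then a one-line entry mapping a key (e.g. "my_detector") to this class in the _REGISTRY dict of models/model_loader.py.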

Key Patterns

  • Weight downloads: Use ensure_weights() classmethod + prefetch_weights() in inference.py before ThreadPoolExecutor to avoid race conditions (see yolov8_visdrone.py)
  • Per-model locking: Each detector/depth instance gets a threading.RLock for thread-safe predict() calls in multi-GPU workers
  • Frame reordering: Writer thread uses a reorder buffer (128 frames) since GPU workers finish out-of-order
  • MJPEG streaming: publish_frame() drops frames if queue full (backpressure), consumer is event-driven at ~30fps
  • Job file layout: /tmp/detection_jobs/{job_id}/ → input.mp4, output.mp4, depth.mp4
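The frame-reordering pattern in the list above can be sketched with a heap (simplified: no 128-frame cap, no tracker or video writer):

```python
import heapq

def reorder(results):
    """Emit frames in index order as out-of-order (idx, frame) results arrive,
    like the writer thread's reorder buffer."""
    buf, next_idx, out = [], 0, []
    for idx, frame in results:
        heapq.heappush(buf, (idx, frame))
        # Flush every frame that is now contiguous with the output
        while buf and buf[0][0] == next_idx:
            out.append(heapq.heappop(buf)[1])
            next_idx += 1
    return out

print(reorder([(2, "c"), (0, "a"), (1, "b")]))  # ['a', 'b', 'c']
```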

Parallel Execution with Team Mode

When implementing features that touch independent subsystems, use team mode (parallel agents with worktree isolation) for maximum efficiency.

When to Parallelize

  • Backend (Python) + Frontend (JS) changes - always parallelizable
  • Independent API endpoints or UI components
  • Any 2+ tasks that don't modify the same files

How to Parallelize

  1. Identify independent task domains (e.g., backend vs frontend)
  2. Dispatch one agent per domain using isolation: "worktree"
  3. Each agent works in its own git worktree - no conflicts
  4. Merge results back: git checkout <worktree-branch> -- <files>
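Steps 2-4 can be demonstrated end to end in a throwaway repo (branch, directory, and file names here are illustrative; the team-mode dispatch itself is not shown):

```shell
# Self-contained demo: set up a scratch repo, then run the worktree flow
demo=$(mktemp -d) && cd "$demo"
git init -q repo && cd repo
git config user.email agent@example.com && git config user.name agent
echo "print('v1')" > inference.py && git add . && git commit -qm init

git worktree add ../backend -b agent/backend   # isolated tree for one agent
echo "print('v2')" > ../backend/inference.py   # agent edits in its own worktree
git -C ../backend commit -qam "backend change"

git checkout agent/backend -- inference.py     # merge results back (step 4)
cat inference.py                               # print('v2')
git worktree remove --force ../backend         # clean up the worktree
```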

Default to parallel when tasks are independent. Sequential only when one task's output is the other's input.

Planning & Design Documents

  • Plan and design docs (docs/plans/) are temporary working artifacts only
  • Do NOT commit them to git
  • Delete them after implementation is complete
  • Use them during planning/brainstorming, then discard