# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Reusable video analysis base combining object detection, segmentation, depth estimation, and multi-object tracking. Deployed as a Hugging Face Space (Docker SDK). Designed for multi-GPU inference with async job processing and live MJPEG streaming.
## Development Commands

```bash
# Setup
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run dev server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Docker (production / HF Spaces)
docker build -t detection_base . && docker run -p 7860:7860 detection_base

# Test async detection
curl -X POST http://localhost:7860/detect/async \
  -F "video=@sample.mp4" \
  -F "mode=object_detection" \
  -F "queries=person,car" \
  -F "detector=yolo11"
```
No test suite exists. Verify changes by running the server and testing through the UI at http://localhost:7860.
## Architecture

### Request Flow

```
index.html → POST /detect/async → app.py
  ├─ process_first_frame()        # Fast preview (~1-2s)
  ├─ Return job_id + URLs immediately
  └─ Background: process_video_async()
       ├─ run_inference()               # Detection mode
       └─ run_grounded_sam2_tracking()  # Segmentation mode
```
The async pipeline returns instantly with a `job_id`. The frontend polls `/detect/status/{job_id}` and streams live frames via `/detect/stream/{job_id}` (MJPEG).
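A minimal client-side polling loop for this flow might look like the sketch below. The terminal state names come from the `JobStatus` enum described later in this file; the exact shape of the status payload (a JSON object with a `status` field) is an assumption.

```python
# Hedged sketch of a client for the async detection API.
# Assumption: /detect/status/{job_id} returns JSON with a "status" field.
import json
import time
import urllib.request

TERMINAL_STATES = {"completed", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """A job is finished once it leaves the PROCESSING state."""
    return status.lower() in TERMINAL_STATES

def poll_job(base_url: str, job_id: str, interval: float = 1.0) -> dict:
    """Poll /detect/status/{job_id} until the job reaches a terminal state."""
    while True:
        with urllib.request.urlopen(f"{base_url}/detect/status/{job_id}") as resp:
            info = json.load(resp)
        if is_terminal(info["status"]):
            return info
        time.sleep(interval)
```

A real client would also open `/detect/stream/{job_id}` (e.g. in an `<img>` tag) while polling, since the MJPEG stream is independent of the status endpoint.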
### API Endpoints (app.py)

- **Core:** `POST /detect` (sync), `POST /detect/async` (async with streaming)
- **Job management:** `GET /detect/status/{job_id}`, `DELETE /detect/job/{job_id}`, `GET /detect/video/{job_id}`, `GET /detect/stream/{job_id}`
- **Per-frame data:** `GET /detect/tracks/{job_id}/{frame_idx}`, `GET /detect/first-frame/{job_id}`, `GET /detect/first-frame-depth/{job_id}`, `GET /detect/depth-video/{job_id}`
- **Benchmarking:** `POST /benchmark`, `POST /benchmark/profile`, `POST /benchmark/analysis`, `GET /gpu-monitor`, `GET /benchmark/hardware`
### Model Registries

All models use a registry + factory pattern with `@lru_cache` for singleton loading. Use `load_*_on_device(name, device)` for multi-GPU (no cache).
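The registry + factory pattern with cached vs. uncached loaders can be sketched roughly as below. The names `register`, `ToyDetector`, `load_model`, and `load_model_on_device` are illustrative, not the actual contents of `models/model_loader.py`.

```python
# Illustrative registry + factory pattern (names are assumptions).
from functools import lru_cache

_REGISTRY: dict[str, type] = {}

def register(key: str):
    """Decorator that maps a string key to a model class."""
    def deco(cls):
        _REGISTRY[key] = cls
        return cls
    return deco

@register("toy")
class ToyDetector:
    def __init__(self, device: str = "cpu"):
        self.device = device

@lru_cache(maxsize=None)
def load_model(name: str):
    """Singleton load: repeated calls return the same cached instance."""
    return _REGISTRY[name]()

def load_model_on_device(name: str, device: str):
    """Uncached variant for multi-GPU: a fresh instance per device."""
    return _REGISTRY[name](device=device)
```

The cached loader is safe for single-GPU serving, while the uncached per-device loader avoids sharing one model instance (and its CUDA context) across GPUs.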
**Detectors** (`models/model_loader.py`):

| Key | Model | Vocabulary |
|---|---|---|
| `yolo11` (default) | YOLO11m | COCO classes only |
| `detr_resnet50` | DETR | COCO classes only |
| `grounding_dino` | Grounding DINO | Open-vocabulary (arbitrary text) |
| `drone_yolo` | Drone YOLO | Specialized UAV detection |
All implement `ObjectDetector.predict(frame, queries)` → `DetectionResult(boxes, scores, labels, label_names)` from `models/detectors/base.py`.
**Segmenters** (`models/segmenters/model_loader.py`):

- `GSAM2-S/B/L` → Grounded SAM2 (small/base/large) backed by `grounding_dino`
- `YSAM2-S/B/L` → YOLO-SAM2 (small/base/large) backed by `yolo11`
**Depth** (`models/depth_estimators/model_loader.py`):

- `depth` → DepthAnythingV2
### Inference Pipeline (inference.py)

Three public entry points:

- `process_first_frame()` → Extract + detect on frame 0 only. Returns the processed frame + detections.
- `run_inference()` → Full detection pipeline. Multi-GPU data parallelism with worker threads per GPU, a reorder buffer for out-of-order completion, ByteTracker for object tracking, optional depth.
- `run_grounded_sam2_tracking()` → SAM2 segmentation with temporal coherence. Uses `SharedFrameStore` (in-memory decoded frames, 12 GiB budget) or falls back to JPEG extraction. The `step` parameter controls the keyframe interval.
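The reorder buffer mentioned for `run_inference()` can be illustrated with a minimal sketch (not the actual `inference.py` implementation): per-GPU workers complete frames out of order, but results must be emitted strictly in frame order for tracking and video encoding.

```python
# Minimal reorder-buffer sketch (illustrative, single-threaded).
class ReorderBuffer:
    def __init__(self):
        self._pending = {}   # frame_idx -> result, for frames arrived early
        self._next = 0       # next frame index to emit

    def push(self, idx, result):
        """Accept a completed frame; yield any now-contiguous results in order."""
        self._pending[idx] = result
        while self._next in self._pending:
            yield self._pending.pop(self._next)
            self._next += 1
```

Frame 1 finishing before frame 0 is simply held until frame 0 arrives, at which point both are released in order.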
### Async Job System (jobs/)

- `jobs/models.py` → `JobInfo` dataclass, `JobStatus` enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
- `jobs/storage.py` → Thread-safe in-memory storage at `/tmp/detection_jobs/{job_id}/`. Auto-cleanup every 10 minutes.
- `jobs/background.py` → `process_video_async()` dispatches to the correct inference function and updates job status.
- `jobs/streaming.py` → Event-driven MJPEG frame publishing. Non-blocking (drops frames if the consumer is slow). Frames are pre-resized to 640px width.
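A hedged sketch of the job data model, inferring field names from the description above (field names beyond `status` are assumptions, not copied from `jobs/models.py`):

```python
# Sketch of the job data model; only JobStatus members are from the doc,
# the JobInfo fields beyond status are assumptions.
import enum
from dataclasses import dataclass
from typing import Optional

class JobStatus(enum.Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"

@dataclass
class JobInfo:
    job_id: str
    status: JobStatus = JobStatus.PROCESSING
    progress: float = 0.0          # assumed field, 0.0-1.0
    error: Optional[str] = None    # assumed field, set on FAILED
```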
### Concurrency Model

- Per-model `RLock` for GPU serialization (`inference.py:_get_model_lock`)
- Multi-GPU workers use separate model instances per device
- `AsyncVideoReader` prefetches frames in a background thread to prevent GPU starvation
### Frontend (index.html)
Single HTML page with vanilla JS. Upload video, pick mode/model, view first frame, live MJPEG stream, download processed video, inspect detection JSON.
## Adding a New Detector

1. Create a class in `models/detectors/` implementing `ObjectDetector` from `base.py`
2. Register it in the `_REGISTRY` in `models/model_loader.py`
3. Add an option to the detector dropdown in `index.html`
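Step 1 might look like the stub below. The `DetectionResult` field names come from this file's description of `models/detectors/base.py`; the exact base-class signatures are assumptions, and `MyDetector` is a hypothetical name.

```python
# Hypothetical new detector stub. In the real repo this would subclass
# ObjectDetector from models/detectors/base.py; a local stand-in for
# DetectionResult is defined here so the sketch is self-contained.
from dataclasses import dataclass, field

@dataclass
class DetectionResult:  # fields as named in this doc
    boxes: list = field(default_factory=list)
    scores: list = field(default_factory=list)
    labels: list = field(default_factory=list)
    label_names: list = field(default_factory=list)

class MyDetector:
    """Skeleton for the ObjectDetector interface described above."""
    def predict(self, frame, queries):
        # Run the underlying model here; this stub returns no detections.
        return DetectionResult()
```

After implementing `predict`, register the class under a new key in `_REGISTRY` so the frontend dropdown value maps to it.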
## Dual Remotes

- `hf` → Hugging Face Space (deployment)
- `github` → GitHub (version control)