CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Reusable video analysis base combining object detection, segmentation, depth estimation, and multi-object tracking. Deployed as a Hugging Face Space (Docker SDK). Designed for multi-GPU inference with async job processing and live MJPEG streaming.

Development Commands

# Setup
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run dev server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Docker (production / HF Spaces)
docker build -t detection_base . && docker run -p 7860:7860 detection_base

# Test async detection
curl -X POST http://localhost:7860/detect/async \
  -F "video=@sample.mp4" \
  -F "mode=object_detection" \
  -F "queries=person,car" \
  -F "detector=yolo11"

No test suite exists. Verify changes by running the server and testing through the UI at http://localhost:7860.

Architecture

Request Flow

index.html → POST /detect/async → app.py
  ├─ process_first_frame()           # Fast preview (~1-2s)
  ├─ Return job_id + URLs immediately
  └─ Background: process_video_async()
      ├─ run_inference()                 # Detection mode
      └─ run_grounded_sam2_tracking()    # Segmentation mode

The async pipeline returns instantly with a job_id. The frontend polls /detect/status/{job_id} and streams live frames via /detect/stream/{job_id} (MJPEG).
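
The poll loop the frontend runs can be sketched in Python using only the stdlib. The top-level "status" field name and its exact string values are assumptions inferred from the JobStatus enum described under jobs/; check the actual response shape in app.py.

```python
import json
import time
import urllib.request

BASE = "http://localhost:7860"
# Mirrors the terminal JobStatus values; PROCESSING means "keep polling".
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}

def is_terminal(status: str) -> bool:
    """True once a job has reached a final state."""
    return status.upper() in TERMINAL

def poll_job(job_id: str, interval: float = 1.0) -> dict:
    """Poll /detect/status/{job_id} until the job finishes."""
    while True:
        with urllib.request.urlopen(f"{BASE}/detect/status/{job_id}") as resp:
            info = json.load(resp)  # assumed to carry a "status" field
        if is_terminal(info.get("status", "")):
            return info
        time.sleep(interval)
```

While polling, the frontend simultaneously points an `<img>` tag at /detect/stream/{job_id} to render the MJPEG stream.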

API Endpoints (app.py)

Core: POST /detect (sync), POST /detect/async (async with streaming)
Job management: GET /detect/status/{job_id}, DELETE /detect/job/{job_id}, GET /detect/video/{job_id}, GET /detect/stream/{job_id}
Per-frame data: GET /detect/tracks/{job_id}/{frame_idx}, GET /detect/first-frame/{job_id}, GET /detect/first-frame-depth/{job_id}, GET /detect/depth-video/{job_id}
Benchmarking: POST /benchmark, POST /benchmark/profile, POST /benchmark/analysis, GET /gpu-monitor, GET /benchmark/hardware

Model Registries

All models use a registry + factory pattern with @lru_cache for singleton loading. For multi-GPU, use load_*_on_device(name, device), which bypasses the cache so each device gets its own instance.
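
The two loading paths can be sketched as follows; the registry values here are placeholder constructors (plain `object`), and the function names are illustrative stand-ins for the real ones in models/model_loader.py.

```python
from functools import lru_cache

# Illustrative registry; the real _REGISTRY maps names to model constructors.
_REGISTRY = {
    "yolo11": object,
    "detr_resnet50": object,
}

@lru_cache(maxsize=None)
def load_detector(name: str):
    """Singleton path: repeated calls with the same name reuse one instance."""
    return _REGISTRY[name]()

def load_detector_on_device(name: str, device: str):
    """Multi-GPU path: a fresh, uncached instance per call (real code would
    also move the model to `device`)."""
    return _REGISTRY[name]()
```

The uncached path matters because a cached singleton would pin the model to whichever GPU loaded it first.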

Detectors (models/model_loader.py):

  • yolo11 (default) – YOLO11m, COCO classes only
  • detr_resnet50 – DETR, COCO classes only
  • grounding_dino – Grounding DINO, open-vocabulary (arbitrary text queries)
  • drone_yolo – Drone YOLO, specialized UAV detection

All implement ObjectDetector.predict(frame, queries) → DetectionResult(boxes, scores, labels, label_names) from models/detectors/base.py.
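
That contract can be sketched as a dataclass plus a Protocol; the element types (e.g. the box coordinate format) are assumptions, not confirmed from base.py.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence

# Sketch of the contract in models/detectors/base.py; field element types
# are assumed for illustration.
@dataclass
class DetectionResult:
    boxes: list        # one [x1, y1, x2, y2] entry per detection (format assumed)
    scores: list       # confidence per detection
    labels: list       # numeric class ids
    label_names: list  # human-readable class names

class ObjectDetector(Protocol):
    def predict(self, frame, queries: Optional[Sequence[str]]) -> DetectionResult:
        ...
```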

Segmenters (models/segmenters/model_loader.py):

  • GSAM2-S/B/L – Grounded SAM2 (small/base/large) backed by grounding_dino
  • YSAM2-S/B/L – YOLO-SAM2 (small/base/large) backed by yolo11

Depth (models/depth_estimators/model_loader.py):

  • depth – DepthAnythingV2

Inference Pipeline (inference.py)

Three public entry points:

  • process_first_frame() – Extract + detect on frame 0 only. Returns processed frame + detections.
  • run_inference() – Full detection pipeline. Multi-GPU data parallelism with worker threads per GPU, a reorder buffer for out-of-order completion, ByteTracker for object tracking, optional depth.
  • run_grounded_sam2_tracking() – SAM2 segmentation with temporal coherence. Uses SharedFrameStore (in-memory decoded frames, 12 GiB budget) or falls back to JPEG extraction. The step parameter controls the keyframe interval.
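
The reorder buffer mentioned under run_inference() can be illustrated minimally: GPU workers complete frames out of order, but output must be emitted in frame order. The class and method names here are invented, not the ones in inference.py.

```python
# Minimal reorder-buffer sketch: hold completed frames until every earlier
# index has also completed, then release a contiguous run in order.
class ReorderBuffer:
    def __init__(self) -> None:
        self._pending: dict = {}
        self._next = 0  # next frame index owed to the output stream

    def add(self, idx: int, frame):
        """Accept a completed frame; return the frames now releasable in order."""
        self._pending[idx] = frame
        out = []
        while self._next in self._pending:
            out.append(self._pending.pop(self._next))
            self._next += 1
        return out
```

For example, if frame 1 finishes before frame 0, `add(1, ...)` returns nothing; the later `add(0, ...)` releases both frames at once.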

Async Job System (jobs/)

  • jobs/models.py – JobInfo dataclass, JobStatus enum (PROCESSING/COMPLETED/FAILED/CANCELLED)
  • jobs/storage.py – Thread-safe in-memory storage at /tmp/detection_jobs/{job_id}/. Auto-cleanup every 10 minutes.
  • jobs/background.py – process_video_async() dispatches to the correct inference function and updates job status.
  • jobs/streaming.py – Event-driven MJPEG frame publishing. Non-blocking (drops frames if the consumer is slow). Frames are pre-resized to 640px width.
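
The "drops if the consumer is slow" policy can be sketched with a bounded queue. Whether jobs/streaming.py drops the old frame or the new one is an implementation detail; keep-latest (drop the stale frame) is assumed here, and the class name is invented.

```python
import queue

# Keep-latest publisher: a size-1 queue holds only the newest frame, and
# publish() never blocks the inference thread. The real code is event-driven;
# this single-producer/single-consumer sketch only shows the drop policy.
class FramePublisher:
    def __init__(self) -> None:
        self._q: "queue.Queue[bytes]" = queue.Queue(maxsize=1)

    def publish(self, jpeg_bytes: bytes) -> bool:
        """Offer a frame without blocking; returns False if a frame was dropped."""
        try:
            self._q.put_nowait(jpeg_bytes)
            return True
        except queue.Full:
            try:
                self._q.get_nowait()  # discard the stale frame
            except queue.Empty:
                pass
            self._q.put_nowait(jpeg_bytes)
            return False
```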

Concurrency Model

  • Per-model RLock for GPU serialization (inference.py:_get_model_lock)
  • Multi-GPU workers use separate model instances per device
  • AsyncVideoReader prefetches frames in a background thread to prevent GPU starvation
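
The per-model lock can be illustrated as follows; the real helper is inference.py:_get_model_lock, and the names below are stand-ins. One RLock per model name lets concurrent requests for different models proceed while serializing GPU access to any single model.

```python
import threading
from collections import defaultdict

# One RLock per model name; created lazily on first request.
_model_locks = defaultdict(threading.RLock)
_locks_guard = threading.Lock()

def get_model_lock(name: str) -> threading.RLock:
    """Return the (shared) lock for a model, creating it on first use."""
    with _locks_guard:  # guard the dict itself, not the model
        return _model_locks[name]
```

An RLock (rather than a plain Lock) allows the same thread to re-enter, e.g. when one inference path calls into another that locks the same model.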

Frontend (index.html)

Single HTML page with vanilla JS. Upload video, pick mode/model, view first frame, live MJPEG stream, download processed video, inspect detection JSON.

Adding a New Detector

  1. Create class in models/detectors/ implementing ObjectDetector from base.py
  2. Register in models/model_loader.py _REGISTRY
  3. Add option to detector dropdown in index.html

Dual Remotes

  • hf → Hugging Face Space (deployment)
  • github → GitHub (version control)