	---
	title: Gridlock Traffic Violation API
	emoji: 🚦
	colorFrom: red
	colorTo: blue
	sdk: gradio
	python_version: "3.10"
	app_file: app.py
	pinned: false
	---
	# AID 728 — Traffic Rule Violation Detection

	IIIT Bangalore

	Detects traffic rule violations involving two-wheelers from single RGB street-camera images. Identifies helmet violations, over-riding (>2 riders on one bike), and extracts the license plate text of every violating vehicle.

	---

	## Submission Files

	```
	final_submission/
	├── solution.py              # Core detection pipeline (TrafficViolationDetector class)
	├── requirements.txt         # All Python dependencies
	├── README.md                # This file
	└── models/                  # All model weights (bundled, fully offline)
	    ├── yolov8s.pt           # COCO primary detector (21.54 MB)
	    ├── stage1_best.pt       # Custom two-wheeler detector (21.49 MB)
	    ├── helmet_v11.pt        # Helmet classifier (5.22 MB)
	    ├── license.pt           # License plate localiser (42.77 MB)
	    ├── FSRCNN_x3.pb         # Super-resolution for plates (0.04 MB)
	    ├── depth_anything_v2/   # Depth-Anything V2 Small (HF) (47.31 MB fp16)
	    └── paddleocr/           # Bundled PaddleOCR models
	        └── official_models/
	            └── ...
	```

	The pipeline also uses the `inference_sdk` to query the Roboflow API for:
	- Wrong-way driving detection (`wrong-way-driving-detection-gqdmg/1`)
	- Seatbelt classification (`seat-belt-detection-udcfg/5`)

	Total model size: 194.59 MB (limit: 250 MB)

	---

	## Quick Start

	### Install dependencies
	```bash
	pip install -r requirements.txt
	```

	### Run inference
	```python
	from solution import TrafficViolationDetector

	detector = TrafficViolationDetector(model_dir="./models")
	result = detector.predict("path/to/image.jpg")
	print(result)
	```

	### Output format
	```json
	{
	  "violations": [
	    {
	      "vehicle_type": "two_wheeler",
	      "num_riders": 2,
	      "helmet_violations": 1,
	      "wrong_way": false,
	      "license_plate": "DL 7S AF 8144"
	    },
	    {
	      "vehicle_type": "four_wheeler",
	      "seatbelt_violations": 1,
	      "wrong_way": true,
	      "license_plate": "MH 12 AB 1234"
	    }
	  ]
	}
	```

	- One entry per violating vehicle (two-wheeler or four-wheeler, as in the example above); non-violating vehicles are omitted
	- `violations` is an empty list `[]` if no violations are found
	- `license_plate` is `"UNKNOWN"` when the plate cannot be read
	- `num_riders` counts riders per bike; `helmet_violations` counts those without a helmet
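
As an illustration of consuming this output, the sketch below turns a result dictionary into one readable line per violation. It only relies on the keys shown in the example JSON; the function name `summarise` and the wording of the reason strings are assumptions made for this example, and the 3-rider over-riding threshold is taken from the Violation Logic section.

```python
from typing import Any


def summarise(result: dict[str, Any]) -> list[str]:
    """Return one human-readable line per entry in result["violations"]."""
    lines = []
    for v in result.get("violations", []):
        reasons = []
        # Keys that are absent for a vehicle type default to "no violation".
        if v.get("num_riders", 0) >= 3:
            reasons.append("over-riding")
        if v.get("helmet_violations", 0) > 0:
            reasons.append(f"{v['helmet_violations']} without helmet")
        if v.get("seatbelt_violations", 0) > 0:
            reasons.append(f"{v['seatbelt_violations']} without seatbelt")
        if v.get("wrong_way", False):
            reasons.append("wrong way")
        plate = v.get("license_plate", "UNKNOWN")
        vehicle = v.get("vehicle_type", "unknown")
        lines.append(f"{vehicle} [{plate}]: {', '.join(reasons)}")
    return lines
```

Calling `summarise(detector.predict("path/to/image.jpg"))` on an image with no violations returns an empty list, mirroring the empty `violations` list.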

	---

	## Pipeline Architecture

	The pipeline runs in 9 sequential stages per image:

	```
	                    Input Image
	                         │
	                         ▼
	┌─────────────────────────────────────────────────────────────────┐
	│  Stage 1 — Primary Detection (yolov8s.pt, COCO)                 │
	│  Detects: persons (cls 0), motorcycles (cls 3)                  │
	└────────────────────────┬────────────────────────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 2 — Supplemental Bike Detection  │
	    │  (stage1_best.pt — custom trained)      │
	    │  Merged with Stage 1 bikes via NMS      │
	    └────────────────────┬────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 3 — Monocular Depth Estimation   │
	    │  (Depth-Anything V2 Small, fp16 stored) │
	    │  Produces normalised depth map [0,1]    │
	    └────────────────────┬────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 4 — Person → Bike Association    │
	    │  Criteria: IoU overlap + column align   │
	    │            + depth proximity check      │
	    └────────────────────┬────────────────────┘
	                         │
	              ┌──────────▼──────────┐
	              │    Per-bike loop    │
	              └──────────┬──────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 5 — Helmet Classification        │
	    │  (helmet_v11.pt — YOLOv11 custom)       │
	    │  Crops top 45% of each rider bbox       │
	    │  (head region), runs cls 0=helmet       │
	    └────────────────────┬────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 6 — Wrong Way Detection (API)    │
	    │  (wrong-way-driving-detection-gqdmg/1)  │
	    │  Flags vehicle bounding boxes that      │
	    │  overlap with 'wrong-side' detections   │
	    └────────────────────┬────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 7 — Seatbelt Detection (API)     │
	    │  (seat-belt-detection-udcfg/5)          │
	    │  Runs only on four-wheeler crops        │
	    └────────────────────┬────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 8 — License Plate Localisation   │
	    │  (license.pt — YOLO custom)             │
	    │  Runs on violating vehicles             │
	    └────────────────────┬────────────────────┘
	                         │
	    ┌────────────────────▼────────────────────┐
	    │  Stage 9 — OCR (PaddleOCR 3.5.0)        │
	    │  FSRCNN x3 super-resolution → CLAHE     │
	    │  sharpening → PP-OCRv5 mobile det+rec   │
	    │  Text cleaned: uppercase alphanumeric   │
	    └────────────────────┬────────────────────┘
	                         │
	                         ▼
	              Output: violations list
	```

	### Violation Logic
	- A bike is flagged as a violation if:
	  - `num_riders >= 3` (over-riding), OR
	  - `helmet_violations > 0` (at least one rider without a helmet)
	- Only violating bikes appear in the output list
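
The rule above can be written as a small predicate plus a filter. This is a sketch rather than the code in `solution.py`; the names `is_bike_violation`, `filter_violating` and `OVER_RIDING_MIN_RIDERS` are hypothetical, while the two conditions are the ones listed in this section.

```python
OVER_RIDING_MIN_RIDERS = 3  # "num_riders >= 3" from the rule above


def is_bike_violation(num_riders: int, helmet_violations: int) -> bool:
    """True if the bike is over-ridden or has at least one helmetless rider."""
    return num_riders >= OVER_RIDING_MIN_RIDERS or helmet_violations > 0


def filter_violating(bikes: list[dict]) -> list[dict]:
    """Keep only the bikes that should appear in the output list."""
    return [
        b for b in bikes
        if is_bike_violation(b["num_riders"], b["helmet_violations"])
    ]
```

For example, a bike with two helmeted riders is dropped, while a bike with three helmeted riders is kept because of over-riding.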

	---

	## Model Details

	### `yolov8s.pt` — COCO Primary Detector
	- Type: YOLOv8 Small, pretrained on COCO
	- Used for: Detecting persons (class 0) and motorcycles (class 3)
	- Confidence: 0.30, IoU: 0.45

	### `stage1_best.pt` — Custom Two-Wheeler Detector
	- Type: YOLOv8-based, custom trained
	- Used for: Supplementing COCO detections with domain-specific vehicle types that COCO misses (scooters, three-wheelers, etc.)
	- Merge: Combined with COCO bike boxes via IoU-based NMS (threshold 0.45)
	- Augmented inference (`augment=True`) for improved recall
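
The merge step can be sketched as greedy IoU-based NMS over the pooled boxes from both detectors. This is a minimal pure-Python illustration, not the code in `solution.py`: the `(x1, y1, x2, y2, score)` tuple layout and the names `iou` and `merge_bike_boxes` are assumptions, and only the 0.45 threshold comes from this README.

```python
Box = tuple[float, float, float, float, float]  # x1, y1, x2, y2, score (assumed layout)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_bike_boxes(coco: list[Box], custom: list[Box], iou_thresh: float = 0.45) -> list[Box]:
    """Pool both detectors' boxes and suppress lower-scoring duplicates."""
    candidates = sorted(list(coco) + list(custom), key=lambda b: b[4], reverse=True)
    kept: list[Box] = []
    for box in candidates:
        # Keep a box only if it does not overlap an already-kept box too much.
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```

With this scheme a scooter found only by the custom detector survives, while a motorcycle found by both detectors is reported once, using the higher-scoring box.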

	### `depth_anything_v2/` — Monocular Depth Estimation
	- Type: Depth-Anything V2 Small (Hugging Face Transformers)
	- Used for: Filtering out background pedestrians that share column overlap with a detected bike but are at a different depth plane
	- Storage: fp16 safetensors on disk (47.3 MB vs 94.6 MB fp32) — loaded as fp32 at runtime for CPU inference speed
	- Output: Normalised depth map [0, 1] resized to match the input image

	### `helmet_v11.pt` — Helmet Classifier
	- Type: YOLOv11-based, custom trained on merged dataset
	- Training data: 4 merged Kaggle datasets (andrewmvd, aneesarom, roboflow ×2) — all remapped to 2 classes: `with_helmet (0)`, `without_helmet (1)`
	- Input: Top 45% of each rider bounding box (head crop) with 5% lateral padding
	- Confidence: 0.25
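
The head crop can be computed from a rider box as follows. This is a sketch under stated assumptions: the README does not say what the 5% padding is relative to, so it is taken here as 5% of the box width on each side, and the function name `head_crop` and the pixel rounding are hypothetical.

```python
def head_crop(
    box: tuple[float, float, float, float],
    image_w: int,
    image_h: int,
    top_frac: float = 0.45,
    pad_frac: float = 0.05,
) -> tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) of the head region, clipped to the image."""
    x1, y1, x2, y2 = box
    pad = (x2 - x1) * pad_frac  # assumption: padding is a fraction of box width
    cx1 = max(0, int(round(x1 - pad)))
    cx2 = min(image_w, int(round(x2 + pad)))
    cy1 = max(0, int(round(y1)))
    # Only the top part of the rider box is kept (the head region).
    cy2 = min(image_h, int(round(y1 + (y2 - y1) * top_frac)))
    return cx1, cy1, cx2, cy2
```

The resulting coordinates can be used to slice an image array as `image[cy1:cy2, cx1:cx2]` before passing the crop to the helmet model.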

	### `license.pt` — License Plate Localiser
	- Type: YOLO custom, trained on Indian license plates
	- Used for: Detecting the tight bounding box of the license plate within a bike crop
	- Confidence: 0.20 (low threshold to catch partially visible plates)

	### `FSRCNN_x3.pb` — Super-Resolution
	- Type: FSRCNN (Fast Super-Resolution CNN), ×3 scale, TensorFlow/OpenCV DNN
	- Used for: Upscaling small plate crops (often <100px tall) 3× before OCR to improve recognition accuracy

	### `paddleocr/` — OCR Engine (PaddleOCR 3.5.0)
	- Detection: `PP-OCRv5_mobile_det` (4.7 MB) — finds text line bounding boxes within the plate crop
	- Recognition: `en_PP-OCRv5_mobile_rec` (7.6 MB) — reads each text line
	- Orientation models: `PP-LCNet_x1_0_doc_ori`, `PP-LCNet_x1_0_textline_ori` — handle rotated plates
	- Unwarping: `UVDoc` — corrects perspective distortion
	- API: Uses the legacy `.ocr()` method rather than `.predict()`. Both run the same underlying pipeline, but on Windows/Linux CPU the `.ocr()` path avoids the OneDNN `fused_conv2d` operator crash that the newer `.predict()` path triggers
	- Post-processing: Text is uppercased, non-alphanumeric characters stripped, tokens shorter than 2 characters discarded
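
The post-processing rule can be sketched with the standard library alone. Several details are assumptions for illustration: tokens are taken to be whitespace-separated pieces of each recognised line, surviving tokens are joined with single spaces, and an empty result falls back to `"UNKNOWN"` as described in the output format. The name `clean_plate_text` is hypothetical.

```python
import re


def clean_plate_text(lines: list[str], min_token_len: int = 2) -> str:
    """Uppercase, strip non-alphanumerics, and drop tokens shorter than min_token_len."""
    tokens = []
    for line in lines:
        for raw in line.upper().split():
            token = re.sub(r"[^A-Z0-9]", "", raw)
            if len(token) >= min_token_len:
                tokens.append(token)
    # Assumed fallback, matching the "UNKNOWN" convention of the output format.
    return " ".join(tokens) if tokens else "UNKNOWN"
```

Dropping one-character tokens removes stray marks such as a lone `I` or `1` that OCR often produces from bolts and plate borders, at the cost of losing genuine single-character groups.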

	---

	## Offline Operation

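	All model weights are bundled in `./models/`, so the local models need no internet connection at runtime. The exceptions are the two Roboflow-hosted stages (wrong-way and seatbelt detection), which call a remote API and therefore need network access.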

	PaddleOCR 3.5.0 uses [paddlex](https://github.com/PaddlePaddle/PaddleX) internally and looks for models via the `PADDLE_PDX_CACHE_HOME` environment variable. `solution.py` sets this variable to `./models/paddleocr/` before any paddle import, so paddlex resolves all models from the bundled path:

	```python
	os.environ["PADDLE_PDX_CACHE_HOME"] = str(Path(__file__).parent / "models" / "paddleocr")
	```

	---

	## Design Decisions

	### Why two bike detectors?
	COCO's `motorcycle` class (cls 3) misses many Indian two-wheeler types. The custom `stage1_best.pt` trained on traffic footage recovers these. Boxes from both are merged via NMS.

	### Why depth filtering?
	In busy street scenes, COCO frequently detects pedestrians on the footpath who share horizontal overlap with a detected bike. Depth-Anything V2 provides a proxy for Z-distance; persons whose median depth differs from the bike's median depth by more than 35% are excluded from association.
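
The depth check can be sketched as a comparison of median depths. The README does not state what the 35% is measured against, so this example assumes a relative difference with respect to the bike's median depth; the function name `same_depth_plane` and the flat lists of depth samples are likewise assumptions.

```python
from statistics import median


def same_depth_plane(
    person_depths: list[float],
    bike_depths: list[float],
    max_rel_diff: float = 0.35,
) -> bool:
    """True if the person's median depth is within max_rel_diff of the bike's."""
    p = median(person_depths)
    b = median(bike_depths)
    if b == 0:
        # Avoid division by zero; only an equally zero depth counts as a match.
        return p == 0
    # Assumption: the 35% threshold is relative to the bike's median depth.
    return abs(p - b) / b <= max_rel_diff
```

In a full pipeline the two lists would hold the normalised depth values inside the person and bike boxes, and only persons passing this check would be counted as riders.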

	### Why not use PaddleOCR's server detection model?
	`PP-OCRv5_server_det` is 84.3 MB; even swapping it in for the 4.7 MB mobile detector would bring the total to about 274 MB (194.59 - 4.7 + 84.3), over the 250 MB limit. Instead, `license.pt` performs the coarse plate localisation (narrowing the search area to ~125×90 px), then `PP-OCRv5_mobile_det` (4.7 MB) finds individual text lines within that small crop, and `en_PP-OCRv5_mobile_rec` reads them. This two-stage localisation gives equivalent quality at a fraction of the size.

	### Why store depth model as fp16?
	`model.safetensors` was converted from fp32 (94.6 MB) to fp16 (47.3 MB) at submission time using `safetensors.torch`. At runtime the model is loaded as fp32 (`dtype=torch.float32`) because most x86 CPUs lack native fp16 arithmetic, and running fp16 tensors on CPU causes a roughly 10× slowdown. The disk saving therefore costs only a one-time upcast at load; inference speed is unaffected.

	### Fallback for missing riders
	If no COCO person is associated with a detected bike (e.g., very small image, occluded rider), one rider with no helmet is assumed. This is a conservative choice: it risks a false positive but avoids missing a genuine violation on that bike.
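
A minimal sketch of this fallback is shown below. The representation of riders as a list of booleans (one per associated person, `True` when a helmet was detected) and the name `rider_counts` are assumptions for illustration.

```python
def rider_counts(helmet_flags: list[bool]) -> tuple[int, int]:
    """Return (num_riders, helmet_violations) for one bike.

    helmet_flags holds one entry per associated rider: True if a helmet
    was detected, False otherwise.
    """
    if not helmet_flags:
        # Fallback: no person associated, so assume one rider without a helmet.
        return 1, 1
    return len(helmet_flags), sum(1 for has_helmet in helmet_flags if not has_helmet)
```

Because the fallback yields `helmet_violations == 1`, a bike with no associated person is always reported as a violation.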

	---

	## Constraints Compliance

	| Constraint | Status |
	|---|---|
	| Model size ≤ 250 MB | ✅ 194.6 MB |
	| No VLMs > 1B parameters | ✅ Largest model is Depth-Anything V2 Small (~24M params) |
	| Fully offline execution | ✅ All weights in `./models/`, `PADDLE_PDX_CACHE_HOME` redirected |
	| `TrafficViolationDetector` interface | ✅ `__init__(model_dir)` + `predict(image_path) → dict` |
	| Stateless `predict()` | ✅ No mutable shared state between calls |
	| Error handling | ✅ All exceptions caught; returns `{"violations": []}` on failure |
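
The error-handling contract in the last row can be illustrated with a thin wrapper. This is a sketch of the pattern, not the code in `solution.py`; the name `safe_predict`, the logging call, and the extra shape check on the result are assumptions.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_predict(
    predict_fn: Callable[[str], dict[str, Any]],
    image_path: str,
) -> dict[str, Any]:
    """Call predict_fn and fall back to an empty result on any failure."""
    try:
        result = predict_fn(image_path)
    except Exception:
        # Log the traceback for debugging, but never propagate to the caller.
        logger.exception("predict failed for %s", image_path)
        return {"violations": []}
    # Assumed extra guard: reject results that do not match the output format.
    if not isinstance(result, dict) or not isinstance(result.get("violations"), list):
        return {"violations": []}
    return result
```

A caller could wrap the detector as `safe_predict(detector.predict, "path/to/image.jpg")`; an unreadable file then yields the same empty result as an image with no violations.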

	---

	## Performance (Local Windows CPU)

	| Metric | Value |
	|---|---|
	| Init time (cold start) | ~3–4 s |
	| Inference, simple scene (1–2 bikes) | ~4–5 s |
	| Inference, dense scene (8+ bikes) | ~10–12 s |

	> Note: The evaluation server runs Linux with a faster CPU; inference times are expected to be lower. Depth estimation (Depth-Anything V2) is the primary bottleneck on CPU.