---
title: Gridlock Traffic Violation API
emoji: 🚦
colorFrom: red
colorTo: blue
sdk: gradio
python_version: "3.10"
app_file: app.py
pinned: false
---

# AID 728: Traffic Rule Violation Detection

**IIIT Bangalore**

Detects traffic rule violations involving two-wheelers from single RGB street-camera images. Identifies **helmet violations**, **over-riding (>2 riders on one bike)**, and extracts the **license plate text** of every violating vehicle.

---

## Submission Files

```
final_submission/
├── solution.py          # Core detection pipeline (TrafficViolationDetector class)
├── requirements.txt     # All Python dependencies
├── README.md            # This file
└── models/              # All model weights (bundled, fully offline)
    ├── yolov8s.pt           # COCO primary detector          (21.54 MB)
    ├── stage1_best.pt       # Custom two-wheeler detector    (21.49 MB)
    ├── helmet_v11.pt        # Helmet classifier              (5.22 MB)
    ├── license.pt           # License plate localiser        (42.77 MB)
    ├── FSRCNN_x3.pb         # Super-resolution for plates    (0.04 MB)
    ├── depth_anything_v2/   # Depth-Anything V2 Small (HF)   (47.31 MB fp16)
    └── paddleocr/           # Bundled PaddleOCR models
        └── official_models/
            └── ...
```

The pipeline also uses the `inference_sdk` to query the Roboflow API for:

- **Wrong-way driving detection** (`wrong-way-driving-detection-gqdmg/1`)
- **Seatbelt classification** (`seat-belt-detection-udcfg/5`)

Total model size: 194.59 MB (limit: 250 MB)

---

## Quick Start

### Install dependencies

```bash
pip install -r requirements.txt
```

### Run inference

```python
from solution import TrafficViolationDetector

detector = TrafficViolationDetector(model_dir="./models")
result = detector.predict("path/to/image.jpg")
print(result)
```

### Output format

```json
{
  "violations": [
    {
      "vehicle_type": "two_wheeler",
      "num_riders": 2,
      "helmet_violations": 1,
      "wrong_way": false,
      "license_plate": "DL 7S AF 8144"
    },
    {
      "vehicle_type": "four_wheeler",
      "seatbelt_violations": 1,
      "wrong_way": true,
      "license_plate": "MH 12 AB 1234"
    }
  ]
}
```

- One entry per **violating** vehicle only
- `violations` is an empty list `[]` if no violations are found
- `license_plate` is `"UNKNOWN"` when the plate cannot be read
- `num_riders` counts riders per bike; `helmet_violations` counts those without a helmet

---

## Pipeline Architecture

The pipeline runs in 9 sequential stages per image:

```
Input Image
     │
     ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 1 - Primary Detection (yolov8s.pt, COCO)                 │
│  Detects: persons (cls 0), motorcycles (cls 3)                  │
└────────────────────────┬────────────────────────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 2 - Supplemental Bike Detection  │
    │  (stage1_best.pt - custom trained)      │
    │  Merged with Stage 1 bikes via NMS      │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 3 - Monocular Depth Estimation   │
    │  (Depth-Anything V2 Small, fp16 stored) │
    │  Produces normalised depth map [0,1]    │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 4 - Person → Bike Association    │
    │  Criteria: IoU overlap + column align   │
    │            + depth proximity check      │
    └────────────────────┬────────────────────┘
                         │
              ┌──────────▼──────────┐
              │    Per-bike loop    │
              └──────────┬──────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 5 - Helmet Classification        │
    │  (helmet_v11.pt - YOLOv11 custom)       │
    │  Crops top 45% of each rider bbox       │
    │  (head region), runs cls 0=helmet       │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 6 - Wrong Way Detection (API)    │
    │  (wrong-way-driving-detection-gqdmg/1)  │
    │  Flags vehicle bounding boxes that      │
    │  overlap with 'wrong-side' detections   │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 7 - Seatbelt Detection (API)     │
    │  (seat-belt-detection-udcfg/5)          │
    │  Runs only on four-wheeler crops        │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 8 - License Plate Localisation   │
    │  (license.pt - YOLO custom)             │
    │  Runs on violating vehicles             │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 9 - OCR (PaddleOCR 3.5.0)        │
    │  FSRCNN x3 super-resolution → CLAHE     │
    │  sharpening → PP-OCRv5 mobile det+rec   │
    │  Text cleaned: uppercase alphanumeric   │
    └────────────────────┬────────────────────┘
                         │
                         ▼
              Output: violations list
```

### Violation Logic

- A bike is flagged as a **violation** if:
  - `num_riders >= 3` (over-riding), **OR**
  - `helmet_violations > 0` (at least one rider without a helmet)
- Only violating bikes appear in the output list

---

## Model Details

### `yolov8s.pt`: COCO Primary Detector

- **Type**: YOLOv8 Small, pretrained on COCO
- **Used for**: Detecting persons (class 0) and motorcycles (class 3)
- **Confidence**: 0.30, IoU: 0.45

### `stage1_best.pt`: Custom Two-Wheeler Detector

- **Type**: YOLOv8-based, custom trained
- **Used for**: Supplementing COCO detections with domain-specific two-wheeler types (scooters, three-wheelers, etc.
that COCO misses)
- **Merge**: Combined with COCO bike boxes via IoU-based NMS (threshold 0.45)
- **Augmented inference** (`augment=True`) for improved recall

### `depth_anything_v2/`: Monocular Depth Estimation

- **Type**: Depth-Anything V2 Small (Hugging Face Transformers)
- **Used for**: Filtering out background pedestrians that share column overlap with a detected bike but are at a different depth plane
- **Storage**: fp16 safetensors on disk (47.3 MB vs 94.6 MB fp32), loaded as fp32 at runtime for CPU inference speed
- **Output**: Normalised depth map [0, 1] resized to match the input image

### `helmet_v11.pt`: Helmet Classifier

- **Type**: YOLOv11-based, custom trained on merged dataset
- **Training data**: 4 merged Kaggle datasets (andrewmvd, aneesarom, roboflow ×2), all remapped to 2 classes: `with_helmet (0)`, `without_helmet (1)`
- **Input**: Top 45% of each rider bounding box (head crop) with 5% lateral padding
- **Confidence**: 0.25

### `license.pt`: License Plate Localiser

- **Type**: YOLO custom, trained on Indian license plates
- **Used for**: Detecting the tight bounding box of the license plate within a bike crop
- **Confidence**: 0.20 (low threshold to catch partially visible plates)

### `FSRCNN_x3.pb`: Super-Resolution

- **Type**: FSRCNN (Fast Super-Resolution CNN), ×3 scale, TensorFlow/OpenCV DNN
- **Used for**: Upscaling small plate crops (often <100 px tall) 3× before OCR to improve recognition accuracy

### `paddleocr/`: OCR Engine (PaddleOCR 3.5.0)

- **Detection**: `PP-OCRv5_mobile_det` (4.7 MB), which finds text line bounding boxes within the plate crop
- **Recognition**: `en_PP-OCRv5_mobile_rec` (7.6 MB), which reads each text line
- **Orientation models**: `PP-LCNet_x1_0_doc_ori`, `PP-LCNet_x1_0_textline_ori`, which handle rotated plates
- **Unwarping**: `UVDoc`, which corrects perspective distortion
- **API**: Uses the legacy `.ocr()` method (not `.predict()`).
  Both call the same underlying pipeline, but `.ocr()` uses a compatible inference backend on Windows/Linux CPU without triggering the OneDNN `fused_conv2d` operator crash present in the newer `.predict()` path
- **Post-processing**: Text is uppercased, non-alphanumeric characters are stripped, and tokens shorter than 2 characters are discarded

---

## Offline Operation

All model weights are bundled in `./models/`, so the local models need no internet connection at runtime. The exception is the pair of Roboflow-hosted checks (wrong-way driving and seatbelt), which are queried through `inference_sdk` and depend on that API being reachable.

PaddleOCR 3.5.0 uses [paddlex](https://github.com/PaddlePaddle/PaddleX) internally and looks for models via the `PADDLE_PDX_CACHE_HOME` environment variable. `solution.py` sets this variable to `./models/paddleocr/` **before** any paddle import, so paddlex resolves all models from the bundled path:

```python
os.environ["PADDLE_PDX_CACHE_HOME"] = str(Path(__file__).parent / "models" / "paddleocr")
```

---

## Design Decisions

### Why two bike detectors?

COCO's `motorcycle` class (cls 3) misses many Indian two-wheeler types. The custom `stage1_best.pt`, trained on traffic footage, recovers these. Boxes from both detectors are merged via NMS.

### Why depth filtering?

In busy street scenes, COCO frequently detects pedestrians on the footpath who share horizontal overlap with a detected bike. Depth-Anything V2 provides a proxy for Z-distance; persons whose median depth differs from the bike's median depth by more than 35% are excluded from association.

### Why not use PaddleOCR's server detection model?

`PP-OCRv5_server_det` is 84.3 MB, so bundling it would push the total over 250 MB. Instead, `license.pt` performs the coarse plate localisation (narrowing the search area to ~125×90 px), then `PP-OCRv5_mobile_det` (4.7 MB) finds individual text lines within that small crop, and `en_PP-OCRv5_mobile_rec` reads them. This two-stage localisation gives equivalent quality at a fraction of the size.

### Why store depth model as fp16?

`model.safetensors` was converted from fp32 (94.6 MB) to fp16 (47.3 MB) at submission time using `safetensors.torch`.
At runtime the model is loaded as fp32 (`dtype=torch.float32`) because most x86 CPUs lack native fp16 arithmetic, and running fp16 tensors on CPU causes a 10× slowdown. The disk saving therefore comes at no inference-time compute cost.

### Fallback for missing riders

If no COCO person is associated with a detected bike (e.g., very small image, occluded rider), one rider with no helmet is assumed. This is a conservative choice: it risks a false positive rather than missing a genuine violation.

---

## Constraints Compliance

| Constraint | Status |
|---|---|
| Model size ≤ 250 MB | ✅ 194.6 MB |
| No VLMs > 1B parameters | ✅ Largest model is Depth-Anything V2 Small (~24M params) |
| Fully offline execution | ✅ All weights in `./models/`, `PADDLE_PDX_CACHE_HOME` redirected |
| `TrafficViolationDetector` interface | ✅ `__init__(model_dir)` + `predict(image_path) → dict` |
| Stateless `predict()` | ✅ No mutable shared state between calls |
| Error handling | ✅ All exceptions caught; returns `{"violations": []}` on failure |

---

## Performance (Local Windows CPU)

| Metric | Value |
|---|---|
| Init time (cold start) | ~3–4 s |
| Inference, simple scene (1–2 bikes) | ~4–5 s |
| Inference, dense scene (8+ bikes) | ~10–12 s |

> **Note**: The evaluation server runs Linux with a faster CPU; inference times are expected to be lower. Depth estimation (Depth-Anything V2) is the primary bottleneck on CPU.
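
Timings like those in the table above can be collected with a small harness. The following is a minimal sketch, not part of `solution.py`: `time_call` and `benchmark` are hypothetical helper names, and the only thing assumed about the detector is the documented interface (`__init__(model_dir)` and `predict(image_path)`).

```python
import time
from statistics import median


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times.

    Returns (result of the last call, median wall-clock seconds).
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, median(durations)


def benchmark(detector_cls, image_path, model_dir="./models", repeats=3):
    """Time one construction and `repeats` predictions (hypothetical helper).

    detector_cls is any class exposing __init__(model_dir) and predict(image_path).
    """
    # Construction is timed once, since only the first load is a cold start.
    detector, init_s = time_call(detector_cls, model_dir=model_dir, repeats=1)
    _, infer_s = time_call(detector.predict, image_path, repeats=repeats)
    return {"init_s": init_s, "inference_s": infer_s}
```

With the submission layout in place, `benchmark(TrafficViolationDetector, "path/to/image.jpg")` (after `from solution import TrafficViolationDetector`) would return the init time and the median inference time in seconds. The median is used so that a single slow first call has less influence on the reported figure.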