---
title: Gridlock Traffic Violation API
emoji: 🚦
colorFrom: red
colorTo: blue
sdk: gradio
python_version: "3.10"
app_file: app.py
pinned: false
---
# AID 728: Traffic Rule Violation Detection

**IIIT Bangalore**

Detects traffic rule violations from single RGB street-camera images. For two-wheelers it identifies **helmet violations** and **over-riding (>2 riders on one bike)**; it also flags **wrong-way driving** and, for four-wheelers, **seatbelt violations**. The **license plate text** of every violating vehicle is extracted.

---

## Submission Files

```
final_submission/
├── solution.py              # Core detection pipeline (TrafficViolationDetector class)
├── requirements.txt         # All Python dependencies
├── README.md                # This file
└── models/                  # All model weights (bundled, fully offline)
    ├── yolov8s.pt           # COCO primary detector (21.54 MB)
    ├── stage1_best.pt       # Custom two-wheeler detector (21.49 MB)
    ├── helmet_v11.pt        # Helmet classifier (5.22 MB)
    ├── license.pt           # License plate localiser (42.77 MB)
    ├── FSRCNN_x3.pb         # Super-resolution for plates (0.04 MB)
    ├── depth_anything_v2/   # Depth-Anything V2 Small (HF) (47.31 MB fp16)
    └── paddleocr/           # Bundled PaddleOCR models
        └── official_models/
            └── ...
```

Total model size: 194.59 MB (limit: 250 MB)

In addition to the bundled weights, the pipeline uses `inference_sdk` to query the Roboflow API for:
- **Wrong-way driving detection** (`wrong-way-driving-detection-gqdmg/1`)
- **Seatbelt classification** (`seat-belt-detection-udcfg/5`)

These two stages therefore need network access at runtime.

---

## Quick Start

### Install dependencies
```bash
pip install -r requirements.txt
```

### Run inference
```python
from solution import TrafficViolationDetector

detector = TrafficViolationDetector(model_dir="./models")
result = detector.predict("path/to/image.jpg")
print(result)
```

### Output format
```json
{
  "violations": [
    {
      "vehicle_type": "two_wheeler",
      "num_riders": 2,
      "helmet_violations": 1,
      "wrong_way": false,
      "license_plate": "DL 7S AF 8144"
    },
    {
      "vehicle_type": "four_wheeler",
      "seatbelt_violations": 1,
      "wrong_way": true,
      "license_plate": "MH 12 AB 1234"
    }
  ]
}
```
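
- One entry per **violating** vehicle; vehicles with no violation are omitted
- `violations` is an empty list `[]` if no violations are found
- `license_plate` is `"UNKNOWN"` when the plate cannot be read
- `num_riders` and `helmet_violations` apply to two-wheelers: `num_riders` counts riders per bike, and `helmet_violations` counts those without a helmet
- `seatbelt_violations` applies to four-wheelers only; `wrong_way` is reported for both vehicle types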
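
A minimal sketch of consuming this dictionary follows. It relies only on the keys shown in the example above. `summarise_violations` is a hypothetical helper, not part of `solution.py`. Type-specific fields are read with `.get()` so that two-wheeler and four-wheeler entries can share one loop.

```python
from typing import Any


def summarise_violations(result: dict[str, Any]) -> list[str]:
    """Build one readable line per violating vehicle in a predict() result.

    Hypothetical helper; assumes only the keys shown in the README's output example.
    """
    lines = []
    for v in result.get("violations", []):
        reasons = []
        # Two-wheeler fields
        if v.get("num_riders", 0) >= 3:
            reasons.append(f"over-riding ({v['num_riders']} riders)")
        if v.get("helmet_violations", 0) > 0:
            reasons.append(f"{v['helmet_violations']} without helmet")
        # Four-wheeler field
        if v.get("seatbelt_violations", 0) > 0:
            reasons.append(f"{v['seatbelt_violations']} without seatbelt")
        # Reported for both vehicle types
        if v.get("wrong_way", False):
            reasons.append("wrong way")
        plate = v.get("license_plate", "UNKNOWN")
        summary = ", ".join(reasons) or "no reason recorded"
        lines.append(f"{v.get('vehicle_type', '?')} [{plate}]: {summary}")
    return lines


# Usage: summarise_violations(detector.predict("path/to/image.jpg"))
```

For the two-entry example above, this yields `two_wheeler [DL 7S AF 8144]: 1 without helmet` and `four_wheeler [MH 12 AB 1234]: 1 without seatbelt, wrong way`.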

---

## Pipeline Architecture

The pipeline runs in 9 sequential stages per image:

```
                   Input Image
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 1: Primary Detection (yolov8s.pt, COCO) │
│ Detects: persons (cls 0), motorcycles (cls 3) │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 2: Supplemental Bike Detection          │
│ (stage1_best.pt, custom trained)              │
│ Merged with Stage 1 bikes via NMS             │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 3: Monocular Depth Estimation           │
│ (Depth-Anything V2 Small, fp16 stored)        │
│ Produces normalised depth map [0,1]           │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 4: Person-to-Bike Association           │
│ Criteria: IoU overlap + column align          │
│ + depth proximity check                       │
└───────────────────────┬───────────────────────┘
                        │
              ┌─────────▼─────────┐
              │   Per-bike loop   │
              └─────────┬─────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 5: Helmet Classification                │
│ (helmet_v11.pt, YOLOv11 custom)               │
│ Crops top 45% of each rider bbox              │
│ (head region), runs cls 0=helmet              │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 6: Wrong-Way Detection (API)            │
│ (wrong-way-driving-detection-gqdmg/1)         │
│ Flags vehicle bounding boxes that             │
│ overlap with 'wrong-side' detections          │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 7: Seatbelt Detection (API)             │
│ (seat-belt-detection-udcfg/5)                 │
│ Runs only on four-wheeler crops               │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 8: License Plate Localisation           │
│ (license.pt, YOLO custom)                     │
│ Runs on violating vehicles                    │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 9: OCR (PaddleOCR 3.5.0)                │
│ FSRCNN x3 super-resolution → CLAHE →          │
│ sharpening → PP-OCRv5 mobile det+rec          │
│ Text cleaned: uppercase alphanumeric          │
└───────────────────────┬───────────────────────┘
                        │
                        ▼
             Output: violations list
```

### Violation Logic
- A two-wheeler is flagged as a **violation** if:
  - `num_riders >= 3` (over-riding), **OR**
  - `helmet_violations > 0` (at least one rider without a helmet)
- A four-wheeler is flagged if `seatbelt_violations > 0`, and any vehicle with `wrong_way: true` is also flagged
- Only violating vehicles appear in the output list
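
The two-wheeler rule can be written as a small predicate. This is a sketch of the logic as stated above, not the code in `solution.py`; `is_bike_violation` is a hypothetical name.

```python
def is_bike_violation(num_riders: int, helmet_violations: int) -> bool:
    """True if a two-wheeler breaks the over-riding or the helmet rule."""
    over_riding = num_riders >= 3            # more than 2 riders on one bike
    missing_helmet = helmet_violations > 0   # at least one rider without a helmet
    return over_riding or missing_helmet


# Only bikes that satisfy the predicate are kept in the output list
bikes = [
    {"num_riders": 2, "helmet_violations": 0},  # compliant: dropped
    {"num_riders": 3, "helmet_violations": 0},  # over-riding
    {"num_riders": 1, "helmet_violations": 1},  # rider without a helmet
]
violating = [b for b in bikes if is_bike_violation(**b)]
```

The two rules are combined with a plain `or`, so a bike that is both over-ridden and has helmetless riders still produces a single entry.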

---

## Model Details

### `yolov8s.pt`: COCO Primary Detector
- **Type**: YOLOv8 Small, pretrained on COCO
- **Used for**: Detecting persons (class 0) and motorcycles (class 3)
- **Confidence**: 0.30, IoU: 0.45

### `stage1_best.pt`: Custom Two-Wheeler Detector
- **Type**: YOLOv8-based, custom trained
- **Used for**: Supplementing COCO detections with domain-specific two-wheeler types (scooters, three-wheelers, etc.) that COCO misses
- **Merge**: Combined with COCO bike boxes via IoU-based NMS (threshold 0.45)
- **Augmented inference** (`augment=True`) for improved recall
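
The merge step could look like the sketch below. It assumes boxes are `(x1, y1, x2, y2, score)` tuples in pixel coordinates and applies plain greedy NMS at the 0.45 IoU threshold given above. The actual merge in `solution.py` may differ (it could, for example, use `torchvision.ops.nms`). `merge_bike_boxes` is hypothetical.

```python
Box = tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_bike_boxes(coco: list[Box], custom: list[Box], iou_thr: float = 0.45) -> list[Box]:
    """Pool bike boxes from both detectors and drop duplicates with greedy NMS."""
    pool = sorted(coco + custom, key=lambda b: b[4], reverse=True)  # most confident first
    kept: list[Box] = []
    for box in pool:
        # Keep a box only if it does not overlap any already-kept box too much
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept
```

Pooling before NMS means that when both detectors fire on the same bike, the more confident box survives regardless of which model produced it.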

### `depth_anything_v2/`: Monocular Depth Estimation
- **Type**: Depth-Anything V2 Small (Hugging Face Transformers)
- **Used for**: Filtering out background pedestrians that share column overlap with a detected bike but are at a different depth plane
- **Storage**: fp16 safetensors on disk (47.3 MB vs 94.6 MB fp32); loaded as fp32 at runtime for CPU inference speed
- **Output**: Normalised depth map [0, 1] resized to match the input image
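
Min-max scaling is one plausible way to obtain the `[0, 1]` map described above. The README does not say which normalisation `solution.py` uses, so treat this NumPy sketch (and the name `normalise_depth`) as an assumption. Resizing to the input resolution is left out.

```python
import numpy as np


def normalise_depth(raw: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale a raw depth prediction into [0, 1] as float32."""
    raw = raw.astype(np.float32)
    lo, hi = float(raw.min()), float(raw.max())
    if hi - lo < eps:
        # Constant map: no depth contrast, so return zeros instead of dividing by ~0
        return np.zeros_like(raw)
    return (raw - lo) / (hi - lo)
```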

### `helmet_v11.pt`: Helmet Classifier
- **Type**: YOLOv11-based, custom trained on a merged dataset
- **Training data**: 4 merged Kaggle datasets (andrewmvd, aneesarom, roboflow ×2), all remapped to 2 classes: `with_helmet (0)`, `without_helmet (1)`
- **Input**: Top 45% of each rider bounding box (head crop) with 5% lateral padding
- **Confidence**: 0.25
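
The head-crop geometry can be sketched as follows. The sketch assumes the 5% lateral padding is a fraction of the rider-box width added on each side, and that the crop is clamped to the image bounds. `head_crop_box` is a hypothetical name.

```python
def head_crop_box(
    rider: tuple[int, int, int, int],
    img_w: int,
    img_h: int,
    top_frac: float = 0.45,
    pad_frac: float = 0.05,
) -> tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) of the head region of a rider box.

    Assumption: pad_frac is a fraction of the box width added on each side.
    """
    x1, y1, x2, y2 = rider
    w, h = x2 - x1, y2 - y1
    pad = int(round(w * pad_frac))
    hx1 = max(0, x1 - pad)                              # widen left, clamp to image
    hx2 = min(img_w, x2 + pad)                          # widen right, clamp to image
    hy2 = min(img_h, y1 + int(round(h * top_frac)))     # keep only the top 45%
    return hx1, max(0, y1), hx2, hy2
```

The returned box can be used to slice a NumPy image (`image[y1:y2, x1:x2]`) before the crop is passed to `helmet_v11.pt`.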

### `license.pt`: License Plate Localiser
- **Type**: Custom YOLO model trained on Indian license plates
- **Used for**: Detecting the tight bounding box of the license plate within a bike crop
- **Confidence**: 0.20 (low threshold to catch partially visible plates)

### `FSRCNN_x3.pb`: Super-Resolution
- **Type**: FSRCNN (Fast Super-Resolution CNN), ×3 scale, TensorFlow graph run through OpenCV DNN
- **Used for**: Upscaling small plate crops (often <100 px tall) 3× before OCR to improve recognition accuracy

### `paddleocr/`: OCR Engine (PaddleOCR 3.5.0)
- **Detection**: `PP-OCRv5_mobile_det` (4.7 MB) finds text-line bounding boxes within the plate crop
- **Recognition**: `en_PP-OCRv5_mobile_rec` (7.6 MB) reads each text line
- **Orientation models**: `PP-LCNet_x1_0_doc_ori` and `PP-LCNet_x1_0_textline_ori` handle rotated plates
- **Unwarping**: `UVDoc` corrects perspective distortion
- **API**: Uses the legacy `.ocr()` method rather than `.predict()`. Both run the same underlying pipeline, but on Windows/Linux CPU the `.ocr()` path uses a compatible inference backend and avoids the OneDNN `fused_conv2d` operator crash that the newer `.predict()` path triggers
- **Post-processing**: Text is uppercased, non-alphanumeric characters are stripped, and tokens shorter than 2 characters are discarded
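
The post-processing rule can be sketched as follows. Two choices here are assumptions made to match the output format: the surviving tokens are joined with spaces, and `"UNKNOWN"` is returned when no token survives. `clean_plate_text` is hypothetical.

```python
import re


def clean_plate_text(raw_lines: list[str], min_len: int = 2) -> str:
    """Uppercase, keep only A-Z/0-9 in each token, drop tokens shorter than min_len."""
    tokens = []
    for line in raw_lines:
        for tok in line.split():
            tok = re.sub(r"[^A-Z0-9]", "", tok.upper())  # strip non-alphanumerics
            if len(tok) >= min_len:
                tokens.append(tok)
    # Assumption: an empty result maps to the README's "UNKNOWN" placeholder
    return " ".join(tokens) if tokens else "UNKNOWN"
```

For example, OCR lines `["dl 7s af", "8144.", "|"]` become `"DL 7S AF 8144"`.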

---

## Offline Operation

All model weights are bundled in `./models/`, so no internet connection is needed to load or run the local models. The exceptions are the wrong-way and seatbelt stages, which query the Roboflow API through `inference_sdk`.

PaddleOCR 3.5.0 uses [paddlex](https://github.com/PaddlePaddle/PaddleX) internally and looks for models via the `PADDLE_PDX_CACHE_HOME` environment variable. `solution.py` sets this variable to `./models/paddleocr/` **before** any paddle import, so paddlex resolves all models from the bundled path:

```python
import os
from pathlib import Path
# Must run before paddle / paddleocr is imported anywhere in the process
os.environ["PADDLE_PDX_CACHE_HOME"] = str(Path(__file__).parent / "models" / "paddleocr")
```

---

## Design Decisions

### Why two bike detectors?
COCO's `motorcycle` class (cls 3) misses many Indian two-wheeler types. The custom `stage1_best.pt`, trained on traffic footage, recovers these. Boxes from both detectors are merged via NMS.

### Why depth filtering?
In busy street scenes, the COCO detector frequently picks up pedestrians on the footpath who overlap horizontally with a detected bike. Depth-Anything V2 provides a proxy for Z-distance: persons whose median depth differs from the bike's median depth by more than 35% are excluded from association.
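
A sketch of the depth check follows. It reads "differs by more than 35%" as a relative difference measured against the bike's median depth, which is an interpretation rather than something the README confirms. Keeping a person when no usable reference depth exists is also an assumption. The function names are hypothetical.

```python
import numpy as np

Box = tuple[int, int, int, int]  # x1, y1, x2, y2 in pixels


def median_depth(depth: np.ndarray, box: Box) -> float:
    """Median of the depth map inside a box (NaN if the box is empty)."""
    x1, y1, x2, y2 = box
    region = depth[y1:y2, x1:x2]
    return float(np.median(region)) if region.size else float("nan")


def same_depth_plane(depth: np.ndarray, person: Box, bike: Box, max_rel_diff: float = 0.35) -> bool:
    """True if the person's median depth is within max_rel_diff of the bike's."""
    d_person, d_bike = median_depth(depth, person), median_depth(depth, bike)
    if np.isnan(d_person) or np.isnan(d_bike) or d_bike <= 0:
        # Assumption: without a usable reference depth, do not filter the person out
        return True
    return abs(d_person - d_bike) / d_bike <= max_rel_diff
```

Medians are used instead of means so that a few background pixels inside a loose bounding box do not shift the estimate much.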

### Why not use PaddleOCR's server detection model?
`PP-OCRv5_server_det` is 84.3 MB, and bundling it would push the total over 250 MB. Instead, `license.pt` performs the coarse plate localisation (narrowing the search area to ~125×90 px), then `PP-OCRv5_mobile_det` (4.7 MB) finds individual text lines within that small crop, and `en_PP-OCRv5_mobile_rec` reads them. This two-stage localisation gives equivalent quality at a fraction of the size.

### Why store the depth model as fp16?
`model.safetensors` was converted from fp32 (94.6 MB) to fp16 (47.3 MB) at submission time using `safetensors.torch`. At runtime the model is loaded as fp32 (`dtype=torch.float32`) because typical x86 CPUs lack native fp16 compute units, and running fp16 tensors on CPU causes a ~10× slowdown. The disk saving is therefore free: apart from a one-off upcast at load time, inference cost is the same as with an fp32 checkpoint.

### Fallback for missing riders
If no COCO person is associated with a detected bike (e.g., a very small image or an occluded rider), the pipeline assumes one rider without a helmet. This is a deliberately conservative choice: it accepts the risk of a false positive in order to avoid missing a genuine violation.

---

## Constraints Compliance

| Constraint | Status |
|---|---|
| Model size ≤ 250 MB | ✅ 194.6 MB |
| No VLMs > 1B parameters | ✅ Largest model is Depth-Anything V2 Small (~24M params) |
| Fully offline execution | ✅ All local weights in `./models/`, `PADDLE_PDX_CACHE_HOME` redirected (the Roboflow API stages still need network access) |
| `TrafficViolationDetector` interface | ✅ `__init__(model_dir)` + `predict(image_path) → dict` |
| Stateless `predict()` | ✅ No mutable shared state between calls |
| Error handling | ✅ All exceptions caught; returns `{"violations": []}` on failure |

---

## Performance (Local Windows CPU)

| Metric | Value |
|---|---|
| Init time (cold start) | ~3–4 s |
| Inference, simple scene (1–2 bikes) | ~4–5 s |
| Inference, dense scene (8+ bikes) | ~10–12 s |

> **Note**: The evaluation server runs Linux with a faster CPU, so inference times are expected to be lower there. Depth estimation (Depth-Anything V2) is the primary bottleneck on CPU.