---
title: Gridlock Traffic Violation API
emoji: 🚦
colorFrom: red
colorTo: blue
sdk: gradio
python_version: "3.10"
app_file: app.py
pinned: false
---
# AID 728: Traffic Rule Violation Detection
**IIIT Bangalore**
Detects traffic rule violations in single RGB street-camera images, with a focus on two-wheelers. Identifies **helmet violations** and **over-riding (>2 riders on one bike)**, and extracts the **license plate text** of every violating vehicle. Wrong-way driving and missing seatbelts (four-wheelers) are flagged through two Roboflow-hosted models.
---
## Submission Files
```
final_submission/
├── solution.py             # Core detection pipeline (TrafficViolationDetector class)
├── requirements.txt        # All Python dependencies
├── README.md               # This file
└── models/                 # All local model weights (bundled for offline use)
    ├── yolov8s.pt          # COCO primary detector (21.54 MB)
    ├── stage1_best.pt      # Custom two-wheeler detector (21.49 MB)
    ├── helmet_v11.pt       # Helmet classifier (5.22 MB)
    ├── license.pt          # License plate localiser (42.77 MB)
    ├── FSRCNN_x3.pb        # Super-resolution for plates (0.04 MB)
    ├── depth_anything_v2/  # Depth-Anything V2 Small (HF) (47.31 MB fp16)
    └── paddleocr/          # Bundled PaddleOCR models
        └── official_models/
            └── ...
```

The pipeline also queries two models hosted on the Roboflow API, so these stages need network access:
- **Wrong-way driving detection** (`wrong-way-driving-detection-gqdmg/1`)
- **Seatbelt classification** (`seat-belt-detection-udcfg/5`)

Total size of the bundled local models: 194.59 MB (limit: 250 MB)
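The 250 MB budget can be re-checked after swapping or adding weights. This is a minimal sketch, not part of `solution.py`: `models_size_mb` and `LIMIT_MB` are hypothetical names, every regular file under `./models/` is assumed to count toward the limit, and "MB" is assumed to mean MiB (2**20 bytes). Change `BYTES_PER_MB` to `10**6` if the figures above were computed in decimal megabytes.

```python
from pathlib import Path

# Assumption: "MB" in this README means MiB (2**20 bytes).
BYTES_PER_MB = 1024 * 1024
LIMIT_MB = 250.0  # submission limit stated in this README


def models_size_mb(model_dir: str = "./models") -> float:
    """Sum the sizes of all regular files under model_dir, in MB."""
    root = Path(model_dir)
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / BYTES_PER_MB


if __name__ == "__main__":
    size = models_size_mb()
    status = "OK" if size <= LIMIT_MB else "OVER LIMIT"
    print(f"Total model size: {size:.2f} MB (limit: {LIMIT_MB:.0f} MB) -> {status}")
```

Because nested directories such as `paddleocr/official_models/` are walked recursively, the PaddleOCR bundle is included in the total.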
---
## Quick Start
### Install dependencies
```bash
pip install -r requirements.txt
```
### Run inference
```python
from solution import TrafficViolationDetector
detector = TrafficViolationDetector(model_dir="./models")
result = detector.predict("path/to/image.jpg")
print(result)
```
### Output format
```json
{
  "violations": [
    {
      "vehicle_type": "two_wheeler",
      "num_riders": 2,
      "helmet_violations": 1,
      "wrong_way": false,
      "license_plate": "DL 7S AF 8144"
    },
    {
      "vehicle_type": "four_wheeler",
      "seatbelt_violations": 1,
      "wrong_way": true,
      "license_plate": "MH 12 AB 1234"
    }
  ]
}
```
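- One entry per **violating** vehicle: two-wheeler entries report `num_riders` and `helmet_violations`, four-wheeler entries report `seatbelt_violations`, and both report `wrong_way`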
- `violations` is an empty list `[]` if no violations are found
- `license_plate` is `"UNKNOWN"` when the plate cannot be read
- `num_riders` counts riders per bike; `helmet_violations` counts those without a helmet
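Downstream code can tally the documented fields directly from a `predict()` result. This is a minimal sketch that assumes only the keys in the example (`vehicle_type`, `num_riders`, `helmet_violations`, `seatbelt_violations`, `wrong_way`, `license_plate`). `summarize` is a hypothetical helper, not part of `solution.py`.

```python
from collections import Counter
from typing import Any


def summarize(result: dict[str, Any]) -> Counter:
    """Count violation types in a predict() result dict."""
    counts: Counter = Counter()
    for v in result.get("violations", []):
        if v.get("vehicle_type") == "two_wheeler":
            counts["no_helmet"] += v.get("helmet_violations", 0)
            if v.get("num_riders", 0) >= 3:  # over-riding rule
                counts["over_riding"] += 1
        counts["no_seatbelt"] += v.get("seatbelt_violations", 0)
        if v.get("wrong_way", False):
            counts["wrong_way"] += 1
        if v.get("license_plate", "UNKNOWN") == "UNKNOWN":
            counts["unread_plate"] += 1
    return counts
```

Usage: `summarize(detector.predict("path/to/image.jpg"))`. Missing keys default to zero, so the same function handles both two-wheeler and four-wheeler entries.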
---
## Pipeline Architecture
The pipeline runs in 9 sequential stages per image:
```
                   Input Image
                        │
                        ▼
┌───────────────────────────────────────────────┐
│ Stage 1: Primary Detection (yolov8s.pt, COCO) │
│ Detects: persons (cls 0), motorcycles (cls 3) │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 2: Supplemental Bike Detection          │
│ (stage1_best.pt, custom trained)              │
│ Merged with Stage 1 bikes via NMS             │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 3: Monocular Depth Estimation           │
│ (Depth-Anything V2 Small, fp16 stored)        │
│ Produces normalised depth map [0,1]           │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 4: Person → Bike Association            │
│ Criteria: IoU overlap + column align          │
│           + depth proximity check             │
└───────────────────────┬───────────────────────┘
                        │
               ┌────────▼────────┐
               │  Per-bike loop  │
               └────────┬────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 5: Helmet Classification                │
│ (helmet_v11.pt, YOLOv11 custom)               │
│ Crops top 45% of each rider bbox              │
│ (head region), runs cls 0=helmet              │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 6: Wrong-Way Detection (API)            │
│ (wrong-way-driving-detection-gqdmg/1)         │
│ Flags vehicle bounding boxes that             │
│ overlap with 'wrong-side' detections          │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 7: Seatbelt Detection (API)             │
│ (seat-belt-detection-udcfg/5)                 │
│ Runs only on four-wheeler crops               │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 8: License Plate Localisation           │
│ (license.pt, YOLO custom)                     │
│ Runs on violating vehicles                    │
└───────────────────────┬───────────────────────┘
                        │
┌───────────────────────▼───────────────────────┐
│ Stage 9: OCR (PaddleOCR 3.5.0)                │
│ FSRCNN x3 super-resolution → CLAHE            │
│ sharpening → PP-OCRv5 mobile det+rec          │
│ Text cleaned: uppercase alphanumeric          │
└───────────────────────┬───────────────────────┘
                        │
                        ▼
             Output: violations list
```
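Stage 4 is described only at a high level. The following is a minimal sketch of its geometric part, assuming boxes in `(x1, y1, x2, y2)` pixel coordinates and reading "column align" as the fraction of a person's width that falls inside the bike's x-range. The thresholds `MIN_IOU` and `MIN_COL_OVERLAP` and all function names are hypothetical, and the depth-proximity check described under *Why depth filtering?* is left out.

```python
from typing import Sequence

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Hypothetical thresholds; the values used in solution.py are not documented here.
MIN_IOU = 0.05
MIN_COL_OVERLAP = 0.5


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def column_overlap(person: Box, bike: Box) -> float:
    """Fraction of the person's width that lies within the bike's x-range."""
    width = person[2] - person[0]
    ox = max(0.0, min(person[2], bike[2]) - max(person[0], bike[0]))
    return ox / width if width > 0 else 0.0


def associate(persons: Sequence[Box], bikes: Sequence[Box]) -> dict[int, list[int]]:
    """Map bike index -> person indices; each person joins at most one bike."""
    riders: dict[int, list[int]] = {i: [] for i in range(len(bikes))}
    for p_idx, p in enumerate(persons):
        best, best_score = None, 0.0
        for b_idx, b in enumerate(bikes):
            i, c = iou(p, b), column_overlap(p, b)
            if i < MIN_IOU or c < MIN_COL_OVERLAP:
                continue  # fails either geometric criterion
            score = i + c
            if score > best_score:
                best, best_score = b_idx, score
        if best is not None:
            riders[best].append(p_idx)
    return riders
```

Assigning each person to a single best-scoring bike prevents one rider from being counted on two adjacent bikes.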
### Violation Logic
- A bike is flagged as a **violation** if:
  - `num_riders >= 3` (over-riding), **OR**
  - `helmet_violations > 0` (at least one rider without a helmet)
- Only violating vehicles appear in the output list
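The two rules above can be combined with the missing-rider fallback described under *Fallback for missing riders*. This is a minimal sketch: `bike_entry` and its input (one boolean per associated rider, `True` meaning a helmet was detected) are hypothetical. The `wrong_way` field, which Stage 6 fills in, is omitted.

```python
from typing import Optional, Sequence


def bike_entry(helmet_flags: Sequence[bool], plate: Optional[str] = None) -> Optional[dict]:
    """Return an output entry for one bike, or None if it is not violating.

    helmet_flags has one bool per associated rider (True = helmet detected).
    """
    if not helmet_flags:
        # Missing-rider fallback: assume one rider without a helmet.
        helmet_flags = [False]
    num_riders = len(helmet_flags)
    helmet_violations = sum(1 for has_helmet in helmet_flags if not has_helmet)
    if num_riders >= 3 or helmet_violations > 0:  # over-riding OR no helmet
        return {
            "vehicle_type": "two_wheeler",
            "num_riders": num_riders,
            "helmet_violations": helmet_violations,
            "license_plate": plate or "UNKNOWN",
        }
    return None
```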
---
## Model Details
### `yolov8s.pt`: COCO Primary Detector
- **Type**: YOLOv8 Small, pretrained on COCO
- **Used for**: Detecting persons (class 0) and motorcycles (class 3)
- **Confidence**: 0.30, IoU: 0.45
### `stage1_best.pt`: Custom Two-Wheeler Detector
- **Type**: YOLOv8-based, custom trained
- **Used for**: Supplementing COCO detections with domain-specific vehicle types that COCO misses (scooters, three-wheelers, etc.)
- **Merge**: Combined with COCO bike boxes via IoU-based NMS (threshold 0.45)
- **Augmented inference** (`augment=True`) for improved recall
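The merge step can be sketched as class-agnostic greedy NMS over the pooled `(box, score)` pairs from both detectors, using the 0.45 IoU threshold stated above. This pure-Python sketch is one standard way to do it. Whether `solution.py` uses this exact variant (rather than, for example, a library NMS routine) is not documented, and `merge_nms` is a hypothetical name.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_nms(coco: list[tuple[Box, float]], custom: list[tuple[Box, float]],
              iou_thr: float = 0.45) -> list[tuple[Box, float]]:
    """Pool both detectors' detections; keep the highest-scoring box of any
    group whose pairwise IoU exceeds iou_thr (greedy NMS)."""
    pooled = sorted(coco + custom, key=lambda d: d[1], reverse=True)
    kept: list[tuple[Box, float]] = []
    for box, score in pooled:
        if all(iou(box, k) <= iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept
```

Duplicates of the same bike collapse to the more confident box, while bikes that only the custom detector found pass through untouched.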
### `depth_anything_v2/`: Monocular Depth Estimation
- **Type**: Depth-Anything V2 Small (Hugging Face Transformers)
- **Used for**: Filtering out background pedestrians that share column overlap with a detected bike but are at a different depth plane
- **Storage**: fp16 safetensors on disk (47.3 MB vs 94.6 MB fp32), loaded as fp32 at runtime for CPU inference speed
- **Output**: Normalised depth map [0, 1] resized to match the input image
### `helmet_v11.pt`: Helmet Classifier
- **Type**: YOLOv11-based, custom trained on a merged dataset
- **Training data**: 4 merged Kaggle datasets (andrewmvd, aneesarom, roboflow ×2), all remapped to 2 classes: `with_helmet (0)`, `without_helmet (1)`
- **Input**: Top 45% of each rider bounding box (head crop) with 5% lateral padding
- **Confidence**: 0.25
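The head crop can be computed from a rider box as follows. This minimal sketch assumes the 5% lateral padding is measured relative to the box width on each side and that coordinates are clamped to the image. `head_crop_box` is a hypothetical helper, not the function in `solution.py`.

```python
def head_crop_box(x1: float, y1: float, x2: float, y2: float,
                  img_w: int, img_h: int,
                  top_frac: float = 0.45,
                  pad_frac: float = 0.05) -> tuple[int, int, int, int]:
    """Return the integer (x1, y1, x2, y2) head region of a rider box."""
    w, h = x2 - x1, y2 - y1
    pad = pad_frac * w  # assumption: 5% of the box width on each side
    hx1 = max(0, int(round(x1 - pad)))
    hx2 = min(img_w, int(round(x2 + pad)))
    hy1 = max(0, int(round(y1)))
    hy2 = min(img_h, int(round(y1 + top_frac * h)))  # keep the top 45%
    return hx1, hy1, hx2, hy2
```

With a NumPy image, the crop is `image[hy1:hy2, hx1:hx2]`.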
### `license.pt`: License Plate Localiser
- **Type**: YOLO custom, trained on Indian license plates
- **Used for**: Detecting the tight bounding box of the license plate within a bike crop
- **Confidence**: 0.20 (low threshold to catch partially visible plates)
### `FSRCNN_x3.pb`: Super-Resolution
- **Type**: FSRCNN (Fast Super-Resolution CNN), ×3 scale, TensorFlow/OpenCV DNN
- **Used for**: Upscaling small plate crops (often <100 px tall) 3× before OCR to improve recognition accuracy
### `paddleocr/`: OCR Engine (PaddleOCR 3.5.0)
- **Detection**: `PP-OCRv5_mobile_det` (4.7 MB) finds text-line bounding boxes within the plate crop
- **Recognition**: `en_PP-OCRv5_mobile_rec` (7.6 MB) reads each text line
- **Orientation models**: `PP-LCNet_x1_0_doc_ori` and `PP-LCNet_x1_0_textline_ori` handle rotated plates
- **Unwarping**: `UVDoc` corrects perspective distortion
- **API**: Uses the legacy `.ocr()` method rather than `.predict()`. Both run the same underlying pipeline, but on Windows/Linux CPU `.ocr()` avoids the OneDNN `fused_conv2d` operator crash triggered by the newer `.predict()` path
- **Post-processing**: Text is uppercased, non-alphanumeric characters stripped, tokens shorter than 2 characters discarded
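The post-processing rule can be sketched as follows. This sketch assumes the OCR output arrives as a list of text lines, that surviving tokens are joined with single spaces (as in the `"DL 7S AF 8144"` example), and that an empty result maps to the documented `"UNKNOWN"`. `clean_plate_text` is a hypothetical name.

```python
import re


def clean_plate_text(lines: list[str]) -> str:
    """Uppercase, strip non-alphanumerics, drop tokens shorter than 2 chars."""
    tokens = []
    for line in lines:
        for raw in line.split():
            tok = re.sub(r"[^A-Z0-9]", "", raw.upper())
            if len(tok) >= 2:
                tokens.append(tok)
    return " ".join(tokens) if tokens else "UNKNOWN"
```

For example, `clean_plate_text(["dl 7s", "af-8144."])` returns `"DL 7S AF8144"`. The hyphen is stripped inside the token, so the two halves merge.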
---
## Offline Operation
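All local model weights are bundled in `./models/`, so the local stages need no internet connection at runtime. The two Roboflow-hosted stages (wrong-way and seatbelt detection) still require network access.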
PaddleOCR 3.5.0 uses [paddlex](https://github.com/PaddlePaddle/PaddleX) internally and looks for models via the `PADDLE_PDX_CACHE_HOME` environment variable. `solution.py` sets this variable to `./models/paddleocr/` **before** any paddle import, so paddlex resolves all models from the bundled path:
```python
import os
from pathlib import Path
os.environ["PADDLE_PDX_CACHE_HOME"] = str(Path(__file__).parent / "models" / "paddleocr")  # must run before any paddle/paddlex import
```
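Because the variable only takes effect if it is set before paddle is imported, a guarded variant can fail loudly when the import order is wrong. This is a minimal sketch. The `sys.modules` check is an addition of this example, not something `solution.py` is documented to do, and `configure_paddle_cache` is a hypothetical name.

```python
import os
import sys
from pathlib import Path


def configure_paddle_cache(model_root: Path) -> Path:
    """Point paddlex at the bundled models; call before importing paddle."""
    already = [m for m in ("paddle", "paddlex", "paddleocr") if m in sys.modules]
    if already:
        # Too late: these modules may already have read the cache location.
        raise RuntimeError(f"set PADDLE_PDX_CACHE_HOME before importing: {already}")
    cache = (model_root / "paddleocr").resolve()
    os.environ["PADDLE_PDX_CACHE_HOME"] = str(cache)
    return cache
```

A typical call at the top of `solution.py` would be `configure_paddle_cache(Path(__file__).parent / "models")`, followed by `from paddleocr import PaddleOCR`.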
---
## Design Decisions
### Why two bike detectors?
COCO's `motorcycle` class (cls 3) misses many Indian two-wheeler types. The custom `stage1_best.pt` trained on traffic footage recovers these. Boxes from both are merged via NMS.
### Why depth filtering?
In busy street scenes, the COCO detector frequently picks up pedestrians on the footpath who overlap horizontally with a detected bike. Depth-Anything V2 provides a proxy for Z-distance; persons whose median depth differs from the bike's median depth by more than 35% are excluded from association.
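The depth-proximity check can be sketched with NumPy. This sketch interprets "35%" as relative to the bike's median depth and computes medians on the min-max normalised `[0, 1]` map that Stage 3 produces. Depth-Anything V2 predicts relative depth, so the comparison is only a proxy. Function names are hypothetical, and boxes are assumed to be non-empty integer pixel boxes.

```python
import numpy as np

Box = tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

MAX_REL_DEPTH_DIFF = 0.35  # the 35% threshold stated above


def normalise_depth(raw: np.ndarray) -> np.ndarray:
    """Min-max scale a raw depth map to [0, 1] (all zeros if constant)."""
    lo, hi = float(raw.min()), float(raw.max())
    if hi - lo < 1e-12:
        return np.zeros_like(raw, dtype=np.float32)
    return ((raw - lo) / (hi - lo)).astype(np.float32)


def median_depth(depth: np.ndarray, box: Box) -> float:
    x1, y1, x2, y2 = box
    return float(np.median(depth[y1:y2, x1:x2]))


def same_depth_plane(depth: np.ndarray, person: Box, bike: Box) -> bool:
    """True if the person's median depth is within 35% of the bike's."""
    d_bike = median_depth(depth, bike)
    d_person = median_depth(depth, person)
    return abs(d_person - d_bike) <= MAX_REL_DEPTH_DIFF * max(d_bike, 1e-6)
```

Medians are used instead of means so that background pixels inside a loose bounding box do not drag the estimate.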
### Why not use PaddleOCR's server detection model?
`PP-OCRv5_server_det` is 84.3 MB; bundling it would push the total over 250 MB. Instead, `license.pt` performs the coarse plate localisation (narrowing the search area to ~125×90 px), then `PP-OCRv5_mobile_det` (4.7 MB) finds individual text lines within that small crop, and `en_PP-OCRv5_mobile_rec` reads them. This two-stage localisation gives equivalent quality at a fraction of the size.
### Why store depth model as fp16?
`model.safetensors` was converted from fp32 (94.6 MB) to fp16 (47.3 MB) at submission time using `safetensors.torch`. At runtime the model is loaded as fp32 (`dtype=torch.float32`) because most x86 CPUs lack native fp16 arithmetic, and running fp16 tensors on CPU causes a 10× slowdown. The disk saving therefore costs nothing at runtime; the only price is the one-time rounding of the weights to fp16 precision.
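The storage trade-off can be illustrated without the real checkpoint. The actual conversion used `safetensors.torch`. This NumPy sketch substitutes a random fp32 array for one weight tensor to show that the fp16 copy takes exactly half the bytes. Casting back to fp32 for compute only carries over the one-time fp16 rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one weight tensor; the real weights live in model.safetensors.
w32 = rng.normal(0.0, 0.02, size=(384, 1536)).astype(np.float32)

w16 = w32.astype(np.float16)        # what is stored on disk
w_runtime = w16.astype(np.float32)  # what the CPU computes with after loading

ratio = w16.nbytes / w32.nbytes
max_rel_err = float(np.max(np.abs(w_runtime - w32)) / np.max(np.abs(w32)))
print(f"size ratio: {ratio}, max relative error: {max_rel_err:.2e}")
```

fp16 keeps 10 explicit mantissa bits, so for normal values the rounding error is bounded by about 2**-11 (roughly 0.05%) of each weight's magnitude.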
### Fallback for missing riders
If no COCO person is associated with a detected bike (e.g., very small image, occluded rider), one rider with no helmet is assumed. This is a conservative choice: it risks false positives (a parked, riderless bike is flagged as a helmet violation), but a bike whose rider went undetected is never silently dropped.
---
## Constraints Compliance
| Constraint | Status |
|---|---|
| Model size ≤ 250 MB | ✅ 194.6 MB |
| No VLMs > 1B parameters | ✅ Largest model is Depth-Anything V2 Small (~24M params) |
| Fully offline execution | ✅ All local weights in `./models/`, `PADDLE_PDX_CACHE_HOME` redirected (the Roboflow API stages still need network access) |
| `TrafficViolationDetector` interface | ✅ `__init__(model_dir)` + `predict(image_path) → dict` |
| Stateless `predict()` | ✅ No mutable shared state between calls |
| Error handling | ✅ All exceptions caught; returns `{"violations": []}` on failure |
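The error-handling and statelessness rows can both be met by a small wrapper around `predict()`. This is a minimal sketch: `safe_predict` and the logger name are hypothetical. Catching `Exception` rather than `BaseException` leaves `KeyboardInterrupt` alone.

```python
import functools
import logging
from typing import Any, Callable

logger = logging.getLogger("gridlock")


def safe_predict(fn: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Decorator: log any exception and return an empty violations list."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return fn(*args, **kwargs)
        except Exception:
            logger.exception("predict() failed; returning empty result")
            # Build a fresh dict each time so no mutable state is shared.
            return {"violations": []}
    return wrapper
```

Returning a fresh dict on every failure, instead of a module-level constant, means a caller that mutates one result cannot affect the next call.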
---
## Performance (Local Windows CPU)
| Metric | Value |
|---|---|
| Init time (cold start) | ~3–4 s |
| Inference: simple scene (1–2 bikes) | ~4–5 s |
| Inference: dense scene (8+ bikes) | ~10–12 s |
> **Note**: The evaluation server runs Linux with a faster CPU; inference times are expected to be lower. Depth estimation (Depth-Anything V2) is the primary bottleneck on CPU.
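Timings like those in the table can be re-measured on another machine with a small harness. This is a minimal sketch: `time_calls` is a hypothetical helper, and it excludes warm-up calls because the first inference may include one-off initialisation. Init time would be measured separately around the `TrafficViolationDetector(...)` constructor.

```python
import statistics
import time
from typing import Callable, Sequence


def time_calls(fn: Callable[[str], object], paths: Sequence[str],
               warmup: int = 1) -> dict[str, float]:
    """Median and max wall-clock seconds per call, after warm-up calls."""
    if not paths:
        raise ValueError("need at least one path")
    for p in list(paths)[:warmup]:
        fn(p)  # warm-up, not timed
    samples = []
    for p in paths:
        t0 = time.perf_counter()
        fn(p)
        samples.append(time.perf_counter() - t0)
    return {"median_s": statistics.median(samples), "max_s": max(samples)}
```

Usage: `time_calls(detector.predict, ["simple.jpg", "dense.jpg"])`. Reporting the median alongside the maximum keeps a single slow dense scene from hiding the typical latency.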