---
title: Gridlock Traffic Violation API
emoji: 🚦
colorFrom: red
colorTo: blue
sdk: gradio
python_version: "3.10"
app_file: app.py
pinned: false
---
# AID 728 — Traffic Rule Violation Detection

**IIIT Bangalore**

Detects traffic rule violations from single RGB street-camera images. For two-wheelers it identifies **helmet violations** and **over-riding (more than 2 riders on one bike)**; it also flags **wrong-way driving** and, for four-wheelers, **seatbelt violations**. The **license plate text** of every violating vehicle is extracted.

---

## Submission Files

```
final_submission/
├── solution.py          # Core detection pipeline (TrafficViolationDetector class)
├── requirements.txt     # All Python dependencies
├── README.md            # This file
└── models/              # All model weights (bundled, fully offline)
    ├── yolov8s.pt                        # COCO primary detector          (21.54 MB)
    ├── stage1_best.pt                    # Custom two-wheeler detector     (21.49 MB)
    ├── helmet_v11.pt                     # Helmet classifier               (5.22 MB)
    ├── license.pt                        # License plate localiser         (42.77 MB)
    ├── FSRCNN_x3.pb                      # Super-resolution for plates     (0.04 MB)
    ├── depth_anything_v2/                # Depth-Anything V2 Small (HF)    (47.31 MB fp16)
    └── paddleocr/                        # Bundled PaddleOCR models
        └── official_models/
            └── ...
```

The pipeline also uses the `inference_sdk` to query the Roboflow API for:
- **Wrong-way driving detection** (`wrong-way-driving-detection-gqdmg/1`)
- **Seatbelt classification** (`seat-belt-detection-udcfg/5`)

Total model size: 194.59 MB (limit: 250 MB)

---

## Quick Start

### Install dependencies
```bash
pip install -r requirements.txt
```

### Run inference
```python
from solution import TrafficViolationDetector

detector = TrafficViolationDetector(model_dir="./models")
result   = detector.predict("path/to/image.jpg")
print(result)
```

### Output format
```json
{
  "violations": [
    {
      "vehicle_type":      "two_wheeler",
      "num_riders":        2,
      "helmet_violations": 1,
      "wrong_way":         false,
      "license_plate":     "DL 7S AF 8144"
    },
    {
      "vehicle_type":        "four_wheeler",
      "seatbelt_violations": 1,
      "wrong_way":           true,
      "license_plate":       "MH 12 AB 1234"
    }
  ]
}
```

- One entry per **violating** vehicle; vehicles without a violation are omitted
- `violations` is an empty list `[]` if no violations are found
- `license_plate` is `"UNKNOWN"` when the plate cannot be read
- `num_riders` counts riders per bike; `helmet_violations` counts those without a helmet
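
As a rough illustration of consuming this output, the sketch below turns a result dict into one readable line per violation. The helper name `summarise` is hypothetical and not part of `solution.py`; it assumes only the keys shown in the example above and treats any missing key as "no violation of that kind".

```python
def summarise(result: dict) -> list[str]:
    """Return one human-readable line per violation entry (hypothetical helper)."""
    lines = []
    for v in result.get("violations", []):
        reasons = []
        # Two-wheeler fields are absent on four-wheeler entries, hence .get().
        if v.get("num_riders", 0) >= 3:
            reasons.append("over-riding")
        if v.get("helmet_violations", 0) > 0:
            reasons.append(f"{v['helmet_violations']} without helmet")
        if v.get("seatbelt_violations", 0) > 0:
            reasons.append(f"{v['seatbelt_violations']} without seatbelt")
        if v.get("wrong_way", False):
            reasons.append("wrong way")
        plate = v.get("license_plate", "UNKNOWN")
        vehicle = v.get("vehicle_type", "unknown")
        lines.append(f"{vehicle} [{plate}]: {', '.join(reasons)}")
    return lines
```

Calling `summarise(detector.predict("path/to/image.jpg"))` would then give a list that is empty whenever `violations` is empty.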

---

## Pipeline Architecture

The pipeline runs in 9 sequential stages per image:

```
Input Image
    │
    ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 1 — Primary Detection (yolov8s.pt, COCO)                │
│  Detects: persons (cls 0), motorcycles (cls 3)                  │
└────────────────────────┬────────────────────────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 2 — Supplemental Bike Detection  │
    │  (stage1_best.pt — custom trained)      │
    │  Merged with Stage 1 bikes via NMS      │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 3 — Monocular Depth Estimation   │
    │  (Depth-Anything V2 Small, fp16 stored) │
    │  Produces normalised depth map [0,1]    │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 4 — Person → Bike Association    │
    │  Criteria: IoU overlap + column align   │
    │            + depth proximity check      │
    └────────────────────┬────────────────────┘
                         │
              ┌──────────▼──────────┐
              │   Per-bike loop     │
              └──────────┬──────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 5 — Helmet Classification        │
    │  (helmet_v11.pt — YOLOv11 custom)       │
    │  Crops top 45% of each rider bbox       │
    │  (head region), runs cls 0=helmet       │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 6 — Wrong Way Detection (API)    │
    │  (wrong-way-driving-detection-gqdmg/1)  │
    │  Flags vehicle bounding boxes that      │
    │  overlap with 'wrong-side' detections   │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 7 — Seatbelt Detection (API)     │
    │  (seat-belt-detection-udcfg/5)          │
    │  Runs only on four-wheeler crops        │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 8 — License Plate Localisation   │
    │  (license.pt — YOLO custom)             │
    │  Runs on violating vehicles             │
    └────────────────────┬────────────────────┘
                         │
    ┌────────────────────▼────────────────────┐
    │  Stage 9 — OCR (PaddleOCR 3.5.0)       │
    │  FSRCNN x3 super-resolution → CLAHE     │
    │  sharpening → PP-OCRv5 mobile det+rec   │
    │  Text cleaned: uppercase alphanumeric   │
    └────────────────────┬────────────────────┘
                         │
                         ▼
              Output: violations list
```

### Violation Logic
- A bike is flagged as a **violation** if:
  - `num_riders >= 3` (over-riding), **OR**
  - `helmet_violations > 0` (at least one rider without a helmet)
- Only violating bikes appear in the output list
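
The rule above can be written as a small predicate. This is a minimal sketch, not the code in `solution.py`; the function names and the per-bike dict layout are assumptions that mirror the output format.

```python
def is_bike_violation(num_riders: int, helmet_violations: int) -> bool:
    """True if the bike is over-ridden or has at least one helmetless rider."""
    return num_riders >= 3 or helmet_violations > 0


def keep_violating(bikes: list[dict]) -> list[dict]:
    """Filter per-bike records down to those that should appear in the output."""
    return [
        b for b in bikes
        if is_bike_violation(b["num_riders"], b["helmet_violations"])
    ]
```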

---

## Model Details

### `yolov8s.pt` — COCO Primary Detector
- **Type**: YOLOv8 Small, pretrained on COCO
- **Used for**: Detecting persons (class 0) and motorcycles (class 3)
- **Confidence**: 0.30, IoU: 0.45

### `stage1_best.pt` — Custom Two-Wheeler Detector
- **Type**: YOLOv8-based, custom trained
- **Used for**: Supplementing COCO detections with domain-specific vehicle types that COCO's `motorcycle` class misses (scooters, three-wheelers, etc.)
- **Merge**: Combined with COCO bike boxes via IoU-based NMS (threshold 0.45)
- **Augmented inference** (`augment=True`) for improved recall
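
The IoU-based merge can be sketched as a greedy NMS over the union of both detectors' boxes. This is an illustrative version in plain Python, assuming boxes are `(x1, y1, x2, y2, confidence)` tuples; the function names are hypothetical and the actual merge in `solution.py` may differ in detail.

```python
Box = tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(coco_boxes: list[Box], custom_boxes: list[Box],
                iou_thr: float = 0.45) -> list[Box]:
    """Greedy NMS: keep the highest-confidence box of each overlapping group."""
    kept: list[Box] = []
    candidates = sorted(list(coco_boxes) + list(custom_boxes),
                        key=lambda b: b[4], reverse=True)
    for box in candidates:
        # Keep the box only if it does not overlap a stronger kept box too much.
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept
```

With this scheme a scooter found only by the custom detector survives, while a motorcycle found by both detectors is reported once.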

### `depth_anything_v2/` — Monocular Depth Estimation
- **Type**: Depth-Anything V2 Small (Hugging Face Transformers)
- **Used for**: Filtering out background pedestrians that share column overlap with a detected bike but are at a different depth plane
- **Storage**: fp16 safetensors on disk (47.3 MB vs 94.6 MB fp32) — loaded as fp32 at runtime for CPU inference speed
- **Output**: Normalised depth map [0, 1] resized to match the input image

### `helmet_v11.pt` — Helmet Classifier
- **Type**: YOLOv11-based, custom trained on merged dataset
- **Training data**: 4 merged Kaggle datasets (andrewmvd, aneesarom, roboflow ×2) — all remapped to 2 classes: `with_helmet (0)`, `without_helmet (1)`
- **Input**: Top 45% of each rider bounding box (head crop) with 5% lateral padding
- **Confidence**: 0.25
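
The head crop can be computed from the rider box as sketched below. It assumes the 5% lateral padding is measured relative to the rider box width and that the crop is clamped to the image bounds; both are assumptions, and `head_crop_box` is a hypothetical name.

```python
def head_crop_box(x1: float, y1: float, x2: float, y2: float,
                  img_w: int, img_h: int,
                  top_frac: float = 0.45, pad_frac: float = 0.05
                  ) -> tuple[int, int, int, int]:
    """Return the (x1, y1, x2, y2) pixel box covering the rider's head region."""
    width, height = x2 - x1, y2 - y1
    pad = pad_frac * width  # lateral padding, assumed relative to box width
    cx1 = max(0, int(round(x1 - pad)))
    cx2 = min(img_w, int(round(x2 + pad)))
    cy1 = max(0, int(round(y1)))
    cy2 = min(img_h, int(round(y1 + top_frac * height)))  # top 45% of the box
    return cx1, cy1, cx2, cy2
```

For a rider box of `(100, 50, 200, 250)` in a 640×480 image, this gives a crop that is 10 px wider than the box and 90 px tall.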

### `license.pt` — License Plate Localiser
- **Type**: YOLO custom, trained on Indian license plates
- **Used for**: Detecting the tight bounding box of the license plate within a bike crop
- **Confidence**: 0.20 (low threshold to catch partially visible plates)

### `FSRCNN_x3.pb` — Super-Resolution
- **Type**: FSRCNN (Fast Super-Resolution CNN), ×3 scale, TensorFlow/OpenCV DNN
- **Used for**: Upscaling small plate crops (often <100px tall) 3× before OCR to improve recognition accuracy

### `paddleocr/` — OCR Engine (PaddleOCR 3.5.0)
- **Detection**: `PP-OCRv5_mobile_det` (4.7 MB) — finds text line bounding boxes within the plate crop
- **Recognition**: `en_PP-OCRv5_mobile_rec` (7.6 MB) — reads each text line
- **Orientation models**: `PP-LCNet_x1_0_doc_ori`, `PP-LCNet_x1_0_textline_ori` — handle rotated plates
- **Unwarping**: `UVDoc` — corrects perspective distortion
- **API**: Uses the legacy `.ocr()` method (not `.predict()`). Both call the same underlying pipeline, but `.ocr()` uses a compatible inference backend on Windows/Linux CPU without triggering the OneDNN fused_conv2d operator crash present in the newer `.predict()` path
- **Post-processing**: Text is uppercased, non-alphanumeric characters stripped, tokens shorter than 2 characters discarded
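
The post-processing step can be sketched with the standard library alone. The sketch assumes each OCR text line is one token, that surviving tokens are joined with a single space, and that an empty result maps to `"UNKNOWN"`; `clean_plate_text` is a hypothetical name.

```python
import re


def clean_plate_text(tokens: list[str]) -> str:
    """Uppercase, strip non-alphanumerics, and drop tokens shorter than 2 chars."""
    cleaned = []
    for tok in tokens:
        tok = re.sub(r"[^A-Z0-9]", "", tok.upper())
        if len(tok) >= 2:
            cleaned.append(tok)
    # Joining with a space and the "UNKNOWN" fallback are assumptions.
    return " ".join(cleaned) if cleaned else "UNKNOWN"
```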

---

## Offline Operation

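All model weights are bundled in `./models/`, so the local stages (detection, depth estimation, helmet classification, plate localisation, OCR) need no internet connection at runtime. The wrong-way and seatbelt stages are the exception: they query the Roboflow API through `inference_sdk` and therefore require network access.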

PaddleOCR 3.5.0 uses [paddlex](https://github.com/PaddlePaddle/PaddleX) internally and looks for models via the `PADDLE_PDX_CACHE_HOME` environment variable. `solution.py` sets this variable to `./models/paddleocr/` **before** any paddle import, so paddlex resolves all models from the bundled path:

```python
os.environ["PADDLE_PDX_CACHE_HOME"] = str(Path(__file__).parent / "models" / "paddleocr")
```

---

## Design Decisions

### Why two bike detectors?
COCO's `motorcycle` class (cls 3) misses many Indian two-wheeler types. The custom `stage1_best.pt` trained on traffic footage recovers these. Boxes from both are merged via NMS.

### Why depth filtering?
In busy street scenes, COCO frequently detects pedestrians on the footpath who share horizontal overlap with a detected bike. Depth-Anything V2 provides a proxy for Z-distance; persons whose median depth differs from the bike's median depth by more than 35% are excluded from association.
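
A minimal sketch of the depth check follows. It assumes the 35% threshold is a relative difference measured against the bike's median depth, and that the depth map can be indexed as rows of values (a nested list here; a NumPy array would also work). The function names are hypothetical.

```python
from statistics import median


def median_depth(depth, box: tuple[int, int, int, int]) -> float:
    """Median of the depth values inside an (x1, y1, x2, y2) pixel box."""
    x1, y1, x2, y2 = box
    values = [v for row in depth[y1:y2] for v in row[x1:x2]]
    return median(values)  # raises StatisticsError if the box is empty


def same_depth_plane(depth, person_box, bike_box, max_rel_diff: float = 0.35) -> bool:
    """True if the person's median depth is within 35% of the bike's."""
    d_person = median_depth(depth, person_box)
    d_bike = median_depth(depth, bike_box)
    if d_bike == 0:
        return d_person == 0  # avoid division by zero on a degenerate map
    return abs(d_person - d_bike) / d_bike <= max_rel_diff
```

A person who fails this check would be treated as background and left out of the rider count for that bike.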

### Why not use PaddleOCR's server detection model?
`PP-OCRv5_server_det` is 84.3 MB, and bundling it would push the total over 250 MB. Instead, `license.pt` performs the coarse plate localisation (narrowing the search area to ~125×90 px), then `PP-OCRv5_mobile_det` (4.7 MB) finds individual text lines within that small crop, and `en_PP-OCRv5_mobile_rec` reads them. This two-stage localisation is intended to give comparable quality at a fraction of the size.

### Why store depth model as fp16?
`model.safetensors` was converted from fp32 (94.6 MB) to fp16 (47.3 MB) at submission time using `safetensors.torch`. At runtime the model is loaded as fp32 (`dtype=torch.float32`) because most x86 CPUs lack native fp16 arithmetic, and running fp16 tensors on CPU can be around 10× slower. The disk saving therefore costs nothing at inference time beyond the upcast when the weights are loaded.

### Fallback for missing riders
If no COCO person is associated with a detected bike (e.g., very small image, occluded rider), one rider with no helmet is assumed. This deliberately favours recall: it risks a false positive, but a bike whose rider went undetected is not silently dropped.
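
The fallback amounts to a default when the association step returns no riders. In the sketch below, `helmet_flags` is a hypothetical list with one boolean per associated rider (`True` when a helmet was detected); the function name is also an assumption.

```python
def rider_counts(helmet_flags: list[bool]) -> tuple[int, int]:
    """Return (num_riders, helmet_violations) with the missing-rider fallback."""
    if not helmet_flags:
        # No person was associated with the bike: assume one helmetless rider.
        return 1, 1
    return len(helmet_flags), sum(1 for has_helmet in helmet_flags if not has_helmet)
```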

---

## Constraints Compliance

| Constraint | Status |
|---|---|
| Model size ≤ 250 MB | ✅ 194.6 MB |
| No VLMs > 1B parameters | ✅ Largest model is Depth-Anything V2 Small (~24M params) |
| Fully offline execution | ✅ All local weights in `./models/`, `PADDLE_PDX_CACHE_HOME` redirected (the Roboflow-backed wrong-way and seatbelt stages need network access) |
| `TrafficViolationDetector` interface | ✅ `__init__(model_dir)` + `predict(image_path) → dict` |
| Stateless `predict()` | ✅ No mutable shared state between calls |
| Error handling | ✅ All exceptions caught; returns `{"violations": []}` on failure |
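
The error-handling row can be illustrated with a small wrapper. This is a sketch of the idea only: `safe_predict` is a hypothetical helper, and `solution.py` may catch exceptions inside `predict()` itself rather than through a wrapper.

```python
import logging
from typing import Callable


def safe_predict(predict_fn: Callable[[str], dict], image_path: str) -> dict:
    """Call predict_fn and fall back to an empty violations list on any failure."""
    try:
        result = predict_fn(image_path)
        # Guard against malformed results as well as raised exceptions.
        if not isinstance(result, dict) or "violations" not in result:
            return {"violations": []}
        return result
    except Exception:
        logging.exception("predict failed for %s", image_path)
        return {"violations": []}
```

Returning a fresh dict on every failure also keeps the fallback free of shared mutable state between calls.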

---

## Performance (Local Windows CPU)

| Metric | Value |
|---|---|
| Init time (cold start) | ~3–4 s |
| Inference — simple scene (1–2 bikes) | ~4–5 s |
| Inference — dense scene (8+ bikes) | ~10–12 s |

> **Note**: The evaluation server runs Linux with a faster CPU; inference times are expected to be lower. Depth estimation (Depth-Anything V2) is the primary bottleneck on CPU.