
# 🧾 Model Card — PotholeNet-YOLO11m-v1

## 🧠 Model Overview

**PotholeNet-YOLO11m-v1** is a fine-tuned object detection model built on the **Ultralytics YOLO11m** architecture and trained to detect potholes, road damage, and garbage in street-level imagery. YOLO11m includes the C2PSA (Cross-Stage Partial with Spatial Attention) block, which is intended to help with irregularly shaped urban defects such as potholes.

Trained on a large-scale, curated civic infrastructure dataset of **23,000+ street-level images** from Indian urban environments, this model is designed to power real-time civic issue detection systems, enabling automated reporting and faster municipal response.

It serves as the **Detection Layer (Layer 1)** of the **Aamchi City AI Civic System** — an end-to-end intelligent dashboard for urban infrastructure monitoring.

---

## 🏗️ Training Details

| Parameter | Value |
|:---|:---|
| **Base Model** | `yolo11m.pt` (COCO pretrained) |
| **Architecture** | YOLO11m (C3k2 + C2PSA Spatial Attention) |
| **Framework** | Ultralytics v8.x |
| **Training Hardware** | Kaggle — NVIDIA T4 ×2 (Dual GPU) |
| **Epochs** | 50 |
| **Input Resolution** | 768×768 |
| **Batch Size** | Auto (`batch=-1`) |
| **Optimizer** | AdamW |
| **Learning Rate** | `lr0=0.001`, cosine decay to `lr0 × lrf` = 1e-5 (`lrf=0.01` is a fraction of `lr0`) |
| **Warmup** | 3 epochs |
| **Weight Decay** | 0.0005 |
| **AMP** | Enabled (FP16 mixed precision) |
| **Early Stopping** | `patience=10` (did not trigger — model was still improving) |
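
The learning-rate row can be made concrete with a small calculation. In Ultralytics, `lrf` is a fraction of `lr0` rather than an absolute rate, so the schedule ends near `lr0 × lrf`. The sketch below uses the standard cosine form and ignores the 3 warmup epochs; the function name `cosine_lr` is illustrative and not part of any library.

```python
import math


def cosine_lr(epoch: int, epochs: int = 50, lr0: float = 0.001, lrf: float = 0.01) -> float:
    """Learning rate at a given epoch under cosine decay from lr0 to lr0 * lrf."""
    # The factor falls smoothly from 1.0 at epoch 0 to lrf at the final epoch.
    factor = (1 - math.cos(math.pi * epoch / epochs)) / 2 * (lrf - 1) + 1
    return lr0 * factor


# Print the rate at the start, midpoint, and end of the 50-epoch run.
for epoch in (0, 25, 50):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```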

### Loss Weights
| Loss | Weight |
|:---|:---|
| Box Loss | 7.5 |
| Classification Loss | 1.0 |
| DFL Loss | 1.5 |

### Augmentation Pipeline
| Augmentation | Value |
|:---|:---|
| Mosaic | 1.0 |
| MixUp | 0.15 |
| Copy-Paste | 0.1 |
| HSV (H/S/V) | 0.015 / 0.7 / 0.4 |
| Rotation | ±10° |
| Scale | 0.5 |
| Shear | 2.0 |
| Horizontal Flip | 0.5 |
| Erasing | 0.3 |
| Label Smoothing | 0.05 |
| Close Mosaic | Last 8 epochs |
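
For readers who want to reproduce a similar run, the tables above map roughly onto Ultralytics `train()` arguments as sketched below. This is a reconstruction, not the author's training script: `data.yaml` is a hypothetical dataset file, `cos_lr=True` is inferred from the "cosine decay" wording, and device selection is left to the caller. Label smoothing (0.05) is left out because support for that argument may vary across Ultralytics releases.

```python
# Hyperparameters copied from the training, loss, and augmentation tables.
TRAIN_ARGS = {
    "epochs": 50,
    "imgsz": 768,
    "batch": -1,            # auto batch size
    "optimizer": "AdamW",
    "lr0": 0.001,
    "lrf": 0.01,            # final LR as a fraction of lr0
    "cos_lr": True,         # assumption: inferred from "cosine decay"
    "warmup_epochs": 3,
    "weight_decay": 0.0005,
    "amp": True,
    "patience": 10,
    "box": 7.5, "cls": 1.0, "dfl": 1.5,
    "mosaic": 1.0, "mixup": 0.15, "copy_paste": 0.1,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "degrees": 10.0, "scale": 0.5, "shear": 2.0,
    "fliplr": 0.5, "erasing": 0.3,
    "close_mosaic": 8,
}


def train(data_yaml: str = "data.yaml"):
    """Fine-tune yolo11m.pt with the settings above (data_yaml is hypothetical)."""
    from ultralytics import YOLO  # imported lazily so the dict is usable on its own

    model = YOLO("yolo11m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)
```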

---

## 📊 Dataset Description

The model was trained on a curated subset of **23,179 street-level images** collected from Indian urban environments. The dataset underwent extensive preprocessing:

- **Perceptual Hash (pHash) Deduplication** — Removed near-duplicate images whose hashes differ by a Hamming distance ≤ 4
- **Corrupt Image Removal** — Verified all images via PIL
- **Intelligent Negative Sampling** — Trimmed empty-label (background) images to 2,000 hard negatives
- **Stratified Split** — 80% Train / 15% Val / 5% Test, stratified by dominant class
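
The deduplication and split steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: perceptual hashes are assumed to be precomputed as integers, each image is assumed to carry a single dominant class ID, and the helper names `hamming`, `dedupe`, and `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe(hashes: dict, max_dist: int = 4) -> list:
    """Greedy pass: keep an image only if no kept image is within max_dist bits."""
    kept = []
    for name, h in hashes.items():
        if all(hamming(h, hashes[k]) > max_dist for k in kept):
            kept.append(name)
    return kept


def stratified_split(labels: dict, fractions=(0.80, 0.15, 0.05), seed: int = 0):
    """Split image names into train/val/test, class by class."""
    by_class = defaultdict(list)
    for name, cls in labels.items():
        by_class[cls].append(name)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(by_class):
        names = sorted(by_class[cls])
        rng.shuffle(names)
        n_train = round(len(names) * fractions[0])
        n_val = round(len(names) * fractions[1])
        train += names[:n_train]
        val += names[n_train:n_train + n_val]
        test += names[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

The greedy pass compares every image against every kept image, which is quadratic in the worst case; a dataset of this size would likely need an index or bucketing scheme in practice.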

### Label Classes

| Class ID | Class Name | Description |
|:---|:---|:---|
| 🔴 0 | **Pothole** | Road surface cavities and depressions |
| 🟡 1 | **Road Damage** | Cracks, surface wear, and structural deterioration |
| 🟢 2 | **Garbage** | Street-level waste and debris accumulation |

> **Priority:** Pothole (primary) > Garbage > Road Damage
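
The card does not say how this priority is applied. One possible reading is a triage order for detections, sketched below with class IDs from the table; `PRIORITY_RANK` and `sort_by_priority` are hypothetical names.

```python
# Lower rank means higher priority, following the stated order:
# Pothole (0) > Garbage (2) > Road Damage (1).
PRIORITY_RANK = {0: 0, 2: 1, 1: 2}


def sort_by_priority(detections):
    """Order (class_id, confidence) pairs by class priority, then by confidence."""
    return sorted(detections, key=lambda d: (PRIORITY_RANK[d[0]], -d[1]))


detections = [(1, 0.91), (2, 0.60), (0, 0.45), (0, 0.80)]
print(sort_by_priority(detections))
# [(0, 0.8), (0, 0.45), (2, 0.6), (1, 0.91)]
```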

---

## 🎯 Evaluation Metrics

| Metric | Score |
|:---|:---|
| **mAP50** | **0.86** |
| **mAP50-95** | Not reported |
| **Parameters** | ~20M |
| **Model Size** | ~39 MB |
| **Inference Speed** | Real-time on GPU |

> ⚡ The model did not trigger early stopping at 50 epochs, indicating further training could yield additional performance gains.
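
Both mAP figures can be recomputed with the Ultralytics validation API. The sketch below assumes a hypothetical `data.yaml` that defines a `test` split for this dataset; `evaluate` is an illustrative wrapper, not part of the library.

```python
def evaluate(weights: str = "best.pt", data_yaml: str = "data.yaml") -> dict:
    """Run validation on the test split and return the two headline mAP values."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    # data_yaml is a hypothetical dataset config with a 'test' entry.
    metrics = model.val(data=data_yaml, imgsz=768, split="test")
    return {"mAP50": metrics.box.map50, "mAP50-95": metrics.box.map}
```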

---

## 💬 Example Usage

### Python (Ultralytics)

```python
from ultralytics import YOLO

# Load model
model = YOLO("best.pt")

# Run inference
results = model("street_image.jpg", imgsz=768, conf=0.25)

# Display results
results[0].show()

# Map class IDs to names once, outside the loop
class_names = {0: "pothole", 1: "road_damage", 2: "garbage"}

# Access detections
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()
    print(f"{class_names[cls]}: {conf:.2f} at {xyxy}")
```

### With Test-Time Augmentation (TTA)

```python
from ultralytics import YOLO
model = YOLO("best.pt")
# TTA may improve mAP slightly, at the cost of slower inference
results = model("street_image.jpg", imgsz=768, conf=0.25, augment=True)
```

### Filter Pothole-Only Detections

```python
from ultralytics import YOLO

model = YOLO("best.pt")
results = model("street_image.jpg", imgsz=768, conf=0.25)
boxes = results[0].boxes
pothole_mask = boxes.cls == 0  # class 0 = pothole
pothole_boxes = boxes[pothole_mask]
print(f"Found {len(pothole_boxes)} potholes")
```

---

## 🧩 Intended Use

- **Real-time pothole detection** from dashcam, mobile phone, or street-view imagery
- **Automated civic issue reporting** — GPS-tagged detection for municipal dashboards
- **Infrastructure health monitoring** — Severity scoring and trend analysis for road maintenance
- **Smart city integration** — Layer 1 detection input for AI-driven civic action systems
- **Mobile deployment** — Exportable to ONNX for edge inference on mobile devices
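
For the mobile deployment path, an ONNX export with Ultralytics might look like the sketch below. It assumes the fine-tuned weights are saved as `best.pt`; the wrapper name `export_to_onnx` is illustrative.

```python
def export_to_onnx(weights: str = "best.pt", imgsz: int = 768) -> str:
    """Export the checkpoint to ONNX and return the path of the exported file."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    # Export at the training resolution so inputs match what the model saw.
    return model.export(format="onnx", imgsz=imgsz)
```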

---

## ⚠️ Limitations

- The model is optimized for **Indian urban road conditions**; performance may degrade on highways, rural roads, or non-Indian geographies.
- **Road damage** class has visual overlap with potholes, which may cause occasional misclassification between the two.
- Performance is best on **daytime, clear-weather imagery** — low-light and rain-occluded scenes may reduce accuracy.
- The model was trained for **50 epochs without triggering early stopping**, so the checkpoint may not be fully converged and further fine-tuning could improve results.
- **Small potholes** (< 32px at 768px resolution) may be missed in wide-angle shots.
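
The small-object limitation can be checked per detection. The sketch below assumes "< 32px" refers to the shorter side of a box after the image's longer side is resized to 768 with the aspect ratio preserved; that interpretation and the helper name `is_small_at_network_res` are assumptions.

```python
def is_small_at_network_res(box_xyxy, image_size, imgsz=768, min_side=32):
    """True if the box's shorter side is under min_side px at network resolution."""
    x1, y1, x2, y2 = box_xyxy
    width, height = image_size
    # Scale applied when the longer image side is resized to imgsz.
    scale = imgsz / max(width, height)
    shorter_side = min(x2 - x1, y2 - y1) * scale
    return shorter_side < min_side


# A 60x60 px box in a 1920x1080 frame shrinks to 24 px at 768 resolution.
print(is_small_at_network_res((100, 100, 160, 160), (1920, 1080)))  # True
```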

---

## 🧑‍💻 Developer

| | |
|:---|:---|
| **Author** | Vansh Momaya |
| **Institution** | D. J. Sanghvi College of Engineering |
| **Focus Area** | Computer Vision, Object Detection, AI for Civic Infrastructure |
| **Email** | vanshmomaya9@gmail.com |

---

## 🌍 Citation

If you use PotholeNet-YOLO11m-v1 in your research or project:

```bibtex
@online{momaya2026potholenet,
  author       = {Vansh Momaya},
  title        = {PotholeNet-YOLO11m-v1: Real-Time Pothole and Civic Issue Detection for Indian Urban Roads},
  year         = {2026},
  version      = {v1},
  url          = {https://huggingface.co/Vansh180/PotholeNet-YOLO11m-v1},
  institution  = {D. J. Sanghvi College of Engineering},
  note         = {Fine-tuned YOLO11m model for detecting potholes, road damage, and garbage in Indian street imagery},
  license      = {MIT}
}
```

---

## 🚀 Acknowledgements

- **[Ultralytics YOLO11](https://github.com/ultralytics/ultralytics)** — Base architecture and training framework
- **[Kaggle](https://www.kaggle.com)** — Training infrastructure (Dual T4 GPU)
- **Aamchi City — Datahack 4** — Hackathon context and dataset

---

*Built for the Aamchi City AI Civic System — Datahack 4, PS2 Core ML*