# 🧾 Model Card: CivicAi-YOLO11m-v1
## 🧠 Model Overview
**PotholeNet-YOLO11m-v1** is an object detection model fine-tuned from the **Ultralytics YOLO11m** architecture to detect potholes, road damage, and garbage in street-level imagery. YOLO11m includes the C2PSA attention block, which may help with irregularly shaped urban defects such as potholes.
Trained on a large-scale, curated civic infrastructure dataset of **23,000+ street-level images** from Indian urban environments, this model is designed to power real-time civic issue detection systems, enabling automated reporting and faster municipal response.
It serves as the **Detection Layer (Layer 1)** of the **Aamchi City AI Civic System**, an end-to-end intelligent dashboard for urban infrastructure monitoring.
---
## 🏗️ Training Details
| Parameter | Value |
|:---|:---|
| **Base Model** | `yolo11m.pt` (COCO pretrained) |
| **Architecture** | YOLO11m (C3k2 + C2PSA Spatial Attention) |
| **Framework** | Ultralytics v8.x |
| **Training Hardware** | Kaggle, NVIDIA T4 ×2 (Dual GPU) |
| **Epochs** | 50 |
| **Input Resolution** | 768×768 |
| **Batch Size** | Auto (`batch=-1`) |
| **Optimizer** | AdamW |
| **Learning Rate** | `lr0=0.001`, cosine decay to `lr0 × lrf` with `lrf=0.01` (final LR 1e-5) |
| **Warmup** | 3 epochs |
| **Weight Decay** | 0.0005 |
| **AMP** | Enabled (FP16 mixed precision) |
| **Early Stopping** | `patience=10` (did not trigger; the model was still improving) |
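
To make the learning-rate row concrete: in Ultralytics, `lrf` is a fraction of `lr0` rather than an absolute value, so the schedule ends near `lr0 * lrf`. The sketch below assumes a plain cosine curve over the 50 epochs and ignores the 3 warmup epochs; the function name is illustrative and not part of any library.

```python
import math

LR0 = 0.001   # initial learning rate (from the table)
LRF = 0.01    # final LR expressed as a fraction of LR0 (from the table)
EPOCHS = 50   # total training epochs (from the table)


def cosine_lr(epoch: int, lr0: float = LR0, lrf: float = LRF, epochs: int = EPOCHS) -> float:
    """Learning rate at a given epoch under cosine decay from lr0 to lr0 * lrf."""
    # The factor moves smoothly from 1.0 at epoch 0 down to lrf at the last epoch.
    factor = (1 - math.cos(epoch * math.pi / epochs)) / 2 * (lrf - 1) + 1
    return lr0 * factor


# Warmup is deliberately not modelled in this sketch.
for e in (0, 25, 50):
    print(f"epoch {e:2d}: lr = {cosine_lr(e):.6f}")
```

Under these assumptions the rate falls from 0.001 at the start to 0.00001 at the end of training.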
### Loss Weights
| Loss | Weight |
|:---|:---|
| Box Loss | 7.5 |
| Classification Loss | 1.0 |
| DFL Loss | 1.5 |
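
As a rough illustration of how these weights combine, the sketch below forms the weighted sum of the three loss components. The component values are made-up placeholders, and the function is a simplified stand-in for what the training framework does internally (it omits, for example, any batch-size scaling).

```python
# Loss gains taken from the table above
LOSS_WEIGHTS = {"box": 7.5, "cls": 1.0, "dfl": 1.5}


def weighted_loss(components: dict[str, float], weights: dict[str, float] = LOSS_WEIGHTS) -> float:
    """Weighted sum of the per-component losses."""
    return sum(weights[name] * value for name, value in components.items())


# Placeholder component values, for illustration only (not measured results)
example = {"box": 1.0, "cls": 0.5, "dfl": 1.0}
print(weighted_loss(example))
```

With a box gain of 7.5, localisation errors dominate the total unless the classification loss is several times larger.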
### Augmentation Pipeline
| Augmentation | Value |
|:---|:---|
| Mosaic | 1.0 |
| MixUp | 0.15 |
| Copy-Paste | 0.1 |
| HSV (H/S/V) | 0.015 / 0.7 / 0.4 |
| Rotation | ±10° |
| Scale | 0.5 |
| Shear | 2.0 |
| Horizontal Flip | 0.5 |
| Erasing | 0.3 |
| Label Smoothing | 0.05 |
| Close Mosaic | Last 8 epochs |
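
The training and augmentation tables might translate into an Ultralytics training call roughly as follows. This is a sketch, not the author's actual script: the keyword names follow the Ultralytics training settings but should be checked against the installed version (some, such as `label_smoothing`, may differ or be deprecated depending on the release), `cos_lr=True` is inferred from the cosine-decay row, and `data.yaml` is a hypothetical dataset config path.

```python
# Hyperparameters copied from the tables above; key names are assumed
# to match the Ultralytics train settings.
TRAIN_ARGS = {
    "epochs": 50, "imgsz": 768, "batch": -1,
    "optimizer": "AdamW", "lr0": 0.001, "lrf": 0.01, "cos_lr": True,
    "warmup_epochs": 3, "weight_decay": 0.0005, "amp": True, "patience": 10,
    # Loss gains
    "box": 7.5, "cls": 1.0, "dfl": 1.5,
    # Augmentation
    "mosaic": 1.0, "mixup": 0.15, "copy_paste": 0.1,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "degrees": 10.0, "scale": 0.5, "shear": 2.0, "fliplr": 0.5,
    "erasing": 0.3, "label_smoothing": 0.05, "close_mosaic": 8,
}


def train(data_yaml: str = "data.yaml"):
    """Fine-tune yolo11m.pt with the settings above (requires the ultralytics package)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO("yolo11m.pt")  # COCO-pretrained base model named in the table
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train()` starts a run; the dictionary alone can be reused to log or compare configurations.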
---
## 📊 Dataset Description
The model was trained on a curated subset of **23,179 street-level images** collected from Indian urban environments. The dataset underwent extensive preprocessing:
- **Perceptual Hash (pHash) Deduplication**: Removed near-duplicate images using a Hamming distance threshold of ≤ 4
- **Corrupt Image Removal**: Verified all images via PIL
- **Intelligent Negative Sampling**: Trimmed empty-label (background) images to 2,000 hard negatives
- **Stratified Split**: 80% Train / 15% Val / 5% Test, stratified by dominant class
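
The deduplication step can be sketched as follows, assuming each image has already been reduced to a 64-bit perceptual hash (the card does not name the hashing tool, so that part is left out). The greedy keep-first strategy and the function names are illustrative; only the Hamming threshold of 4 comes from the description above.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe(hashes: dict[str, int], max_distance: int = 4) -> list[str]:
    """Keep an image only if its hash is more than max_distance bits from every kept hash."""
    kept: list[str] = []
    for name, h in hashes.items():
        if all(hamming(h, hashes[k]) > max_distance for k in kept):
            kept.append(name)
    return kept


# Toy 64-bit hashes, made up for illustration
hashes = {
    "a.jpg": 0xFFFF0000FFFF0000,
    "b.jpg": 0xFFFF0000FFFF0003,  # 2 bits away from a.jpg, so treated as a near-duplicate
    "c.jpg": 0x0123456789ABCDEF,  # far from a.jpg, so kept
}
print(dedupe(hashes))
```

This pairwise comparison is quadratic in the number of kept images; a dataset of this size may call for bucketing or an index, which is beyond this sketch.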
### Label Classes
| Class ID | Class Name | Description |
|:---|:---|:---|
| 🔴 0 | **Pothole** | Road surface cavities and depressions |
| 🟡 1 | **Road Damage** | Cracks, surface wear, and structural deterioration |
| 🟢 2 | **Garbage** | Street-level waste and debris accumulation |
> **Priority:** Pothole (primary) > Garbage > Road Damage
---
## 🎯 Evaluation Metrics
| Metric | Score |
|:---|:---|
| **mAP50** | **0.86** |
| **mAP50-95** | Not reported |
| **Parameters** | ~20M |
| **Model Size** | ~39 MB |
| **Inference Speed** | Real-time on GPU |
> ⚡ Training ran the full 50 epochs without triggering early stopping, so further training might yield additional performance gains.
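
For readers unfamiliar with the metric, mAP50 counts a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5. A minimal IoU sketch for boxes in `(x1, y1, x2, y2)` format, with illustrative coordinates:

```python
def iou(a, b) -> float:
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Illustrative boxes, not taken from the dataset
pred = (100, 100, 200, 200)
truth = (120, 120, 220, 220)
score = iou(pred, truth)
print(f"IoU = {score:.3f}, counts at the 0.5 threshold: {score >= 0.5}")
```

The two example boxes overlap substantially yet fall just short of the 0.5 threshold, which shows how strict the criterion can be for small or loosely localized objects.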
---
## 💬 Example Usage
### Python (Ultralytics)
```python
from ultralytics import YOLO
# Load the fine-tuned weights
model = YOLO("best.pt")
# Run inference at the training resolution
results = model("street_image.jpg", imgsz=768, conf=0.25)
# Display the annotated image
results[0].show()
# Class ID to name mapping, as listed under "Label Classes"
class_names = {0: "pothole", 1: "road_damage", 2: "garbage"}
# Print each detection: class, confidence, and box corners (x1, y1, x2, y2)
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()
    print(f"{class_names[cls]}: {conf:.2f} at {xyxy}")
```
### With Test-Time Augmentation (TTA)
```python
# TTA may improve mAP (the author estimates +1-3%) at the cost of slower inference
results = model("street_image.jpg", imgsz=768, conf=0.25, augment=True)
```
### Filter Pothole-Only Detections
```python
results = model("street_image.jpg", conf=0.25)
boxes = results[0].boxes
pothole_mask = boxes.cls == 0
pothole_boxes = boxes[pothole_mask]
print(f"Found {len(pothole_boxes)} potholes")
```
---
## 🧩 Intended Use
- **Real-time pothole detection** from dashcam, mobile phone, or street-view imagery
- **Automated civic issue reporting**: GPS-tagged detections for municipal dashboards
- **Infrastructure health monitoring**: Severity scoring and trend analysis for road maintenance
- **Smart city integration**: Layer 1 detection input for AI-driven civic action systems
- **Mobile deployment**: Exportable to ONNX for edge inference on mobile devices
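
For the mobile-deployment case, ONNX export with Ultralytics could look like the sketch below. It assumes the `ultralytics` package and its ONNX export dependencies are installed and that the weights are saved as `best.pt`, as in the usage examples; the wrapper function name is illustrative.

```python
def export_to_onnx(weights: str = "best.pt", imgsz: int = 768) -> str:
    """Export the fine-tuned weights to ONNX and return the path of the exported file."""
    from ultralytics import YOLO  # lazy import: only needed when the export actually runs

    model = YOLO(weights)
    # format="onnx" selects the ONNX exporter; imgsz matches the training resolution
    return model.export(format="onnx", imgsz=imgsz)


# Example call (commented out because it needs the weights file on disk):
# onnx_path = export_to_onnx()
```

Exporting at the training resolution keeps the input shape consistent with the reported metrics; a smaller `imgsz` would trade accuracy for speed on constrained devices.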
---
## ⚠️ Limitations
- The model is optimized for **Indian urban road conditions**; performance may degrade on highways, rural roads, or non-Indian geographies.
- **Road damage** class has visual overlap with potholes, which may cause occasional misclassification between the two.
- Performance is best on **daytime, clear-weather imagery**; low-light and rain-occluded scenes may reduce accuracy.
- The model was trained for **50 epochs without triggering early stopping**, which suggests the checkpoint may not be fully converged and that further fine-tuning could improve results.
- **Small potholes** (< 32px at 768px resolution) may be missed in wide-angle shots.
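
To gauge when the small-object limitation applies, one can estimate an object's size after the image is resized to the 768 px input. The sketch assumes an aspect-preserving (letterbox-style) resize in which the longer image side maps to 768 px; the function names and frame dimensions are illustrative.

```python
def size_at_input(obj_px: float, img_w: int, img_h: int, imgsz: int = 768) -> float:
    """Approximate object size in pixels after an aspect-preserving resize to imgsz."""
    scale = imgsz / max(img_w, img_h)
    return obj_px * scale


def min_source_size(img_w: int, img_h: int, threshold_px: float = 32.0, imgsz: int = 768) -> float:
    """Smallest object size in the source image that still reaches threshold_px at the input."""
    return threshold_px * max(img_w, img_h) / imgsz


# Example: a 1920x1080 dashcam frame (illustrative dimensions)
print(size_at_input(60, 1920, 1080))   # size of a 60 px pothole at the model input
print(min_source_size(1920, 1080))     # source size needed to reach 32 px at the input
```

In a 1920×1080 frame, a pothole would need to span roughly 80 px in the source image to reach 32 px at the model input.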
---
## 🧑‍💻 Developer
| | |
|:---|:---|
| **Author** | Vansh Momaya |
| **Institution** | D. J. Sanghvi College of Engineering |
| **Focus Area** | Computer Vision, Object Detection, AI for Civic Infrastructure |
| **Email** | vanshmomaya9@gmail.com |
---
## 🌍 Citation
If you use PotholeNet-YOLO11m-v1 in your research or project:
```bibtex
@online{momaya2026potholenet,
author = {Vansh Momaya},
title = {PotholeNet-YOLO11m-v1: Real-Time Pothole and Civic Issue Detection for Indian Urban Roads},
year = {2026},
version = {v1},
url = {https://huggingface.co/Vansh180/PotholeNet-YOLO11m-v1},
institution = {D. J. Sanghvi College of Engineering},
note = {Fine-tuned YOLO11m model for detecting potholes, road damage, and garbage in Indian street imagery},
license = {MIT}
}
```
---
## 🚀 Acknowledgements
- **[Ultralytics YOLO11](https://github.com/ultralytics/ultralytics)**: Base architecture and training framework
- **[Kaggle](https://www.kaggle.com)**: Training infrastructure (Dual T4 GPU)
- **Aamchi City, Datahack 4**: Hackathon context and dataset
---
*Built for the Aamchi City AI Civic System (Datahack 4, PS2 Core ML)*