	# 🧾 Model Card — PotholeNet-YOLO11m-v1

	## 🧠 Model Overview

	PotholeNet-YOLO11m-v1 is a fine-tuned object detection model built on the Ultralytics YOLO11m architecture and trained to detect potholes, road damage, and garbage in street-level imagery. YOLO11m includes the C2PSA (Cross-Stage Partial with Spatial Attention) block, whose attention mechanism is intended to help with irregularly shaped urban defects such as potholes.

	Trained on a large-scale, curated civic infrastructure dataset of 23,000+ street-level images from Indian urban environments, this model is designed to power real-time civic issue detection systems, enabling automated reporting and faster municipal response.

	It serves as the Detection Layer (Layer 1) of the Aamchi City AI Civic System — an end-to-end intelligent dashboard for urban infrastructure monitoring.

	---

	## 🏗️ Training Details

	| Parameter | Value |
	|:---|:---|
	| Base Model | `yolo11m.pt` (COCO pretrained) |
	| Architecture | YOLO11m (C3k2 + C2PSA Spatial Attention) |
	| Framework | Ultralytics v8.x (YOLO11 support was added in 8.3.0) |
	| Training Hardware | Kaggle — NVIDIA T4 ×2 (Dual GPU) |
	| Epochs | 50 |
	| Input Resolution | 768×768 |
	| Batch Size | Auto (`batch=-1`) |
	| Optimizer | AdamW |
	| Learning Rate | `lr0=0.001`, cosine decay to `lr0 × lrf` = 1e-5 (`lrf=0.01`) |
	| Warmup | 3 epochs |
	| Weight Decay | 0.0005 |
	| AMP | Enabled (FP16 mixed precision) |
	| Early Stopping | `patience=10` (did not trigger — model was still improving) |
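
As a rough illustration of the learning-rate row above: in Ultralytics, `lrf` is a fraction of `lr0` rather than an absolute rate, so the schedule ends near `lr0 × lrf`. The sketch below assumes a plain half-cosine shape over the 50 epochs and ignores the 3 warmup epochs; the function name `cosine_lr` is illustrative and not part of any library.

```python
import math

LR0 = 0.001   # initial learning rate (lr0)
LRF = 0.01    # final learning rate as a fraction of lr0 (lrf)
EPOCHS = 50


def cosine_lr(epoch: int, lr0: float = LR0, lrf: float = LRF, epochs: int = EPOCHS) -> float:
    """Half-cosine decay from lr0 at epoch 0 to lr0 * lrf at the final epoch."""
    progress = (1 - math.cos(math.pi * epoch / epochs)) / 2  # goes from 0 to 1
    return lr0 * (1 + progress * (lrf - 1))


if __name__ == "__main__":
    for epoch in (0, 25, 50):
        print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate falls from 0.001 to about 0.00001, passing roughly the midpoint of the two at epoch 25.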

	### Loss Weights
	| Loss | Weight |
	|:---|:---|
	| Box Loss | 7.5 |
	| Classification Loss | 1.0 |
	| DFL Loss | 1.5 |
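
The gains in the table scale the three detection loss components before they are summed. A minimal sketch of that weighting follows, with made-up component values used purely for illustration. `weighted_loss` is a hypothetical helper, and the actual trainer may apply further scaling that is not shown here.

```python
# Loss gains from the table above
LOSS_GAINS = {"box": 7.5, "cls": 1.0, "dfl": 1.5}


def weighted_loss(components: dict, gains: dict = LOSS_GAINS) -> float:
    """Return the gain-weighted sum of the raw loss components."""
    return sum(gains[name] * value for name, value in components.items())


if __name__ == "__main__":
    raw = {"box": 2.0, "cls": 1.0, "dfl": 2.0}  # illustrative values only
    print(weighted_loss(raw))
```

With a box gain of 7.5, localization error dominates the total unless the classification term is several times larger.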

	### Augmentation Pipeline
	| Augmentation | Value |
	|:---|:---|
	| Mosaic | 1.0 |
	| MixUp | 0.15 |
	| Copy-Paste | 0.1 |
	| HSV (H/S/V) | 0.015 / 0.7 / 0.4 |
	| Rotation | ±10° |
	| Scale | 0.5 |
	| Shear | 2.0 |
	| Horizontal Flip | 0.5 |
	| Erasing | 0.3 |
	| Label Smoothing | 0.05 |
	| Close Mosaic | Last 8 epochs |
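
Taken together, the training, loss, and augmentation tables map onto Ultralytics training arguments roughly as follows. This is a reconstruction from the listed values, not the author's actual script: the dataset path `civic.yaml` is a placeholder, `cos_lr=True` and `device=[0, 1]` are inferred from the cosine-decay and dual-T4 rows, and argument availability (for example `label_smoothing`) can vary between Ultralytics releases.

```python
TRAIN_ARGS = {
    # Schedule and optimizer
    "epochs": 50, "imgsz": 768, "patience": 10,
    "batch": -1,  # AutoBatch (behaviour under multi-GPU may differ by version)
    "optimizer": "AdamW", "lr0": 0.001, "lrf": 0.01,
    "cos_lr": True,  # assumption: inferred from "cosine decay"
    "warmup_epochs": 3, "weight_decay": 0.0005, "amp": True,
    # Loss gains
    "box": 7.5, "cls": 1.0, "dfl": 1.5,
    # Augmentation
    "mosaic": 1.0, "mixup": 0.15, "copy_paste": 0.1,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "degrees": 10.0, "scale": 0.5, "shear": 2.0, "fliplr": 0.5,
    "erasing": 0.3, "label_smoothing": 0.05, "close_mosaic": 8,
    # Hardware (assumption: two GPUs, matching the dual-T4 row)
    "device": [0, 1],
}


def train(data_yaml: str = "civic.yaml"):
    """Fine-tune the COCO-pretrained YOLO11m weights with the settings above."""
    # Imported lazily so the config can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolo11m.pt")
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Keeping the arguments in one dictionary makes it easy to diff a rerun against the values reported in this card.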

	---

	## 📊 Dataset Description

	The model was trained on a curated subset of 23,179 street-level images collected from Indian urban environments. The dataset underwent extensive preprocessing:

	- Perceptual Hash (pHash) Deduplication — Removed near-duplicate images using Hamming distance ≤ 4
	- Corrupt Image Removal — Verified all images via PIL
	- Intelligent Negative Sampling — Trimmed empty-label (background) images to 2,000 hard negatives
	- Stratified Split — 80% Train / 15% Val / 5% Test, stratified by dominant class
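
The deduplication and split steps above could look roughly like this. It is a sketch under assumptions, not the author's pipeline: hashes are taken as precomputed 64-bit integers (a library such as `imagehash` can produce perceptual hashes), duplicates are dropped greedily in input order, and the dominant class is passed in per image rather than derived from label files. All function names are illustrative.

```python
import random
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe(hashes: dict, max_dist: int = 4) -> list:
    """Keep an image only if it is more than max_dist bits from every kept image.

    Quadratic in the number of images; fine for a sketch, slow at scale.
    """
    kept = []
    for name, h in hashes.items():
        if all(hamming(h, hashes[k]) > max_dist for k in kept):
            kept.append(name)
    return kept


def stratified_split(dominant: dict, train_frac: float = 0.80, val_frac: float = 0.15, seed: int = 0) -> dict:
    """Split image names into train/val/test within each dominant class.

    Whatever remains after train and val (about 5%) goes to test.
    """
    by_class = defaultdict(list)
    for name, cls in dominant.items():
        by_class[cls].append(name)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for names in by_class.values():
        names.sort()          # deterministic starting order
        rng.shuffle(names)
        n_train = round(len(names) * train_frac)
        n_val = round(len(names) * val_frac)
        splits["train"] += names[:n_train]
        splits["val"] += names[n_train:n_train + n_val]
        splits["test"] += names[n_train + n_val:]
    return splits
```

Splitting per class keeps the 80/15/5 proportions for rare classes as well as common ones, which a single global shuffle would not guarantee.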

	### Label Classes

	| Class ID | Class Name | Description |
	|:---|:---|:---|
	| 🔴 0 | Pothole | Road surface cavities and depressions |
	| 🟡 1 | Road Damage | Cracks, surface wear, and structural deterioration |
	| 🟢 2 | Garbage | Street-level waste and debris accumulation |
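
The class IDs above, combined with the card's stated priority (Pothole first, then Garbage, then Road Damage), can be used to order detections for triage. This reading of the priority is an interpretation, since the card does not say how it is applied; the `(class_id, confidence)` tuple layout and `sort_by_priority` are hypothetical.

```python
CLASS_NAMES = {0: "pothole", 1: "road_damage", 2: "garbage"}

# class id -> rank; lower rank is handled first (pothole, garbage, road damage)
PRIORITY_RANK = {0: 0, 2: 1, 1: 2}


def sort_by_priority(detections: list) -> list:
    """Order (class_id, confidence) pairs by class priority, then by confidence."""
    return sorted(detections, key=lambda d: (PRIORITY_RANK[d[0]], -d[1]))


if __name__ == "__main__":
    dets = [(1, 0.9), (2, 0.6), (0, 0.4), (0, 0.8)]
    for cls_id, conf in sort_by_priority(dets):
        print(f"{CLASS_NAMES[cls_id]}: {conf:.2f}")
```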

	> Priority: Pothole (primary) > Garbage > Road Damage

	---

	## 🎯 Evaluation Metrics

	| Metric | Score |
	|:---|:---|
	| mAP50 | 0.86 |
	| mAP50-95 | Not reported |
	| Parameters | ~20M |
	| Model Size | ~39 MB |
	| Inference Speed | Real-time on GPU |
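
To fill in the missing mAP50-95 figure, the checkpoint can be scored with the Ultralytics validation API. A minimal sketch, assuming a dataset config at the placeholder path `civic.yaml` with a `test` split defined; `evaluate` and `as_table_rows` are illustrative helpers.

```python
def evaluate(weights: str = "best.pt", data_yaml: str = "civic.yaml") -> dict:
    """Run validation on the test split and return the two headline box metrics."""
    # Imported lazily so the helpers below work without ultralytics installed.
    from ultralytics import YOLO

    metrics = YOLO(weights).val(data=data_yaml, imgsz=768, split="test")
    return {
        "mAP50": float(metrics.box.map50),
        "mAP50-95": float(metrics.box.map),
    }


def as_table_rows(scores: dict) -> list:
    """Format scores as Markdown table rows matching the metrics table."""
    return [f"| {name} | {value:.2f} |" for name, value in scores.items()]
```

Evaluating at `imgsz=768` keeps the comparison consistent with the training resolution.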

	> ⚡ The model did not trigger early stopping at 50 epochs, indicating further training could yield additional performance gains.

	---

	## 💬 Example Usage

	### Python (Ultralytics)

	```python
	from ultralytics import YOLO

	# Load model
	model = YOLO("best.pt")

	# Run inference
	results = model("street_image.jpg", imgsz=768, conf=0.25)

	# Display results
	results[0].show()

	# Access detections (class IDs follow the Label Classes table)
	class_names = {0: "pothole", 1: "road_damage", 2: "garbage"}
	for box in results[0].boxes:
	    cls = int(box.cls)            # predicted class ID
	    conf = float(box.conf)        # confidence score
	    xyxy = box.xyxy[0].tolist()   # [x1, y1, x2, y2] in pixels
	    print(f"{class_names[cls]}: {conf:.2f} at {xyxy}")
	```

	### With Test-Time Augmentation (TTA)

	```python
	# Reuses `model` from the example above.
	# TTA can improve mAP by roughly 1-3%, at the cost of slower inference.
	results = model("street_image.jpg", imgsz=768, conf=0.25, augment=True)
	```

	### Filter Pothole-Only Detections

	```python
	# Reuses `model` from the first example; class 0 is Pothole.
	results = model("street_image.jpg", imgsz=768, conf=0.25)
	boxes = results[0].boxes
	pothole_mask = boxes.cls == 0
	pothole_boxes = boxes[pothole_mask]
	print(f"Found {len(pothole_boxes)} potholes")
	```

	---

	## 🧩 Intended Use

	- Real-time pothole detection from dashcam, mobile phone, or street-view imagery
	- Automated civic issue reporting — GPS-tagged detection for municipal dashboards
	- Infrastructure health monitoring — Severity scoring and trend analysis for road maintenance
	- Smart city integration — Layer 1 detection input for AI-driven civic action systems
	- Mobile deployment — Exportable to ONNX for edge inference on mobile devices
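
For the mobile deployment item, Ultralytics models expose an `export` method. A minimal sketch, assuming the checkpoint is saved as `best.pt` and that the 768×768 training resolution is also wanted at inference time; `export_onnx` is an illustrative wrapper.

```python
EXPORT_ARGS = {"format": "onnx", "imgsz": 768}


def export_onnx(weights: str = "best.pt") -> str:
    """Export the checkpoint to ONNX and return the path of the exported file."""
    # Imported lazily so this module loads without ultralytics installed.
    from ultralytics import YOLO

    return YOLO(weights).export(**EXPORT_ARGS)
```

The exported file can then be loaded by an ONNX-compatible runtime on the target device.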

	---

	## ⚠️ Limitations

	- The model is optimized for Indian urban road conditions; performance may degrade on highways, rural roads, or non-Indian geographies.
	- Road damage class has visual overlap with potholes, which may cause occasional misclassification between the two.
	- Performance is best on daytime, clear-weather imagery — low-light and rain-occluded scenes may reduce accuracy.
	- The model was trained for 50 epochs without triggering early stopping, which suggests the checkpoint may not be fully converged and that further fine-tuning could improve results.
	- Small potholes (< 32px at 768px resolution) may be missed in wide-angle shots.
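
The small-object caveat depends on how large a pothole is after the frame is resized to the 768-pixel input. A rough check, assuming the longer image side is scaled to 768 with aspect ratio preserved and that the 32 px threshold applies to the shorter box side; the helper names are illustrative.

```python
INPUT_SIZE = 768    # model input resolution
MIN_SIDE_PX = 32    # threshold from the limitation above


def size_at_input(box_w: float, box_h: float, img_w: int, img_h: int, input_size: int = INPUT_SIZE) -> tuple:
    """Box width and height in pixels after resizing the image to the model input."""
    scale = input_size / max(img_w, img_h)
    return box_w * scale, box_h * scale


def likely_too_small(box_w: float, box_h: float, img_w: int, img_h: int) -> bool:
    """True if the shorter box side falls below the threshold at input resolution."""
    w, h = size_at_input(box_w, box_h, img_w, img_h)
    return min(w, h) < MIN_SIDE_PX


if __name__ == "__main__":
    # A 120x90 px pothole in a 3840x2160 frame shrinks by a factor of 5.
    print(size_at_input(120, 90, 3840, 2160), likely_too_small(120, 90, 3840, 2160))
```

For wide-angle or high-resolution footage, cropping or tiling the frame before inference is one way to keep small defects above the threshold.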

	---

	## 🧑‍💻 Developer

	| Field | Details |
	|:---|:---|
	| Author | Vansh Momaya |
	| Institution | D. J. Sanghvi College of Engineering |
	| Focus Area | Computer Vision, Object Detection, AI for Civic Infrastructure |
	| Email | vanshmomaya9@gmail.com |

	---

	## 🌍 Citation

	If you use PotholeNet-YOLO11m-v1 in your research or project:

	```bibtex
	@online{momaya2026potholenet,
	  author = {Vansh Momaya},
	  title = {PotholeNet-YOLO11m-v1: Real-Time Pothole and Civic Issue Detection for Indian Urban Roads},
	  year = {2026},
	  version = {v1},
	  url = {https://huggingface.co/Vansh180/PotholeNet-YOLO11m-v1},
	  institution = {D. J. Sanghvi College of Engineering},
	  note = {Fine-tuned YOLO11m model for detecting potholes, road damage, and garbage in Indian street imagery},
	  license = {MIT}
	}
	```

	---

	## 🚀 Acknowledgements

	- [Ultralytics YOLO11](https://github.com/ultralytics/ultralytics) — Base architecture and training framework
	- [Kaggle](https://www.kaggle.com) — Training infrastructure (Dual T4 GPU)
	- Aamchi City — Datahack 4 — Hackathon context and dataset

	---

	Built for the Aamchi City AI Civic System — Datahack 4, PS2 Core ML
