| # 🧾 Model Card: PotholeNet-YOLO11m-v1 |
|
|
| ## 🧠 Model Overview |
|
|
| **PotholeNet-YOLO11m-v1** is an object detection model fine-tuned from the **Ultralytics YOLO11m** architecture to detect potholes, road damage, and garbage in street-level imagery. YOLO11m includes the C2PSA (Cross-Stage Partial with Spatial Attention) block, an attention mechanism that is expected to help with irregularly shaped urban defects such as potholes. |
|
|
| Trained on a curated civic infrastructure dataset of **23,000+ street-level images** from Indian urban environments, the model is designed for real-time civic issue detection systems that support automated reporting and faster municipal response. |
|
|
| It serves as the **Detection Layer (Layer 1)** of the **Aamchi City AI Civic System**, an end-to-end intelligent dashboard for urban infrastructure monitoring. |
|
|
| --- |
|
|
| ## Training Details |
|
|
| | Parameter | Value | |
| |:---|:---| |
| | **Base Model** | `yolo11m.pt` (COCO pretrained) | |
| | **Architecture** | YOLO11m (C3k2 + C2PSA Spatial Attention) | |
| | **Framework** | Ultralytics v8.x | |
| | **Training Hardware** | Kaggle, 2× NVIDIA T4 (dual GPU) | |
| | **Epochs** | 50 | |
| | **Input Resolution** | 768×768 | |
| | **Batch Size** | Auto (`batch=-1`) | |
| | **Optimizer** | AdamW | |
| | **Learning Rate** | `lr0=0.001`, cosine decay to a final rate of `lr0 × lrf` with `lrf=0.01` (1e-5) | |
| | **Warmup** | 3 epochs | |
| | **Weight Decay** | 0.0005 | |
| | **AMP** | Enabled (FP16 mixed precision) | |
| | **Early Stopping** | `patience=10` (did not trigger; the model was still improving) | |
|
|
| ### Loss Weights |
| | Loss | Weight | |
| |:---|:---| |
| | Box Loss | 7.5 | |
| | Classification Loss | 1.0 | |
| | DFL Loss | 1.5 | |
|
|
| ### Augmentation Pipeline |
| | Augmentation | Value | |
| |:---|:---| |
| | Mosaic | 1.0 | |
| | MixUp | 0.15 | |
| | Copy-Paste | 0.1 | |
| | HSV (H/S/V) | 0.015 / 0.7 / 0.4 | |
| | Rotation | ±10° | |
| | Scale | 0.5 | |
| | Shear | 2.0 | |
| | Horizontal Flip | 0.5 | |
| | Erasing | 0.3 | |
| | Label Smoothing | 0.05 | |
| | Close Mosaic | Last 8 epochs | |
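
The tables above map fairly directly onto Ultralytics training arguments. The following is a minimal sketch of how such a run could be configured, not the author's actual training script. The dataset file name `civic_dataset.yaml` and the helper name `train` are hypothetical, and the mapping of table rows to argument names (for example `Rotation` to `degrees`, `Horizontal Flip` to `fliplr`) is an assumption based on the standard Ultralytics settings.

```python
# Hypothetical dataset config path; the card does not name the real file.
DATA_YAML = "civic_dataset.yaml"

# Values copied from the Training Details, Loss Weights, and Augmentation tables.
TRAIN_ARGS = {
    "data": DATA_YAML,
    "epochs": 50,
    "imgsz": 768,
    "batch": -1,              # auto batch size
    "optimizer": "AdamW",
    "lr0": 0.001,
    "lrf": 0.01,              # final LR as a fraction of lr0
    "cos_lr": True,           # cosine learning-rate schedule
    "warmup_epochs": 3,
    "weight_decay": 0.0005,
    "amp": True,
    "patience": 10,
    # Loss weights
    "box": 7.5,
    "cls": 1.0,
    "dfl": 1.5,
    # Augmentation
    "mosaic": 1.0,
    "mixup": 0.15,
    "copy_paste": 0.1,
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "degrees": 10.0,
    "scale": 0.5,
    "shear": 2.0,
    "fliplr": 0.5,
    "erasing": 0.3,
    "label_smoothing": 0.05,  # may be deprecated or ignored in newer releases
    "close_mosaic": 8,
}


def train(weights: str = "yolo11m.pt"):
    """Fine-tune from COCO-pretrained weights using the settings above."""
    # Imported lazily so the config can be inspected without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Calling `train()` starts a run. Multi-GPU selection (the card mentions two T4s) is left out of this sketch and would be passed through the `device` argument.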
|
|
| --- |
|
|
| ## Dataset Description |
|
|
| The model was trained on a curated subset of **23,179 street-level images** collected from Indian urban environments. The dataset underwent extensive preprocessing: |
|
|
| - **Perceptual Hash (pHash) Deduplication**: Removed near-duplicate images using a Hamming distance ≤ 4 |
| - **Corrupt Image Removal**: Verified all images via PIL |
| - **Intelligent Negative Sampling**: Trimmed empty-label (background) images to 2,000 hard negatives |
| - **Stratified Split**: 80% Train / 15% Val / 5% Test, stratified by dominant class |
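
As an illustration of two of the steps above, here is a minimal sketch of near-duplicate removal and the stratified split. It is not the author's pipeline. It assumes the perceptual hashes have already been computed as integers (computing them from images is left out), and it assumes "dominant class" means the most frequent class ID in an image's labels. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe(hashes: dict[str, int], max_distance: int = 4) -> list[str]:
    """Keep the first image of each near-duplicate group (greedy O(n^2) scan)."""
    kept: list[str] = []
    for name, h in hashes.items():
        # Keep the image only if it is farther than max_distance from every kept image.
        if all(hamming(h, hashes[k]) > max_distance for k in kept):
            kept.append(name)
    return kept


def dominant_class(class_ids: list[int]) -> int:
    """Most frequent class ID in one image's labels; -1 marks a background image."""
    return Counter(class_ids).most_common(1)[0][0] if class_ids else -1


def stratified_split(labels: dict[str, list[int]], seed: int = 0):
    """Split image names 80/15/5 within each dominant-class group."""
    groups = defaultdict(list)
    for name, class_ids in labels.items():
        groups[dominant_class(class_ids)].append(name)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for names in groups.values():
        rng.shuffle(names)
        n_train = len(names) * 80 // 100
        n_val = len(names) * 15 // 100
        train += names[:n_train]
        val += names[n_train:n_train + n_val]
        test += names[n_train + n_val:]  # remainder goes to the test split
    return train, val, test
```

The greedy scan is quadratic in the number of images, which is simple but slow at the scale of tens of thousands of images; an index over hash prefixes would be one way to speed it up.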
|
|
| ### Label Classes |
|
|
| | Class ID | Class Name | Description | |
| |:---|:---|:---| |
| | 🔴 0 | **Pothole** | Road surface cavities and depressions | |
| | 🟡 1 | **Road Damage** | Cracks, surface wear, and structural deterioration | |
| | 🟢 2 | **Garbage** | Street-level waste and debris accumulation | |
|
|
| > **Priority:** Pothole (primary) > Garbage > Road Damage |
|
|
| --- |
|
|
| ## 🎯 Evaluation Metrics |
|
|
| | Metric | Score | |
| |:---|:---| |
| | **mAP50** | **0.86** | |
| | **mAP50-95** | Not reported | |
| | **Parameters** | ~20M | |
| | **Model Size** | ~39 MB | |
| | **Inference Speed** | Real-time on GPU | |
|
|
| > ⚡ Early stopping did not trigger within the 50 epochs, which suggests that further training could yield additional performance gains. |
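
The metrics in the table could be recomputed with the Ultralytics validation mode. The following is a minimal sketch under assumptions: the dataset file `civic_dataset.yaml` and the helper name `evaluate` are hypothetical, and the card does not state which split the reported mAP50 was measured on.

```python
def evaluate(weights: str = "best.pt",
             data: str = "civic_dataset.yaml",  # hypothetical dataset config
             split: str = "test") -> dict:
    """Run validation and return the box mAP50 and mAP50-95 values."""
    # Imported lazily so the helper can be defined without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    metrics = model.val(data=data, imgsz=768, split=split)
    return {
        "mAP50": float(metrics.box.map50),
        "mAP50-95": float(metrics.box.map),
    }
```

Running `evaluate()` against the held-out split would also give a value for the mAP50-95 row.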
|
|
| --- |
|
|
| ## Example Usage |
|
|
| ### Python (Ultralytics) |
|
|
| ```python |
| from ultralytics import YOLO |
| |
| # Load model |
| model = YOLO("best.pt") |
| |
| # Run inference |
| results = model("street_image.jpg", imgsz=768, conf=0.25) |
| |
| # Display results |
| results[0].show() |
| |
| # Access detections |
| class_names = {0: "pothole", 1: "road_damage", 2: "garbage"} |
| for box in results[0].boxes: |
|     cls = int(box.cls)            # class ID |
|     conf = float(box.conf)        # confidence score |
|     xyxy = box.xyxy[0].tolist()   # [x1, y1, x2, y2] in pixels |
|     print(f"{class_names[cls]}: {conf:.2f} at {xyxy}") |
| ``` |
|
|
| ### With Test-Time Augmentation (TTA) |
|
|
| ```python |
| # Reuses `model` from the previous example. |
| # TTA can boost mAP by roughly 1-3% at the cost of inference speed |
| results = model("street_image.jpg", imgsz=768, conf=0.25, augment=True) |
| ``` |
|
|
| ### Filter Pothole-Only Detections |
|
|
| ```python |
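| # Reuses `model` from above; imgsz matches the training resolution |
| results = model("street_image.jpg", imgsz=768, conf=0.25) |
| boxes = results[0].boxes |
| pothole_mask = boxes.cls == 0  # class 0 = pothole |
| pothole_boxes = boxes[pothole_mask] |
| print(f"Found {len(pothole_boxes)} potholes") |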
| ``` |
|
|
| --- |
|
|
| ## 🧩 Intended Use |
|
|
| - **Real-time pothole detection** from dashcam, mobile phone, or street-view imagery |
| - **Automated civic issue reporting**: GPS-tagged detection for municipal dashboards |
| - **Infrastructure health monitoring**: Severity scoring and trend analysis for road maintenance |
| - **Smart city integration**: Layer 1 detection input for AI-driven civic action systems |
| - **Mobile deployment**: Exportable to ONNX for edge inference on mobile devices |
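
For the ONNX route mentioned in the list above, a minimal sketch using the Ultralytics export mode could look like this. The helper name `export_to_onnx` is hypothetical, and the export resolution is assumed to match the 768×768 training resolution.

```python
def export_to_onnx(weights: str = "best.pt", imgsz: int = 768) -> str:
    """Export the fine-tuned weights to ONNX and return the exported file path."""
    # Imported lazily so the helper can be defined without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.export(format="onnx", imgsz=imgsz)
```

The exported file can then be loaded with `YOLO("best.onnx")` for inference, or passed to an ONNX runtime on the target device.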
|
|
| --- |
|
|
| ## ⚠️ Limitations |
|
|
| - The model is optimized for **Indian urban road conditions**; performance may degrade on highways, rural roads, or non-Indian geographies. |
| - **Road damage** class has visual overlap with potholes, which may cause occasional misclassification between the two. |
| - Performance is best on **daytime, clear-weather imagery** β low-light and rain-occluded scenes may reduce accuracy. |
| - The model was trained for **50 epochs without triggering early stopping**, so the checkpoint may not be fully converged and further fine-tuning could improve results. |
| - **Small potholes** (< 32px at 768px resolution) may be missed in wide-angle shots. |
|
|
| --- |
|
|
| ## 🧑‍💻 Developer |
|
|
| | | | |
| |:---|:---| |
| | **Author** | Vansh Momaya | |
| | **Institution** | D. J. Sanghvi College of Engineering | |
| | **Focus Area** | Computer Vision, Object Detection, AI for Civic Infrastructure | |
| | **Email** | vanshmomaya9@gmail.com | |
|
|
| --- |
|
|
| ## Citation |
|
|
| If you use PotholeNet-YOLO11m-v1 in your research or project: |
|
|
| ```bibtex |
| @online{momaya2026potholenet, |
| author = {Vansh Momaya}, |
| title = {PotholeNet-YOLO11m-v1: Real-Time Pothole and Civic Issue Detection for Indian Urban Roads}, |
| year = {2026}, |
| version = {v1}, |
| url = {https://huggingface.co/Vansh180/PotholeNet-YOLO11m-v1}, |
| institution = {D. J. Sanghvi College of Engineering}, |
| note = {Fine-tuned YOLO11m model for detecting potholes, road damage, and garbage in Indian street imagery}, |
| license = {MIT} |
| } |
| ``` |
|
|
| --- |
|
|
| ## Acknowledgements |
|
|
| - **[Ultralytics YOLO11](https://github.com/ultralytics/ultralytics)**: Base architecture and training framework |
| - **[Kaggle](https://www.kaggle.com)**: Training infrastructure (Dual T4 GPU) |
| - **Aamchi City - Datahack 4**: Hackathon context and dataset |
|
|
| --- |
|
|
| *Built for the Aamchi City AI Civic System - Datahack 4, PS2 Core ML* |
|
|