# 🧾 Model Card: CivicAi-YOLO11m-v1

## 🧠 Model Overview

**PotholeNet-YOLO11m-v1** is an object detection model fine-tuned from the **Ultralytics YOLO11m** architecture to detect potholes, road damage, and garbage in street-level imagery. The model uses YOLO11m's C2PSA (Cross-Stage Partial with Spatial Attention) mechanism, which is well suited to identifying irregularly shaped urban defects such as potholes.

Trained on a large, curated civic infrastructure dataset of **23,000+ street-level images** from Indian urban environments, the model is designed to power real-time civic issue detection systems that enable automated reporting and faster municipal response. It serves as the **Detection Layer (Layer 1)** of the **Aamchi City AI Civic System**, an end-to-end dashboard for urban infrastructure monitoring.

---

## 🏗️ Training Details

| Parameter | Value |
|:---|:---|
| **Base Model** | `yolo11m.pt` (COCO pretrained) |
| **Architecture** | YOLO11m (C3k2 + C2PSA Spatial Attention) |
| **Framework** | Ultralytics v8.x |
| **Training Hardware** | Kaggle, NVIDIA T4 ×2 (dual GPU) |
| **Epochs** | 50 |
| **Input Resolution** | 768×768 |
| **Batch Size** | Auto (`batch=-1`) |
| **Optimizer** | AdamW |
| **Learning Rate** | `lr0=0.001`, cosine decay with `lrf=0.01` (final LR = `lr0 × lrf` = 1e-5) |
| **Warmup** | 3 epochs |
| **Weight Decay** | 0.0005 |
| **AMP** | Enabled (FP16 mixed precision) |
| **Early Stopping** | `patience=10` (did not trigger; the model was still improving) |

### Loss Weights

| Loss | Weight |
|:---|:---|
| Box Loss | 7.5 |
| Classification Loss | 1.0 |
| DFL Loss | 1.5 |

### Augmentation Pipeline

| Augmentation | Value |
|:---|:---|
| Mosaic | 1.0 |
| MixUp | 0.15 |
| Copy-Paste | 0.1 |
| HSV (H/S/V) | 0.015 / 0.7 / 0.4 |
| Rotation | ±10° |
| Scale | 0.5 |
| Shear | 2.0 |
| Horizontal Flip | 0.5 |
| Erasing | 0.3 |
| Label Smoothing | 0.05 |
| Close Mosaic | Last 8 epochs |

---

## 📊 Dataset Description

The model was trained on a curated subset of **23,179 street-level images** collected from Indian urban environments. The dataset underwent extensive preprocessing:

- **Perceptual Hash (pHash) Deduplication**: Removed near-duplicate images using a Hamming distance ≤ 4
- **Corrupt Image Removal**: Verified all images via PIL
- **Intelligent Negative Sampling**: Trimmed empty-label (background) images to 2,000 hard negatives
- **Stratified Split**: 80% Train / 15% Val / 5% Test, stratified by dominant class

### Label Classes

| Class ID | Class Name | Description |
|:---|:---|:---|
| 🔴 0 | **Pothole** | Road surface cavities and depressions |
| 🟡 1 | **Road Damage** | Cracks, surface wear, and structural deterioration |
| 🟢 2 | **Garbage** | Street-level waste and debris accumulation |

> **Priority:** Pothole (primary) > Garbage > Road Damage

---

## 🎯 Evaluation Metrics

| Metric | Score |
|:---|:---|
| **mAP50** | **0.86** |
| **mAP50-95** | Not reported |
| **Parameters** | ~20M |
| **Model Size** | ~39 MB |
| **Inference Speed** | Real-time on GPU |

> ⚡ Early stopping (`patience=10`) did not trigger within the 50 training epochs, which suggests that further training could yield additional performance gains.
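
### What mAP50 Measures

For context, mAP50 is mean average precision computed at an Intersection over Union (IoU) threshold of 0.5: a predicted box counts as a true positive only if it overlaps a same-class ground-truth box with an IoU of at least 0.5. The sketch below illustrates only that overlap test, for boxes in `[x1, y1, x2, y2]` pixel format. The helper names `iou_xyxy` and `is_match_at_50` are illustrative assumptions, not part of Ultralytics or of this model's evaluation code, and the full metric additionally ranks predictions by confidence and averages precision over recall for each class.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over Union for two boxes given as [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate zero-area boxes
    return intersection / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box, threshold=0.5):
    """True if the overlap is large enough to count as a match at IoU 0.5."""
    return iou_xyxy(pred_box, gt_box) >= threshold


# Hypothetical boxes: a prediction shifted 2 px from a 10x10 ground-truth box
print(is_match_at_50([2, 0, 12, 10], [0, 0, 10, 10]))
```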
---

## 💬 Example Usage

### Python (Ultralytics)

```python
from ultralytics import YOLO

# Class IDs as listed in the Label Classes table
CLASS_NAMES = {0: "pothole", 1: "road_damage", 2: "garbage"}

# Load the fine-tuned weights
model = YOLO("best.pt")

# Run inference at the training resolution
results = model("street_image.jpg", imgsz=768, conf=0.25)

# Display the annotated image
results[0].show()

# Access individual detections
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()
    print(f"{CLASS_NAMES[cls]}: {conf:.2f} at {xyxy}")
```

### With Test-Time Augmentation (TTA)

```python
# TTA can improve mAP by roughly 1-3% at the cost of inference speed
# (reuses the `model` loaded in the previous example)
results = model("street_image.jpg", imgsz=768, conf=0.25, augment=True)
```

### Filter Pothole-Only Detections

```python
# Reuses the `model` loaded in the first example
results = model("street_image.jpg", imgsz=768, conf=0.25)
boxes = results[0].boxes

# Boolean mask selecting class 0 (pothole)
pothole_mask = boxes.cls == 0
pothole_boxes = boxes[pothole_mask]
print(f"Found {len(pothole_boxes)} potholes")
```

---

## 🧩 Intended Use

- **Real-time pothole detection** from dashcam, mobile phone, or street-view imagery
- **Automated civic issue reporting**: GPS-tagged detections for municipal dashboards
- **Infrastructure health monitoring**: Severity scoring and trend analysis for road maintenance
- **Smart city integration**: Layer 1 detection input for AI-driven civic action systems
- **Mobile deployment**: Exportable to ONNX for edge inference on mobile devices

---

## ⚠️ Limitations

- The model is optimized for **Indian urban road conditions**; performance may degrade on highways, rural roads, or non-Indian geographies.
- The **Road damage** class overlaps visually with potholes, which may cause occasional misclassification between the two.
- Performance is best on **daytime, clear-weather imagery**; low-light and rain-occluded scenes may reduce accuracy.
- The model was trained for **50 epochs without triggering early stopping**, which suggests the checkpoint is not fully converged and that further fine-tuning could improve results.
- **Small potholes** (< 32px at 768px resolution) may be missed in wide-angle shots.

---

## 🧑‍💻 Developer

| | |
|:---|:---|
| **Author** | Vansh Momaya |
| **Institution** | D. J. Sanghvi College of Engineering |
| **Focus Area** | Computer Vision, Object Detection, AI for Civic Infrastructure |
| **Email** | vanshmomaya9@gmail.com |

---

## 🌐 Citation

If you use PotholeNet-YOLO11m-v1 in your research or project:

```bibtex
@online{momaya2026potholenet,
  author      = {Vansh Momaya},
  title       = {PotholeNet-YOLO11m-v1: Real-Time Pothole and Civic Issue Detection for Indian Urban Roads},
  year        = {2026},
  version     = {v1},
  url         = {https://huggingface.co/Vansh180/PotholeNet-YOLO11m-v1},
  institution = {D. J. Sanghvi College of Engineering},
  note        = {Fine-tuned YOLO11m model for detecting potholes, road damage, and garbage in Indian street imagery},
  license     = {MIT}
}
```

---

## 🚀 Acknowledgements

- **[Ultralytics YOLO11](https://github.com/ultralytics/ultralytics)**: Base architecture and training framework
- **[Kaggle](https://www.kaggle.com)**: Training infrastructure (Dual T4 GPU)
- **Aamchi City - Datahack 4**: Hackathon context and dataset

---

*Built for the Aamchi City AI Civic System - Datahack 4, PS2 Core ML*