---
title: Pothole Severity Scoring
emoji: 🕳️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
tags:
- xgboost
- tabular-regression
- civic-tech
- pothole-detection
---

# Model Card for Pothole Severity Scoring

## Model Details

### Model Description

This is an XGBoost Regressor model designed to predict the priority/severity score of civic infrastructure issues (specifically potholes). It evaluates multiple structural, environmental, and temporal features to output a severity score bounded between 0 and 1, assisting civic authorities in prioritizing repairs and resource allocation.

- **Developed by:** Civic AI System (Demo)
- **Model type:** XGBoost Regressor
- **License:** MIT

## Uses

### Direct Use

The model natively ingests 10 engineered features characterizing a reported pothole and outputs:

- A numeric severity score ($S \in [0,1]$).
- A qualitative priority label ("Low", "Medium", "High").

This is intended for sorting and prioritizing civil work dispatch queues.

## Bias, Risks, and Limitations

The model heavily factors in proximity to critical infrastructure (`P`) and road hierarchy (`R`). While this effectively prioritizes areas like highways and hospitals, it may systematically delay repairs in neglected or local neighborhoods if those areas lack designated local "critical infrastructure". Disparate impact assessments should be run periodically to ensure equitable civic maintenance.

## Training Details

### Training Data

The model was trained on a synthetically generated dataset of `10,000` samples designed to mirror realistic distributions of civic reporting.
Features include:

- `A`: Defect area ratio
- `D`: Defect density
- `C`: Centrality (distance from center)
- `Q`: Initial detection confidence
- `M`: Multi-user confirmation score
- `T`: Temporal persistence (days unresolved)
- `R`: Traffic importance tier
- `P`: Proximity to critical infrastructure
- `F`: Recurrence frequency
- `X`: Resolution failure count

All features are min-max scaled to `[0, 1]`.

### Training Procedure

- **Algorithm:** XGBoost
- **Objective:** `reg:squarederror`
- **Trees:** 200
- **Max Depth:** 5
- **Learning Rate:** 0.05

## 📊 Performance & Interpretability

### Model Metrics

The model predicts the severity score $S$, which drives civic resource allocation, with low error on the $[0,1]$ score scale.

| Metric | Value | Interpretation |
| :--- | :--- | :--- |
| **RMSE** | 0.0156 | Root-mean-square error, in units of the $[0,1]$ score |
| **MAE** | 0.0112 | Predictions are off by about 0.011 on average |
| **R² Score** | 0.9418 | About 94% of the variance in $S$ is explained by the model |

### Feature Importance (Gain)

Gain-based importance measures how much the splits on a feature improve the training objective, averaged over the trees. The ranking for the four features listed is:

1. **C (Centrality)**: 0.3585. Central potholes pose higher collision risks.
2. **A (Area Ratio)**: 0.2187. Size of the defect is a primary driver.
3. **R (Traffic Importance Tier)**: 0.1629. Priority given to highways over local streets.
4. **P (Proximity)**: 0.0937. Closeness to critical infrastructure.

### SHAP Visualizations

We use SHAP (SHapley Additive exPlanations) to explain individual predictions and global feature influence.

#### Global Feature Impact

The bar chart below shows the mean absolute SHAP value of each feature, identifying which features consistently shift the severity score.

![SHAP Bar Plot](shap_bar_plot.png)

#### Detailed Impact (Beeswarm)

The summary plot shows how high vs. low values of a feature affect the outcome. For example, high values of **C (Centrality)** push the score significantly higher.

![SHAP Dot Plot](shap_dot_plot.png)
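### Metric Definitions (Sketch)

For readers who want to check figures like those in the Model Metrics table against their own predictions, the sketch below computes RMSE, MAE, and R² from their standard definitions in plain Python. The function name `regression_metrics` and the toy values are assumptions made for illustration; they are not taken from this project's code or evaluation data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, shared by RMSE and R².
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Total sum of squares around the mean of the true scores.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Toy values for illustration only; these are NOT the model's predictions.
y_true = [0.10, 0.40, 0.70, 0.90]
y_pred = [0.12, 0.38, 0.73, 0.88]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```

Because every feature and the score itself lie in `[0, 1]`, RMSE and MAE can be read directly as fractions of the full score range.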