---
title: Pothole Severity Scoring
emoji: 🕳️
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
tags:
- xgboost
- tabular-regression
- civic-tech
- pothole-detection
---
# Model Card for Pothole Severity Scoring
## Model Details
### Model Description
This is an XGBoost Regressor model designed to predict the priority/severity score of civic infrastructure issues (specifically potholes). It evaluates multiple structural, environmental, and temporal features to output a severity score bounded between 0 and 1, assisting civic authorities in prioritizing repairs and resource allocation.
- **Developed by:** Civic AI System (Demo)
- **Model type:** XGBoost Regressor
- **License:** MIT
## Uses
### Direct Use
The model takes 10 engineered features describing a reported pothole and outputs:
- A numeric severity score ($S \in [0,1]$).
- A qualitative priority label ("Low", "Medium", "High").
This is intended for sorting and prioritizing civil work dispatch queues.
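
The mapping from score to label and the queue ordering can be sketched as follows. This is a minimal illustration only: the cut-offs `LOW_MAX` and `MEDIUM_MAX` and the helper names are hypothetical, since this card does not state the thresholds the application actually uses.

```python
# Hypothetical cut-offs; the model card does not state the real ones.
LOW_MAX = 0.33
MEDIUM_MAX = 0.66


def priority_label(score: float) -> str:
    """Map a severity score in [0, 1] to a qualitative priority label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score < LOW_MAX:
        return "Low"
    if score < MEDIUM_MAX:
        return "Medium"
    return "High"


def sort_queue(reports: dict) -> list:
    """Return (report_id, score, label) tuples, most severe first."""
    ranked = sorted(reports.items(), key=lambda item: item[1], reverse=True)
    return [(rid, score, priority_label(score)) for rid, score in ranked]


# Example with made-up report IDs and scores.
queue = sort_queue({"report-1": 0.21, "report-2": 0.84, "report-3": 0.55})
for report_id, score, label in queue:
    print(report_id, score, label)
```

Sorting on the numeric score rather than the label keeps the ordering stable within each priority band.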
## Bias, Risks, and Limitations
The model heavily factors in proximity to critical infrastructure (`P`) and road hierarchy (`R`). While this effectively prioritizes areas like highways and hospitals, it may systematically delay repairs in neglected or local neighborhoods if those areas lack designated local "critical infrastructure". Disparate impact assessments should be run periodically to ensure equitable civic maintenance.
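
One possible shape for such an assessment is sketched below: it compares how often reports from different groups reach a high score. The grouping key (here a neighborhood type), the `0.66` threshold, and the function names are assumptions made for illustration; they are not part of this model.

```python
from collections import defaultdict


def high_priority_rate_by_group(records, threshold=0.66):
    """records: iterable of (group, score) pairs.

    Returns {group: share of that group's scores at or above threshold}.
    The threshold is a hypothetical "High" cut-off.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [high_count, total_count]
    for group, score in records:
        counts[group][1] += 1
        if score >= threshold:
            counts[group][0] += 1
    return {group: high / total for group, (high, total) in counts.items()}


def rate_ratio(rates, group_a, group_b):
    """Ratio of high-priority rates; values well below 1 suggest
    group_a is being deprioritized relative to group_b."""
    if rates[group_b] == 0:
        raise ZeroDivisionError(f"no high-priority reports in {group_b!r}")
    return rates[group_a] / rates[group_b]


# Made-up scored reports, tagged by a hypothetical neighborhood type.
records = [
    ("local", 0.70), ("local", 0.20), ("local", 0.30), ("local", 0.10),
    ("arterial", 0.80), ("arterial", 0.90), ("arterial", 0.30), ("arterial", 0.70),
]
rates = high_priority_rate_by_group(records)
print(rates, rate_ratio(rates, "local", "arterial"))
```

What ratio counts as acceptable is a policy decision; the sketch only makes the gap visible.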
## Training Details
### Training Data
The model was trained on a synthetically generated dataset of `10,000` samples designed to mirror realistic distributions of civic reporting. Features include:
- `A`: Defect area ratio
- `D`: Defect density
- `C`: Centrality (distance from center)
- `Q`: Initial detection confidence
- `M`: Multi-user confirmation score
- `T`: Temporal persistence (days unresolved)
- `R`: Traffic importance tier
- `P`: Proximity to critical infrastructure
- `F`: Recurrence frequency
- `X`: Resolution failure count
All features are min-max scaled to `[0, 1]`.
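
Min-max scaling maps a raw value $v$ to $(v - v_{\min}) / (v_{\max} - v_{\min})$. A minimal sketch follows, assuming the bounds are taken from the training data and that out-of-range inputs are clipped at inference time; the clipping and the raw values for `T` are illustrative assumptions, not documented behavior.

```python
def fit_min_max(values):
    """Return the (min, max) seen in the training data for one feature."""
    return min(values), max(values)


def apply_min_max(value, lo, hi):
    """Scale one raw value with training-time bounds, clipping to [0, 1]."""
    if hi == lo:
        return 0.0  # constant feature: no spread to scale by
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


# Hypothetical raw values for T (days unresolved).
days_unresolved = [0, 15, 30, 60]
lo, hi = fit_min_max(days_unresolved)
scaled = [apply_min_max(d, lo, hi) for d in days_unresolved]
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```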
### Training Procedure
- **Algorithm:** XGBoost
- **Objective:** `reg:squarederror`
- **Trees:** 200
- **Max Depth:** 5
- **Learning Rate:** 0.05
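
The settings above could be expressed with XGBoost's scikit-learn wrapper roughly as follows. This is a sketch under assumptions: mapping "Trees" to `n_estimators` and the `build_model` helper are my reading of the list, and every parameter not listed is left at the library default.

```python
# Feature order taken from the Training Data list.
FEATURES = ["A", "D", "C", "Q", "M", "T", "R", "P", "F", "X"]

# Hyperparameters from the Training Procedure list.
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "n_estimators": 200,   # assumed to correspond to "Trees: 200"
    "max_depth": 5,
    "learning_rate": 0.05,
}


def build_model():
    """Create an untrained regressor with the settings above."""
    # Imported lazily so the constants can be inspected without xgboost installed.
    from xgboost import XGBRegressor

    return XGBRegressor(**XGB_PARAMS)
```

A model built this way would be trained with `build_model().fit(X, y)`, where `X` holds the ten scaled features in `FEATURES` order. Note that `reg:squarederror` does not itself constrain predictions to `[0, 1]`, so a deployment may need to clip the output; whether this project does so is not stated here.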
## 📊 Performance & Interpretability
### Model Metrics
The model predicts the severity score $S$, which drives civic resource allocation, with low error:
| Metric | Value | Interpretation |
| :--- | :--- | :--- |
| **RMSE** | 0.0156 | Root-mean-square error on the 0 to 1 score scale |
| **MAE** | 0.0112 | Mean absolute error on the same scale |
| **R² Score** | 0.9418 | About 94% of the variance in $S$ is explained by the features |
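
For reference, the three metrics in the table are defined as in the sketch below. It uses the standard formulas on made-up values and is not the evaluation script that produced the numbers above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R² for paired true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up scores, each prediction off by 0.05.
metrics = regression_metrics([0.2, 0.4, 0.6, 0.8], [0.25, 0.35, 0.65, 0.75])
print(metrics)
```

Because $R^2 = 1 - \mathrm{MSE} / \mathrm{Var}(S)$, the error figures are best read against the spread of $S$. If all three reported metrics were computed on the same evaluation set, they imply a standard deviation of $S$ of roughly $0.0156 / \sqrt{1 - 0.9418} \approx 0.065$.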
### Feature Importance (Gain)
The ranking below lists the top four of the ten features by gain, the average loss reduction from the splits that use each feature:
1. **C (Centrality)**: 0.3585. Central potholes pose higher collision risks.
2. **A (Area Ratio)**: 0.2187. Size of the defect is a primary driver.
3. **R (Traffic Importance Tier)**: 0.1629. Priority given to highways over local streets.
4. **P (Proximity)**: 0.0937. Closeness to critical infrastructure.
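
As a quick check on how much of the total the list above covers, the sketch below sums the reported values. It assumes they are normalized gain shares that sum to 1 across all ten features, which is how the `feature_importances_` attribute of XGBoost's scikit-learn wrapper reports them; this card does not confirm that this attribute was the source.

```python
# Gain shares reported in the list above.
listed_gain = {"C": 0.3585, "A": 0.2187, "R": 0.1629, "P": 0.0937}

# Features from the Training Data list with no reported share.
unlisted = ["D", "Q", "M", "T", "F", "X"]

listed_total = sum(listed_gain.values())
# Assumption: shares are normalized to sum to 1 across all ten features.
remaining_total = 1.0 - listed_total

print(f"listed: {listed_total:.4f}, remaining {len(unlisted)} features: {remaining_total:.4f}")
# listed: 0.8338, remaining 6 features: 0.1662
```

Under that assumption, the four listed features account for about 83% of total gain and the other six share the remaining 17%.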
### SHAP Visualizations
We use SHAP (SHapley Additive exPlanations) to explain individual predictions and global feature influence.
#### Global Feature Impact
The bar chart below shows the mean absolute SHAP value, identifying which features consistently shift the severity score.
![SHAP Bar Plot](shap_bar_plot.png)
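
The quantity behind such a bar chart can be sketched as below. The function takes a matrix of per-sample SHAP values (rows are samples, columns are features) and averages their absolute values per feature. With the `shap` library this matrix would typically come from something like `shap.TreeExplainer(model).shap_values(X)`; the numbers used here are made up for illustration.

```python
def mean_abs_shap(shap_values, feature_names):
    """Return [(feature, mean |SHAP|)] sorted from most to least influential."""
    if not shap_values:
        raise ValueError("shap_values must contain at least one row")
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        if len(row) != len(feature_names):
            raise ValueError("each row needs one value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is dropped: only magnitude counts
    n = len(shap_values)
    ranking = [(name, total / n) for name, total in zip(feature_names, totals)]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Made-up SHAP values for two samples and two features.
ranking = mean_abs_shap([[0.10, -0.02], [-0.30, 0.04]], ["C", "A"])
print(ranking)
```

Taking absolute values means a feature that pushes some scores up and others down still ranks as influential, which is why this view complements the signed beeswarm plot.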
#### Detailed Impact (Beeswarm)
The summary plot shows how high vs. low values of a feature affect the outcome. For example, high values of **C (Centrality)** push the score significantly higher.
![SHAP Dot Plot](shap_dot_plot.png)