cloud450 committed on
Commit
a82e372
·
verified ·
1 Parent(s): 8f078dd

Upload 2 files

Files changed (2)
  1. README.md +51 -50
  2. requirements.txt +2 -1
README.md CHANGED
@@ -4,74 +4,75 @@ emoji: 🕳️
4
  colorFrom: yellow
5
  colorTo: red
6
  sdk: gradio
7
- sdk_version: 4.0.0
8
  app_file: app.py
9
  pinned: false
10
  ---
11
 
12
- # 🛣️ Pothole Severity Scoring Pipeline
13
 
14
- Active ML pipeline for generating synthetic civic data and training an XGBoost-based regression model to predict pothole severity scores ($S \in [0,1]$).
15
 
16
- ## 🚀 Quick Start
17
 
18
- 1. **Install Dependencies**:
19
- ```bash
20
- pip install numpy pandas scikit-learn xgboost shap matplotlib joblib
21
- ```
22
 
23
- 2. **Run Pipeline**:
24
- ```bash
25
- python severity_model_pipeline.py
26
- ```
27
 
28
- ## 🏗️ Project Structure
29
 
30
- | File | Description |
31
- | :--- | :--- |
32
- | `severity_model_pipeline.py` | Main end-to-end pipeline script. |
33
- | `synthetic_pothole_data.csv` | The generated dataset (10k samples). |
34
- | `severity_model.json` | Trained XGBoost model (Native JSON format). |
35
- | `feature_scaler.pkl` | MinMaxScaler for normalizing real-time features. |
36
- | `feature_list.json` | JSON list ensuring correct feature ordering during inference. |
37
- | `shap_bar_plot.png` | Global feature importance visualization. |
38
- | `shap_dot_plot.png` | Detailed SHAP summary plot showing feature impact. |
39
 
40
- ## 📊 Feature Definitions
 
 
41
 
42
- All features are normalized within the range `[0, 1]`:
43
 
44
- - **A**: Defect area ratio (size relative to image).
45
- - **D**: Defect density (fragmentation level).
46
- - **C**: Centrality (distance from road center).
47
- - **Q**: Detection confidence (CV confidence score).
48
- - **M**: Multi-user confirmation score (crowdsourced weight).
49
- - **T**: Temporal persistence (time since detection).
50
- - **R**: Traffic importance (Highway: 1.0, Main: 0.7, Local: 0.4).
51
- - **P**: Proximity to critical infrastructure (Hospitals, schools).
52
- - **F**: Recurrence frequency (historical patch failure).
53
- - **X**: Resolution failure score (reopen count).
54
 
55
- ## 🧠 Model Logic
56
 
57
- - **Ground Truth Foundation**:
58
- $S_{base} = 0.28A + 0.10D + 0.14C + 0.04Q + 0.08M + 0.07T + 0.09R + 0.10P + 0.06F + 0.04X$
59
- - **Infrastructure Boost**: $K = 1 + 0.5P$
60
- - **Final Target**: $S = \min(1, S_{base} * K + \text{Gaussian Noise})$
61
 
62
- ---
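The target construction described in the removed "Model Logic" section can be sketched in a few lines of plain Python. This is an illustrative reading of the three formulas, not the code from `severity_model_pipeline.py`: the function name `severity_target`, the `noise_std` parameter, and the lower clamp at 0 are assumptions, since the README only states the weights, the boost `K = 1 + 0.5P`, and the upper clamp `min(1, ...)`.

```python
import random

# Weights copied from the S_base formula in the removed README text; they sum to 1.0.
WEIGHTS = {
    "A": 0.28, "D": 0.10, "C": 0.14, "Q": 0.04, "M": 0.08,
    "T": 0.07, "R": 0.09, "P": 0.10, "F": 0.06, "X": 0.04,
}


def severity_target(features, noise_std=0.0, rng=None):
    """Compute S = min(1, S_base * K + noise) for one sample.

    `features` maps each feature letter to a value already scaled to [0, 1].
    `noise_std` is a hypothetical parameter: the README does not give the
    standard deviation of the Gaussian noise.
    """
    rng = rng or random
    s_base = sum(weight * features[name] for name, weight in WEIGHTS.items())
    boost = 1.0 + 0.5 * features["P"]  # infrastructure boost K
    noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    # Assumption: also clamp at 0 so that noise cannot push S below the stated range.
    return max(0.0, min(1.0, s_base * boost + noise))
```

Because `P` appears both in `S_base` and in the boost `K`, it contributes quadratically to the target, which is consistent with the new model card's remark that the model leans heavily on proximity to critical infrastructure.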
63
 
64
- ## 🛠️ Inference Usage
65
 
66
- You can use the `predict_severity` function within `severity_model_pipeline.py` to get predictions:
67
 
68
- ```python
69
- from severity_model_pipeline import predict_severity, load_inference_artefacts
70
 
71
- # Load trained components
72
- model, scaler, features = load_inference_artefacts()
73
 
74
- # Predict
75
- result = predict_severity(my_data_dict, model, scaler, features)
76
- print(f"Severity: {result['score']} ({result['label']})")
77
- ```
 
4
  colorFrom: yellow
5
  colorTo: red
6
  sdk: gradio
7
+ sdk_version: 4.42.0
8
  app_file: app.py
9
  pinned: false
10
+ license: mit
11
+ tags:
12
+ - xgboost
13
+ - tabular-regression
14
+ - civic-tech
15
+ - pothole-detection
16
  ---
17
 
18
+ # Model Card for Pothole Severity Scoring
19
 
20
+ ## Model Details
21
 
22
+ ### Model Description
23
 
24
+ This is an XGBoost Regressor model designed to predict the priority/severity score of civic infrastructure issues (specifically potholes). It evaluates multiple structural, environmental, and temporal features to output a severity score bounded between 0 and 1, assisting civic authorities in prioritizing repairs and resource allocation.
25
 
26
+ - **Developed by:** Civic AI System (Demo)
27
+ - **Model type:** XGBoost Regressor
28
+ - **License:** MIT
 
29
 
30
+ ## Uses
31
 
32
+ ### Direct Use
33
 
34
+ The model natively ingests 10 engineered features characterizing a reported pothole and outputs:
35
+ - A numeric severity score ($S \in [0,1]$).
36
+ - A qualitative priority label ("Low", "Medium", "High").
37
 
38
+ This is intended for sorting and prioritizing civil work dispatch queues.
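As a rough illustration of how the two outputs could feed a dispatch queue, the sketch below maps a score to a label and sorts reports by score. The cut-offs `medium_at=0.4` and `high_at=0.7` and both function names are hypothetical; the model card does not state where the "Low", "Medium", and "High" boundaries lie.

```python
def priority_label(score, medium_at=0.4, high_at=0.7):
    """Map a severity score in [0, 1] to a qualitative label.

    The default thresholds are placeholders, not values from the model card.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= high_at:
        return "High"
    if score >= medium_at:
        return "Medium"
    return "Low"


def dispatch_order(reports):
    """Sort (report_id, score) pairs so the most severe report comes first."""
    return sorted(reports, key=lambda report: report[1], reverse=True)
```

Sorting on the numeric score rather than the label keeps the ordering stable within a label bucket, which matters when many reports share the same priority.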
39
 
40
+ ## Bias, Risks, and Limitations
41
 
42
+ The model heavily factors in proximity to critical infrastructure (`P`) and road hierarchy (`R`). While this effectively prioritizes areas like highways and hospitals, it may systematically delay repairs in neglected or local neighborhoods if those areas lack designated local "critical infrastructure". Disparate impact assessments should be run periodically to ensure equitable civic maintenance.
43
 
44
+ ## Training Details
45
 
46
+ ### Training Data
47
+
48
+ The model was trained on a synthetically generated dataset of `10,000` samples designed to mirror realistic distributions of civic reporting. Features include:
49
+ - `A`: Defect area ratio
50
+ - `D`: Defect density
51
+ - `C`: Centrality (distance from center)
52
+ - `Q`: Initial detection confidence
53
+ - `M`: Multi-user confirmation score
54
+ - `T`: Temporal persistence (days unresolved)
55
+ - `R`: Traffic importance tier
56
+ - `P`: Proximity to critical infrastructure
57
+ - `F`: Recurrence frequency
58
+ - `X`: Resolution failure count
59
+
60
+ All features are min-max scaled `[0,1]`.
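For readers unfamiliar with min-max scaling, here is a minimal pure-Python sketch of the transformation applied to one feature column. The previous README mentions a saved `MinMaxScaler` in `feature_scaler.pkl`, so the actual pipeline presumably relies on scikit-learn; the helper `min_max_scale` below is hypothetical and only shows the arithmetic.

```python
def min_max_scale(values):
    """Rescale a column of numbers to [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: raw "days unresolved" values for feature T.
days_unresolved = [0, 15, 30]
scaled_t = min_max_scale(days_unresolved)
```

At inference time the minimum and maximum should come from the training data rather than from the incoming batch, which is why a fitted scaler is normally saved alongside the model.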
61
+
62
+ ### Training Procedure
63
 
64
+ - **Algorithm:** XGBoost
65
+ - **Objective:** `reg:squarederror`
66
+ - **Trees:** 200
67
+ - **Max Depth:** 5
68
+ - **Learning Rate:** 0.05
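The listed settings correspond naturally to keyword arguments of xgboost's scikit-learn wrapper. The mapping below is a sketch: it assumes "Trees" means `n_estimators`, and the name `XGB_PARAMS` is made up for illustration. Any settings not listed in the model card (subsampling, regularization, random seed) are unknown and left at library defaults here.

```python
# Hyperparameters as listed in the model card, expressed as XGBRegressor keyword arguments.
XGB_PARAMS = {
    "objective": "reg:squarederror",
    "n_estimators": 200,   # "Trees: 200" (assumed mapping)
    "max_depth": 5,
    "learning_rate": 0.05,
}

# With xgboost installed, the dictionary could be used like this:
#   from xgboost import XGBRegressor
#   model = XGBRegressor(**XGB_PARAMS)
#   model.fit(X_train, y_train)
```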
69
 
70
+ ## Evaluation
71
 
72
+ ### Testing Data, Factors & Metrics
 
73
 
74
+ Evaluated on a 20% holdout set (`N=2000`).
 
75
 
76
+ - **RMSE:** 0.0312
77
+ - **MAE:** 0.0247
78
+ - **R² Score:** 0.8067
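For reference, the three reported metrics can be computed from holdout predictions as sketched below. This is a generic pure-Python implementation of the standard definitions, not the evaluation code from the repository; the function name `regression_metrics` is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² is undefined when the targets are constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }
```

Since R² equals `1 - MSE / Var(y)`, it depends on the spread of the holdout targets as well as on the error, so RMSE and R² are best read together.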
 
requirements.txt CHANGED
@@ -3,4 +3,5 @@ pandas
3
  scikit-learn
4
  xgboost
5
  joblib
6
- gradio
 
 
3
  scikit-learn
4
  xgboost
5
  joblib
6
+ gradio>=4.42.0
7
+ pyaudioop