Spaces: cloud450 / Severity_Score

cloud450 committed on Apr 11

Commit a82e372 (verified) · 1 Parent(s): 8f078dd

Upload 2 files

Files changed (2)

README.md +51 -50
requirements.txt +2 -1

README.md CHANGED

@@ -4,74 +4,75 @@ emoji: 🕳️
 colorFrom: yellow
 colorTo: red
 sdk: gradio
-sdk_version: 4.0.0
 app_file: app.py
 pinned: false
 ---
-# 🛣️ Pothole Severity Scoring Pipeline
-Active ML pipeline for generating synthetic civic data and training an XGBoost-based regression model to predict pothole severity scores ($S \in [0,1]$).
-## 🚀 Quick Start
-1. **Install Dependencies**:
-   ```bash
-   pip install numpy pandas scikit-learn xgboost shap matplotlib joblib
-   ```
-2. **Run Pipeline**:
-   ```bash
-   python severity_model_pipeline.py
-   ```
-## 🏗️ Project Structure
-| File | Description |
-| :--- | :--- |
-| `severity_model_pipeline.py` | Main end-to-end pipeline script. |
-| `synthetic_pothole_data.csv` | The generated dataset (10k samples). |
-| `severity_model.json` | Trained XGBoost model (Native JSON format). |
-| `feature_scaler.pkl` | MinMaxScaler for normalizing real-time features. |
-| `feature_list.json` | JSON list ensuring correct feature ordering during inference. |
-| `shap_bar_plot.png` | Global feature importance visualization. |
-| `shap_dot_plot.png` | Detailed SHAP summary plot showing feature impact. |
-## 📊 Feature Definitions
-All features are normalized within the range `[0, 1]`:
-- **A**: Defect area ratio (size relative to image).
-- **D**: Defect density (fragmentation level).
-- **C**: Centrality (distance from road center).
-- **Q**: Detection confidence (CV confidence score).
-- **M**: Multi-user confirmation score (crowdsourced weight).
-- **T**: Temporal persistence (time since detection).
-- **R**: Traffic importance (Highway: 1.0, Main: 0.7, Local: 0.4).
-- **P**: Proximity to critical infrastructure (Hospitals, schools).
-- **F**: Recurrence frequency (historical patch failure).
-- **X**: Resolution failure score (reopen count).
-## 🧠 Model Logic
-- **Ground Truth Foundation**:
-  $S_{base} = 0.28A + 0.10D + 0.14C + 0.04Q + 0.08M + 0.07T + 0.09R + 0.10P + 0.06F + 0.04X$
-- **Infrastructure Boost**: $K = 1 + 0.5P$
-- **Final Target**: $S = \min(1, S_{base} * K + \text{Gaussian Noise})$
----
-## 🛠️ Inference Usage
-You can use the `predict_severity` function within `severity_model_pipeline.py` to get predictions:
-```python
-from severity_model_pipeline import predict_severity, load_inference_artefacts
-# Load trained components
-model, scaler, features = load_inference_artefacts()
-# Predict
-result = predict_severity(my_data_dict, model, scaler, features)
-print(f"Severity: {result['score']} ({result['label']})")
-```

 colorFrom: yellow
 colorTo: red
 sdk: gradio
+sdk_version: 4.42.0
 app_file: app.py
 pinned: false
+license: mit
+tags:
+- xgboost
+- tabular-regression
+- civic-tech
+- pothole-detection
 ---
+# Model Card for Pothole Severity Scoring
+## Model Details
+### Model Description
+This is an XGBoost Regressor model designed to predict the priority/severity score of civic infrastructure issues (specifically potholes). It evaluates multiple structural, environmental, and temporal features to output a severity score bounded between 0 and 1, assisting civic authorities in prioritizing repairs and resource allocation.
+- **Developed by:** Civic AI System (Demo)
+- **Model type:** XGBoost Regressor
+- **License:** MIT
+## Uses
+### Direct Use
+The model natively ingests 10 engineered features characterizing a reported pothole and outputs:
+- A numeric severity score ($S \in [0,1]$).
+- A qualitative priority label ("Low", "Medium", "High").
+This is intended for sorting and prioritizing civil work dispatch queues.
+## Bias, Risks, and Limitations
+The model heavily factors in proximity to critical infrastructure (`P`) and road hierarchy (`R`). While this effectively prioritizes areas like highways and hospitals, it may systematically delay repairs in neglected or local neighborhoods if those areas lack designated local "critical infrastructure". Disparate impact assessments should be run periodically to ensure equitable civic maintenance.
+## Training Details
+### Training Data
+The model was trained on a synthetically generated dataset of `10,000` samples designed to mirror realistic distributions of civic reporting. Features include:
+- `A`: Defect area ratio
+- `D`: Defect density
+- `C`: Centrality (distance from center)
+- `Q`: Initial detection confidence
+- `M`: Multi-user confirmation score
+- `T`: Temporal persistence (days unresolved)
+- `R`: Traffic importance tier
+- `P`: Proximity to critical infrastructure
+- `F`: Recurrence frequency
+- `X`: Resolution failure count
+All features are min-max scaled `[0,1]`.
+### Training Procedure
+- **Algorithm:** XGBoost
+- **Objective:** `reg:squarederror`
+- **Trees:** 200
+- **Max Depth:** 5
+- **Learning Rate:** 0.05
+## Evaluation
+### Testing Data, Factors & Metrics
+Evaluated on a 20% holdout set (`N=2000`).
+- **RMSE:** 0.0312
+- **MAE:** 0.0247
+- **R² Score:** 0.8067
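
Note on the evaluation metrics: the new model card reports RMSE, MAE, and R² on the holdout set but does not show how they were computed. The following is a minimal sketch using the standard definitions, written with the Python standard library only. The function name `regression_metrics` is hypothetical and is not taken from this repository; the pipeline itself may well use scikit-learn's metric functions instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for two equal-length sequences of floats.

    Hypothetical helper for illustration; not part of the repository.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R² is undefined when the targets have zero variance.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


if __name__ == "__main__":
    metrics = regression_metrics([0.2, 0.4, 0.6, 0.8], [0.3, 0.5, 0.7, 0.9])
    print(metrics)
```

Because the target is bounded in `[0, 1]`, RMSE and MAE read directly as fractions of the full severity scale, while R² depends on how spread out the holdout targets are.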

requirements.txt CHANGED

@@ -3,4 +3,5 @@ pandas
 scikit-learn
 xgboost
 joblib
-gradio

 scikit-learn
 xgboost
 joblib
+gradio>=4.42.0
+pyaudioop
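
Note on the removed "Model Logic" section: this commit drops the only description of how the synthetic target was generated (the weighted sum `S_base`, the infrastructure boost `K = 1 + 0.5P`, and the cap at 1). A minimal sketch of that formula is given here for reference, using the weights from the removed README. The function name `synthetic_severity`, the `noise_std` parameter, and the lower clamp at 0 are assumptions: the old README states only `min(1, ...)` and does not give the noise scale.

```python
import random

# Weights copied from the removed README's S_base formula; they sum to 1.0.
WEIGHTS = {
    "A": 0.28, "D": 0.10, "C": 0.14, "Q": 0.04, "M": 0.08,
    "T": 0.07, "R": 0.09, "P": 0.10, "F": 0.06, "X": 0.04,
}


def synthetic_severity(features, noise_std=0.0, rng=None):
    """Compute the synthetic target S from features already scaled to [0, 1].

    Hypothetical helper: `noise_std` and the clamp at 0 are assumptions,
    since the removed README does not specify them.
    """
    s_base = sum(weight * features[name] for name, weight in WEIGHTS.items())
    k = 1.0 + 0.5 * features["P"]  # infrastructure boost
    noise = 0.0
    if noise_std > 0:
        rng = rng or random.Random()
        noise = rng.gauss(0.0, noise_std)
    # The README caps at 1; the floor at 0 is added so S stays in [0, 1].
    return max(0.0, min(1.0, s_base * k + noise))


if __name__ == "__main__":
    example = {name: 0.5 for name in WEIGHTS}
    print(synthetic_severity(example))
```

Note that `P` enters twice, once through its weight in `S_base` and once through `K`, which is consistent with the bias caveat in the new model card about proximity to critical infrastructure.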