techavenger123 committed on
Commit 7e1cdd6 · 1 Parent(s): 394a188

Add Dockerfile and README

Files changed (2)
  1. Dockerfile +10 -0
  2. README.md +8 -308
Dockerfile ADDED
@@ -0,0 +1,10 @@
+ FROM python:3.11-slim
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH"
+ WORKDIR /app
+ COPY --chown=user requirements.txt requirements.txt
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+ COPY --chown=user . /app
+ EXPOSE 7860
+ CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--workers", "2", "--timeout", "120", "app:app"]
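Note on the `CMD` line: gunicorn reads the target `app:app` as "module `app`, attribute `app`" and expects that attribute to be a WSGI callable. The real `app.py` is a Flask application and is not shown in this diff, so the sketch below is only a hypothetical stdlib stand-in; the `/health` route, its payload, and the `call` helper are assumptions made for illustration, not part of the repository.

```python
"""Hypothetical stand-in for app.py: a bare WSGI callable named `app`."""
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """WSGI entry point; `gunicorn app:app` looks up this name in app.py."""
    if environ.get("PATH_INFO") == "/health":  # assumed route, not from the repo
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path):
    """Invoke the WSGI callable in-process, without starting a server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

A Flask object created with `app = Flask(__name__)` satisfies the same contract, since Flask instances are themselves WSGI callables.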
README.md CHANGED
@@ -1,313 +1,13 @@
  ---
- title: Fault Detection For Industry
- emoji: 🌍
- colorFrom: indigo
- colorTo: gray
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
- # FaultSense Industrial Equipment Fault Predictor
-
- > Real-time binary fault detection for industrial equipment using LightGBM, served via a Flask web application.
-
- ![Python](https://img.shields.io/badge/Python-3.9%2B-blue)
- ![LightGBM](https://img.shields.io/badge/Model-LightGBM-brightgreen)
- ![Flask](https://img.shields.io/badge/API-Flask-lightgrey)
- ![License](https://img.shields.io/badge/License-MIT-yellow)
-
- ---
-
- ## Overview
-
- FaultSense takes live sensor readings from industrial equipment — temperature, pressure, vibration, and humidity — and predicts in real time whether the equipment is **healthy** or **faulty**. It includes a full ML pipeline from synthetic data generation through hyperparameter search to a production-ready web interface.
-
- **Equipment types supported:** Pump · Compressor · Motor · Valve · Sensor
-
- ---
-
- ## Screenshots
-
- The web UI lets you drag sensor sliders and get an instant fault prediction with probability score and confidence level.
-
- > Run the app (see below) and open `http://localhost:5000`
-
- ---
-
- ## Project Structure
-
- ```
- FaultSense/
-
- ├── app.py # Flask web app (main entry point)
- ├── app2.py # Alternative app variant
-
- ├── data_synthesier.py # Synthetic dataset generator
- ├── dataset.py # Dataset structuring utilities
- ├── distribution_function.py # Sensor feature distribution modelling
- ├── data_analyze.py # Exploratory data analysis
- ├── data.ipynb # EDA notebook
-
- ├── main.py → main8.py # Iterative experiment scripts
- ├── main9_by_claude.py # Claude-assisted experiment
- ├── main10_claude_combnation.py # Dense hyperparameter grid search (~13,650 runs)
- ├── main11.py # Final experiment iteration
-
- ├── synthetic_nim_parallel_10000.csv # Primary training dataset (10,000 samples)
- ├── RANDOM_FOREST.csv # Random Forest baseline results
- ├── faultsense_model.joblib # Serialised trained pipeline
-
- ├── results/ # Experiment results (CSV / XLSX)
- ├── plots/ # Saved diagnostic plots
- ├── analysis/ # Additional analysis outputs
- ├── industrial-equipment-monitoring-dataset/ # Raw dataset folder
- └── synthetics3/ # Additional synthetic data variants
- ```
-
- ---
-
- ## Features
-
- - **Binary fault classification** — predicts `FAULTY` or `HEALTHY` with probability score
- - **Confidence levels** — HIGH / MEDIUM / LOW based on prediction probability
- - **Live web UI** — interactive sliders for all sensor inputs, dark-mode interface
- - **Prediction history** — last 20 predictions shown in-session
- - **Model info panel** — displays test AUC, F1, accuracy, precision, and recall live in the UI
- - **REST API** — `/predict` endpoint accepts JSON for programmatic use
- - **Auto train or load** — automatically retrains if no saved model is found
-
- ---
-
- ## Quickstart
-
- ### 1. Clone the repository
-
- ```bash
- git clone https://github.com/techavenger123/Trial_AI_ProjDATASET.git
- cd Trial_AI_ProjDATASET
- ```
-
- ### 2. Install dependencies
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ### 3. Run the app
-
- ```bash
- python app.py
- ```
-
- On first run, if `faultsense_model.joblib` is not present, the model will train automatically using `synthetic_nim_parallel_10000.csv`. This takes under a minute.
-
- ### 4. Open in browser
-
- ```
- http://localhost:5000
- ```
-
- ---
-
- ## Requirements
-
- ```
- flask
- lightgbm
- scikit-learn
- pandas
- numpy
- joblib
- matplotlib
- tqdm
- openpyxl
- ```
-
- Install all at once:
-
- ```bash
- pip install flask lightgbm scikit-learn pandas numpy joblib matplotlib tqdm openpyxl
- ```
-
- Python 3.9 or higher is recommended.
-
- ---
-
- ## Dataset
-
- The primary dataset (`synthetic_nim_parallel_10000.csv`) contains **10,000 synthetic sensor readings** generated in parallel to simulate realistic industrial conditions.
-
- | Feature | Type | Range | Description |
- |---|---|---|---|
- | `equipment` | Categorical | pump, compressor, motor, valve, sensor | Equipment type |
- | `temperature` | Float | –20 to 120 °C | Operating temperature |
- | `pressure` | Float | 0 to 20 bar | Internal pressure |
- | `vibration` | Float | 0 to 50 mm/s | Vibration level |
- | `humidity` | Float | 0 to 100 % | Ambient humidity |
- | `location` | Categorical | — | Installation location (dropped at training) |
- | `faulty` | Binary | 0 / 1 | **Target** — 0 = healthy, 1 = faulty |
-
- Class imbalance is handled via `class_weight="balanced"` in the LightGBM classifier.
-
- ---
-
- ## Model
-
- FaultSense uses a **scikit-learn Pipeline** combining preprocessing and a LightGBM classifier.
-
- ### Architecture
-
- ```
- Input features
-
- ├── equipment (categorical) ──► OneHotEncoder
- └── temperature, pressure,
- vibration, humidity (numeric) ──► passthrough
-
-
- LGBMClassifier
-
-
- Fault probability [0–1]
-
- threshold = 0.5
-
- FAULTY (1) / HEALTHY (0)
- ```
-
- ### Best configuration
-
- | Parameter | Value |
- |---|---|
- | Learning rate | 0.05 |
- | n_estimators | 165 |
- | max_depth | 8 |
- | num_leaves | 50 |
- | subsample | 0.8 |
- | colsample_bytree | 0.8 |
- | Train / Val / Test split | 90% / 5% / 5% |
- | Prediction threshold | 0.5 |
-
- This configuration was selected from a dense grid search of **~13,650 combinations** across 35 learning rates, 78 estimator counts, and 5 train/val/test split ratios (see `main10_claude_combnation.py`).
-
- ---
-
- ## API Reference
-
- ### `POST /predict`
-
- Predict fault status from sensor readings.
-
- **Request body (JSON)**
-
- ```json
- {
-   "equipment": "pump",
-   "temperature": 75.5,
-   "pressure": 12.3,
-   "vibration": 18.0,
-   "humidity": 65
- }
- ```
-
- **Response**
-
- ```json
- {
-   "prediction": 1,
-   "probability": 0.8732,
-   "confidence": "HIGH",
-   "threshold": 0.5,
-   "label": "FAULTY"
- }
- ```
-
- ### `GET /model_info`
-
- Returns the current model configuration and test-set performance metrics.
-
- **Response**
-
- ```json
- {
-   "config": {
-     "learning_rate": 0.05,
-     "n_estimators": 165,
-     "train_ratio": 0.9,
-     "val_ratio": 0.05,
-     "test_ratio": 0.05
-   },
-   "test_metrics": {
-     "test_auc": 0.97,
-     "test_accuracy": 0.94,
-     "test_f1": 0.93,
-     "test_precision": 0.91,
-     "test_recall": 0.95,
-     "test_logloss": 0.18
-   }
- }
- ```
-
- ---
-
- ## Running the Hyperparameter Search
-
- To reproduce the full grid search (warning: this takes significant time — ~13,650 model fits):
-
- ```bash
- python main10_claude_combnation.py
- ```
-
- Results are saved to `results/synthetic/dense_results.csv` and `dense_results.xlsx`. Six diagnostic plots are saved to `Synthetic1/synthetic_plot/`:
-
- - Validation metric heatmaps (LR × n_estimators)
- - Metrics vs n_estimators per split ratio
- - Metrics vs learning rate per split ratio
- - Train vs validation curves (best split)
- - Overfitting heatmap (train AUC − val AUC)
- - Top-30 config scatter (val F1 vs val AUC)
-
- The search supports **checkpointing** — if interrupted, it resumes from where it left off.
-
- ---
-
- ## Retrain from Scratch
-
- To force a full retrain (ignoring any saved model):
-
- ```bash
- # Delete the saved model, then run the app
- rm faultsense_model.joblib
- python app.py
- ```
-
- Or edit `BEST_CONFIG` in `app.py` to change hyperparameters before retraining.
-
- ---
-
- ## Known Limitations
-
- - **Synthetic data only** — the model has not been validated on real industrial sensor readings. Performance may differ on real-world data.
- - **Fixed threshold** — the prediction threshold is set to 0.5. For safety-critical applications, consider tuning this using a precision-recall curve to favour recall (catching more faults at the cost of more false alarms).
- - **No feature explainability** — the app does not currently show which sensor reading drove a given prediction. Adding SHAP values would improve interpretability for maintenance engineers.
- - **No authentication** — the Flask app runs without any access control. Do not expose it publicly without adding authentication.
- - **Single model** — only LightGBM is deployed. Ensemble approaches or periodic retraining on fresh data may improve production reliability.
-
- ---
-
- ## Development History
-
- This project was built iteratively, with experiment scripts versioned as `main.py` through `main11.py`. Scripts `main9_by_claude.py` and `main10_claude_combnation.py` reflect AI-assisted development using Claude.
-
- ---
-
- ## License
-
- MIT License. See `LICENSE` for details.
-
- ---
-
- ## Contributing
-
- Pull requests are welcome. For significant changes, please open an issue first to discuss what you would like to change.
 
  ---
+ title: FaultSense
+ emoji: ??
+ colorFrom: green
+ colorTo: blue
  sdk: docker
+ app_file: app.py
+ app_port: 7860
  pinned: false
  ---

+ # FaultSense Industrial Equipment Fault Predictor
+ Real-time binary fault detection using LightGBM and Flask.
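
Note on the trimmed README: the removed text documented a `POST /predict` endpoint that accepts a JSON body with `equipment`, `temperature`, `pressure`, `vibration`, and `humidity`, and the new front matter sets `app_port: 7860`. Assuming that endpoint is unchanged by this commit, a client call might be sketched as follows. The base URL `http://localhost:7860` and the helper names `build_predict_request` and `predict` are hypothetical, introduced only for this example.

```python
import json
import urllib.request

# Field names taken from the request body shown in the removed README.
REQUIRED_FIELDS = ("equipment", "temperature", "pressure", "vibration", "humidity")


def build_predict_request(base_url, reading):
    """Build (but do not send) a POST /predict request for one sensor reading."""
    missing = [field for field in REQUIRED_FIELDS if field not in reading]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, reading, timeout=10):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_predict_request(base_url, reading)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call, assuming the container is running locally with port 7860 published:
# predict("http://localhost:7860", {"equipment": "pump", "temperature": 75.5,
#                                   "pressure": 12.3, "vibration": 18.0, "humidity": 65})
```

Keeping request construction separate from the network call lets the payload be checked without a live container.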