baiganinn committed
Commit 7058515 · 0 Parent(s)
.gitattributes ADDED

*.pkl filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
README.md ADDED
# 🛣️ Vehicle Anomaly Detection System

An advanced machine-learning anomaly detection system for GPS tracking data with a polished Gradio interface.

## 🚀 Features

- **Multiple ML Models**: Ensemble of Isolation Forest, One-Class SVM, and LSTM Autoencoder
- **Modern UI**: Gradio interface with interactive visualizations
- **Batch Processing**: Handles up to 2000 GPS points with detailed analysis
- **Comprehensive Output**: Point-by-point analysis, risk factors, and JSON export
- **Interactive Maps**: GPS route visualization with anomaly highlighting
- **Performance Analytics**: Speed, altitude, and confidence distribution charts

## 📊 Processing Performance

- **CPU-only processing**: 45-90 seconds for 2000 samples
- **HuggingFace Spaces ready**: Optimized for cloud deployment
- **Memory efficient**: Handles large datasets with rolling-window processing

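The memory-bounded approach above boils down to processing fixed-size chunks rather than the whole upload at once; a minimal sketch (`process_fn` and the chunk size of 500 are illustrative, mirroring `large_dataset_example` in `batch_production_pred.py`):

```python
def process_in_chunks(rows, process_fn, chunk_size=500):
    """Process a large list of GPS rows in fixed-size chunks to bound memory.

    `process_fn` is any callable that accepts a list of rows and returns a
    count; the name and chunk size are illustrative, not the app's API.
    """
    total = 0
    for i in range(0, len(rows), chunk_size):
        # Each slice is at most `chunk_size` rows, so peak memory stays bounded
        total += process_fn(rows[i:i + chunk_size])
    return total
```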
## 🔧 Installation

### Local Installation

```bash
# Clone or download the project
cd anomaly

# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python gradio_app.py
```

### HuggingFace Spaces Deployment

1. Create a new Space on HuggingFace
2. Upload all files, including the `models/` directory
3. Set `app_file` to `app.py` (the Spaces entry point) in the Space metadata
4. The app will launch automatically

## 📁 Input Format

Your CSV file must contain these columns:

| Column | Description | Range |
|--------|-------------|-------|
| `randomized_id` | Vehicle identifier | Any string |
| `lat` | Latitude | -90 to 90 |
| `lng` | Longitude | -180 to 180 |
| `spd` | Speed (km/h) | 0 to 300 |
| `azm` | Azimuth/heading (degrees) | 0 to 360 |
| `alt` | Altitude (meters) | Any number |

### Sample Data

```csv
randomized_id,lat,lng,spd,azm,alt
VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0
VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0
VEHICLE002,40.7500,-73.9800,35.2,180.0,90.0
```

- **Maximum**: 2000 samples per upload
- **Minimum**: 5 samples required

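A pre-upload check mirroring these rules might look like this (a sketch; `quick_validate` is a hypothetical helper, not part of the app — the real checks live in `validate_csv` in `gradio_app.py`):

```python
import pandas as pd

# Required columns and the row-count limits stated above
REQUIRED = ["randomized_id", "lat", "lng", "spd", "azm", "alt"]

def quick_validate(path) -> pd.DataFrame:
    """Check column presence, row count, and value ranges before uploading."""
    df = pd.read_csv(path)
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not 5 <= len(df) <= 2000:
        raise ValueError(f"row count {len(df)} outside 5..2000")
    # Range checks from the table above
    checks = {"lat": (-90, 90), "lng": (-180, 180), "spd": (0, 300), "azm": (0, 360)}
    for col, (lo, hi) in checks.items():
        if not df[col].between(lo, hi).all():
            raise ValueError(f"{col} values outside [{lo}, {hi}]")
    return df
```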
## 🎯 Anomaly Detection

The system detects several types of anomalies:

### Speed Anomalies
- Excessive speeding (>120 km/h)
- Sudden acceleration or deceleration
- Speed inconsistencies

### Movement Anomalies
- Erratic GPS patterns
- Sharp turns at high speed
- Altitude inconsistencies

### Behavioral Patterns
- Route deviations
- Stop-and-go patterns
- Unusual driving sequences

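The speed rules above amount to a threshold-plus-delta check; a minimal sketch (the thresholds are illustrative defaults, not the model's trained values):

```python
def flag_speed_anomalies(speeds_kmh, limit=120.0, max_jump=30.0):
    """Flag points that exceed a speed limit or jump sharply between samples.

    `limit` matches the >120 km/h rule above; `max_jump` (km/h change between
    consecutive samples) is an illustrative stand-in for the acceleration rule.
    """
    flags = []
    prev = None
    for s in speeds_kmh:
        # Jump relative to the previous sample (0 for the first point)
        jump = abs(s - prev) if prev is not None else 0.0
        flags.append(s > limit or jump > max_jump)
        prev = s
    return flags
```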
## 📈 Output Features

### 1. Detailed Results
- Point-by-point analysis
- Normal vs. anomaly classification
- Confidence scores and alert levels
- Risk factor identification

### 2. Interactive Visualizations
- GPS route mapping with anomaly markers
- Speed and altitude profiles
- Confidence score distributions
- Multi-panel analysis dashboard

### 3. Summary Statistics
- Processing performance metrics
- Overall anomaly rates
- Alert level distributions
- Risk factor rankings

### 4. JSON Export
Complete machine-readable results including:
- All detection scores
- Driving metrics
- Risk assessments
- Timestamps and metadata

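An exported record might look roughly like this — an illustrative shape assembled from the fields listed above, not a guaranteed schema:

```python
import json

# Illustrative record; field names follow the sections above, and all
# values here are made up for demonstration.
record = {
    "vehicle_id": "VEHICLE001",
    "anomaly_detected": True,
    "alert_level": "HIGH",
    "confidence": 0.912,
    "raw_scores": {"isolation_forest": -0.12, "one_class_svm": -0.45, "lstm": 0.031},
    "driving_metrics": {"speed": 127.4},
    "risk_factors": ["excessive_speed"],
    "timestamp": "2024-01-01T12:00:00",
}
print(json.dumps(record, indent=2))
```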
## 🔬 Technical Details

### ML Models Used
1. **Isolation Forest**: Tree-based anomaly detection
2. **One-Class SVM**: Support-vector-based outlier detection
3. **LSTM Autoencoder**: Deep-learning sequence anomaly detection

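Conceptually, the three model scores are normalized and blended into a single confidence value; a minimal sketch (the project's actual `_calculate_ensemble_score` may weight and normalize differently — the weights and sigmoid mapping here are assumptions):

```python
import math

def ensemble_score(scores, weights=None):
    """Blend per-model anomaly scores into one value in [0, 1].

    Assumes sklearn-style decision_function scores where lower means more
    anomalous; the sigmoid inversion and weights are illustrative only.
    """
    if weights is None:
        weights = {"isolation_forest": 0.4, "one_class_svm": 0.3, "lstm": 0.3}
    total = weight_sum = 0.0
    for model, raw in scores.items():
        w = weights.get(model, 0.0)
        # Map a raw score to (0, 1): strongly negative -> close to 1 (anomalous)
        total += w * (1.0 / (1.0 + math.exp(raw)))
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```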
### Feature Engineering
18 engineered features, including:
- Speed patterns and statistics
- Acceleration and jerk calculations
- Angular velocity and curvature
- Rolling-window aggregations
- Risk scoring algorithms

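A few of these features can be derived directly from the input columns; a sketch of an illustrative subset (assuming `spd` in km/h and `azm` in degrees, sampled at a fixed interval — not the project's exact 18-feature pipeline):

```python
import pandas as pd

def basic_motion_features(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Derive acceleration, jerk, angular velocity, and rolling speed stats."""
    out = df.copy()
    out["accel"] = out["spd"].diff().fillna(0.0)   # speed change per sample
    out["jerk"] = out["accel"].diff().fillna(0.0)  # acceleration change
    # Smallest signed heading change, wrapped into [-180, 180)
    dh = out["azm"].diff().fillna(0.0)
    out["ang_vel"] = (dh + 180.0) % 360.0 - 180.0
    # Rolling-window aggregations over the last `window` samples
    out["spd_mean"] = out["spd"].rolling(window, min_periods=1).mean()
    out["spd_std"] = out["spd"].rolling(window, min_periods=1).std().fillna(0.0)
    return out
```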
### Performance Optimization
- Efficient batch processing
- Memory-optimized feature calculation
- CPU-friendly model inference
- Progressive result streaming

## 🛡️ Privacy & Security

- **Local Processing**: All analysis happens in your environment
- **No Data Upload**: Your GPS data never leaves the system
- **Real-time Analysis**: No data storage or logging
- **Secure Processing**: Industry-standard ML pipeline

## 🚀 Deployment Options

### Local Development
```bash
python gradio_app.py
# Access at http://localhost:7860
```

### HuggingFace Spaces
- Ideal for sharing and collaboration
- No setup required
- Automatic scaling
- Public or private deployment

### Docker (Optional)
```dockerfile
FROM python:3.9-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 7860
CMD ["python", "gradio_app.py"]
```

## 📞 Support

For issues or questions:
1. Check the sample data format
2. Ensure your CSV has all required columns
3. Verify data is within the expected ranges
4. Check for missing values or invalid entries

## 🔮 Future Enhancements

- Real-time streaming support
- Custom alert thresholds
- Historical trend analysis
- Fleet management dashboard
- Advanced route optimization
- Multi-vehicle correlation analysis

---

**Made with ❤️ using Gradio, PyTorch, and Advanced ML**
__pycache__/batch_production_pred.cpython-311.pyc ADDED
Binary file (26.4 kB)

__pycache__/gradio_app.cpython-311.pyc ADDED
Binary file (30.1 kB)

__pycache__/production_predictor.cpython-311.pyc ADDED
Binary file (34.8 kB)
app.py ADDED
#!/usr/bin/env python3
"""
HuggingFace Spaces entry point for Vehicle Anomaly Detection System
"""

from gradio_app import create_interface

if __name__ == "__main__":
    demo = create_interface()
    # share=True has no effect on Spaces; it creates a public link for local runs
    demo.launch(share=True, server_name="0.0.0.0", server_port=7860, debug=True)
batch_production_pred.py ADDED
import numpy as np
import pandas as pd
from typing import List, Dict, Optional, Tuple, Any
from datetime import datetime, timedelta
import logging

import torch

from production_predictor import ProductionAnomalyDetector, AnomalyResult, GPSPoint

logger = logging.getLogger(__name__)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class BatchAnomalyDetector(ProductionAnomalyDetector):
    """
    Extended ProductionAnomalyDetector with batch-processing capabilities.
    Processes data as a list of lists: [[id, lat, lng, azm, spd, alt], ...]
    """

    def __init__(self, model_dir: str, config: Dict = None):
        super().__init__(model_dir, config)
        self.batch_results = []
    def process_batch_list_of_lists(self,
                                    data: List[List],
                                    column_order: List[str] = None,
                                    sort_by_vehicle: bool = True,
                                    generate_timestamps: bool = True) -> Dict[str, Any]:
        """
        Process batch data supplied as a list of lists.

        Args:
            data: List of lists in the format [[id, lat, lng, azm, spd, alt], ...]
            column_order: Order of columns, if different from the default
            sort_by_vehicle: Whether to sort by vehicle id for a proper sequence
            generate_timestamps: Whether to generate timestamps automatically

        Returns:
            Dictionary with batch processing results
        """
        if column_order is None:
            column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']

        print(f"🔄 Processing batch of {len(data)} GPS points...")

        # Convert the list of lists to a DataFrame
        df = pd.DataFrame(data, columns=column_order)

        # Rename to match the training format (only vehicle_id differs)
        if 'vehicle_id' in df.columns:
            df = df.rename(columns={'vehicle_id': 'randomized_id'})

        # Ensure we have the right columns
        required_columns = ['randomized_id', 'lat', 'lng', 'alt', 'spd', 'azm']
        missing_columns = [col for col in required_columns if col not in df.columns]
        if missing_columns:
            raise ValueError(f"Missing required columns: {missing_columns}")

        # Sort by vehicle (coordinates serve as a rough proxy for sequence
        # when no timestamps are supplied)
        if sort_by_vehicle:
            df = df.sort_values(['randomized_id', 'lat', 'lng']).reset_index(drop=True)

        # Generate timestamps if requested
        if generate_timestamps:
            df['timestamp'] = self._generate_timestamps(df)

        # Process the batch
        return self._process_dataframe_batch(df)
    def process_batch_by_vehicle(self,
                                 data: List[List],
                                 column_order: List[str] = None,
                                 time_interval_seconds: int = 2) -> Dict[str, List[AnomalyResult]]:
        """
        Process batch data vehicle by vehicle to maintain a proper sequence.

        Args:
            data: List-of-lists data
            column_order: Column order specification
            time_interval_seconds: Time interval between GPS points

        Returns:
            Dictionary with vehicle_id as key and list of results as value
        """
        if column_order is None:
            column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']

        # Convert to a DataFrame
        df = pd.DataFrame(data, columns=column_order)

        # Group by vehicle
        vehicle_results = {}
        total_anomalies = 0

        print(f"🚛 Processing {df['vehicle_id'].nunique()} vehicles with {len(df)} total points...")

        for vehicle_id in df['vehicle_id'].unique():
            vehicle_data = df[df['vehicle_id'] == vehicle_id].copy()
            # Sorting by coordinates approximates travel order; with real
            # timestamps, sort on those instead
            vehicle_data = vehicle_data.sort_values(['lat', 'lng']).reset_index(drop=True)

            print(f"\n📍 Processing vehicle: {vehicle_id} ({len(vehicle_data)} points)")

            # Clear the vehicle buffer to start fresh
            if vehicle_id in self.vehicle_buffers:
                del self.vehicle_buffers[vehicle_id]

            vehicle_results[vehicle_id] = []
            vehicle_anomalies = 0

            # Process points sequentially for this vehicle
            for idx, row in vehicle_data.iterrows():
                timestamp = datetime.now() + timedelta(seconds=idx * time_interval_seconds)

                gps_point = GPSPoint(
                    vehicle_id=vehicle_id,
                    lat=row['lat'],
                    lng=row['lng'],
                    alt=row.get('alt', 0.0),
                    spd=row.get('spd', 0.0),
                    azm=row.get('azm', 0.0),
                    timestamp=timestamp.isoformat()
                )

                result = self.process_gps_point(gps_point)

                if result:
                    vehicle_results[vehicle_id].append(result)
                    if result.anomaly_detected:
                        vehicle_anomalies += 1
                        total_anomalies += 1

                        # Print anomaly details
                        print(f"   🚨 Point {idx+1}: {result.alert_level} "
                              f"(Speed: {result.driving_metrics['speed']:.1f} km/h, "
                              f"Conf: {result.confidence:.3f})")
                        print(f"      Risk factors: {result.risk_factors}")

            detection_rate = vehicle_anomalies / len(vehicle_results[vehicle_id]) if vehicle_results[vehicle_id] else 0
            print(f"   📊 Vehicle summary: {vehicle_anomalies} anomalies out of "
                  f"{len(vehicle_results[vehicle_id])} detections ({detection_rate:.1%})")

        print("\n🎯 Batch Summary:")
        print(f"   Total vehicles: {len(vehicle_results)}")
        print(f"   Total points processed: {len(df)}")
        print(f"   Total anomalies detected: {total_anomalies}")
        print(f"   Overall anomaly rate: {total_anomalies/len(df):.1%}")

        return vehicle_results
    def process_realtime_stream(self, data_stream: List[List],
                                column_order: List[str] = None,
                                delay_seconds: float = 2.0,
                                callback_function=None) -> List[AnomalyResult]:
        """
        Simulate real-time processing of list-of-lists data.

        Args:
            data_stream: List of lists to process as a real-time stream
            column_order: Column order
            delay_seconds: Delay between points (simulates real time)
            callback_function: Called whenever an anomaly is detected

        Returns:
            List of all detection results
        """
        import time

        if column_order is None:
            column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']

        print(f"🔴 Starting real-time stream simulation with {len(data_stream)} points...")
        print(f"⏱️ Processing delay: {delay_seconds} seconds between points")

        all_results = []
        anomaly_count = 0

        for i, point_data in enumerate(data_stream):
            # Convert the list to a GPSPoint
            point_dict = dict(zip(column_order, point_data))

            gps_point = GPSPoint(
                vehicle_id=point_dict['vehicle_id'],
                lat=point_dict['lat'],
                lng=point_dict['lng'],
                alt=point_dict.get('alt', 0.0),
                spd=point_dict.get('spd', 0.0),
                azm=point_dict.get('azm', 0.0),
                timestamp=datetime.now().isoformat()
            )

            # Process the point
            result = self.process_gps_point(gps_point)

            if result:
                all_results.append(result)

                # Print status
                status_icon = "🟢" if result.alert_level == "NORMAL" else "🟡" if result.alert_level in ["LOW", "MEDIUM"] else "🔴"
                print(f"{status_icon} Point {i+1:3d}: {result.vehicle_id:12s} | "
                      f"{result.alert_level:8s} | Speed: {result.driving_metrics['speed']:5.1f} km/h | "
                      f"Conf: {result.confidence:.3f}")

                if result.anomaly_detected:
                    anomaly_count += 1
                    print(f"   🚨 ANOMALY DETECTED! {result.risk_factors}")

                    # Call the callback function if provided
                    if callback_function:
                        callback_function(result, gps_point)
            else:
                print(f"⏳ Point {i+1:3d}: {point_dict['vehicle_id']:12s} | Building buffer...")

            # Simulate the real-time delay (skip after the last point)
            if i < len(data_stream) - 1:
                time.sleep(delay_seconds)

        print("\n📊 Stream Complete:")
        print(f"   Points processed: {len(data_stream)}")
        print(f"   Detections made: {len(all_results)}")
        print(f"   Anomalies found: {anomaly_count}")
        if all_results:
            print(f"   Anomaly rate: {anomaly_count/len(all_results)*100:.1f}%")
        else:
            print("   No detections made")

        return all_results
    def _generate_timestamps(self, df: pd.DataFrame) -> List[str]:
        """Generate realistic timestamps for GPS data (2-second intervals).

        Assumes rows are grouped contiguously by `randomized_id`.
        """
        base_time = datetime.now()
        timestamps = []

        for vehicle_id in df['randomized_id'].unique():
            vehicle_mask = df['randomized_id'] == vehicle_id
            vehicle_count = vehicle_mask.sum()

            # Generate timestamps for this vehicle (2-second intervals)
            for i in range(vehicle_count):
                timestamp = base_time + timedelta(seconds=i * 2)
                timestamps.append(timestamp.isoformat())

        return timestamps

    def _process_dataframe_batch(self, df: pd.DataFrame) -> Dict[str, Any]:
        """Process a DataFrame using the existing feature pipeline."""

        # Use the exact training-time feature engineering pipeline
        features_df = self._calculate_features_exact_pipeline(df)

        if len(features_df) == 0:
            return {
                "status": "error",
                "message": "No features could be calculated",
                "processed": 0,
                "anomalies": 0
            }

        # Scale features
        features_scaled = self.scaler.transform(features_df)

        # Get anomaly scores for all points
        anomaly_results = []

        print("🔍 Running anomaly detection on all points...")

        for i in range(len(features_scaled)):
            point_scaled = features_scaled[i:i+1]

            # Collect scores from all models
            scores = {}

            # Isolation Forest
            if self.isolation_forest:
                scores['isolation_forest'] = float(self.isolation_forest.decision_function(point_scaled)[0])

            # One-Class SVM
            if self.one_class_svm:
                scores['one_class_svm'] = float(self.one_class_svm.decision_function(point_scaled)[0])

            # LSTM (only once enough sequence data is available)
            if self.lstm_autoencoder and i >= self.config['lstm_sequence_length'] - 1:
                try:
                    sequence_start = max(0, i - self.config['lstm_sequence_length'] + 1)
                    sequence_features = features_scaled[sequence_start:i+1]

                    if len(sequence_features) == self.config['lstm_sequence_length']:
                        sequence_tensor = torch.FloatTensor(sequence_features).unsqueeze(0).to(device)

                        with torch.no_grad():
                            reconstructed = self.lstm_autoencoder(sequence_tensor)
                            reconstruction_error = torch.mean((sequence_tensor - reconstructed) ** 2).item()
                            scores['lstm'] = float(reconstruction_error)
                except Exception:
                    # Fall back to a neutral score if LSTM inference fails
                    scores['lstm'] = 0.0

            # Calculate the ensemble score
            ensemble_score = self._calculate_ensemble_score(scores)
            alert_level = self._get_alert_level(ensemble_score)

            # Extract metrics
            feature_row = features_df.iloc[i]
            driving_metrics = self._extract_driving_metrics_from_features(feature_row)
            risk_factors = self._extract_risk_factors_from_features(feature_row)

            anomaly_results.append({
                'index': i,
                'vehicle_id': df.iloc[i]['randomized_id'],
                'anomaly_detected': ensemble_score > self.config['alert_threshold'],
                'confidence': ensemble_score,
                'alert_level': alert_level,
                'raw_scores': scores,
                'driving_metrics': driving_metrics,
                'risk_factors': risk_factors
            })

        # Generate the summary
        total_anomalies = sum(1 for r in anomaly_results if r['anomaly_detected'])

        return {
            "status": "completed",
            "processed": len(anomaly_results),
            "anomalies": total_anomalies,
            "anomaly_rate": total_anomalies / len(anomaly_results) if anomaly_results else 0,
            "results": anomaly_results,
            "summary": {
                "total_vehicles": df['randomized_id'].nunique(),
                "total_points": len(df),
                "detection_ready_points": len(anomaly_results),
                "anomalies_by_level": {
                    level: sum(1 for r in anomaly_results if r['alert_level'] == level)
                    for level in ['NORMAL', 'LOW', 'MEDIUM', 'HIGH', 'CRITICAL']
                }
            }
        }
# Example usage functions
def example_list_of_lists_usage():
    """Example of how to use the batch processor with list-of-lists data."""

    print("🔄 Example: Processing List of Lists Data")
    print("=" * 50)

    # Initialize the batch detector
    detector = BatchAnomalyDetector("/kaggle/working/anomaly_analysis_pytorch_fixed/models")

    # Sample data as a list of lists: [vehicle_id, lat, lng, azm, spd, alt]
    sample_data = [
        # Normal driving for vehicle_001
        ["vehicle_001", 55.7558, 37.6176, 90.0, 45.0, 156.0],
        ["vehicle_001", 55.7559, 37.6177, 92.0, 47.0, 157.0],
        ["vehicle_001", 55.7560, 37.6178, 94.0, 46.0, 158.0],
        ["vehicle_001", 55.7561, 37.6179, 96.0, 48.0, 159.0],
        ["vehicle_001", 55.7562, 37.6180, 98.0, 49.0, 160.0],

        # Aggressive driving for vehicle_002
        ["vehicle_002", 55.7600, 37.6200, 180.0, 70.0, 150.0],
        ["vehicle_002", 55.7601, 37.6201, 182.0, 125.0, 151.0],  # Speeding
        ["vehicle_002", 55.7602, 37.6202, 184.0, 15.0, 152.0],   # Hard braking
        ["vehicle_002", 55.7603, 37.6203, 250.0, 55.0, 153.0],   # Sharp turn

        # Mixed behavior for vehicle_003
        ["vehicle_003", 55.7700, 37.6300, 45.0, 40.0, 145.0],
        ["vehicle_003", 55.7701, 37.6301, 47.0, 42.0, 146.0],
        ["vehicle_003", 55.7702, 37.6302, 49.0, 110.0, 147.0],   # Speed violation
        ["vehicle_003", 55.7703, 37.6303, 51.0, 43.0, 148.0],
    ]

    print(f"Processing {len(sample_data)} GPS points from "
          f"{len(set(row[0] for row in sample_data))} vehicles...")

    # Method 1: Process as a batch
    print("\n📊 Method 1: Batch Processing")
    batch_results = detector.process_batch_list_of_lists(sample_data)

    print("Batch Results:")
    print(f"   Status: {batch_results['status']}")
    print(f"   Points processed: {batch_results['processed']}")
    print(f"   Anomalies detected: {batch_results['anomalies']}")
    print(f"   Anomaly rate: {batch_results['anomaly_rate']:.1%}")

    # Method 2: Process vehicle by vehicle
    print("\n🚛 Method 2: Vehicle-by-Vehicle Processing")
    vehicle_results = detector.process_batch_by_vehicle(sample_data)

    for vehicle_id, results in vehicle_results.items():
        anomaly_count = sum(1 for r in results if r.anomaly_detected)
        print(f"   {vehicle_id}: {anomaly_count} anomalies out of {len(results)} detections")

    # Method 3: Real-time simulation
    print("\n🔴 Method 3: Real-time Stream Simulation (first 8 points)")

    def anomaly_callback(result, gps_point):
        """Callback invoked when an anomaly is detected."""
        print(f"   📧 ALERT SENT: {result.vehicle_id} - {result.alert_level}")

    stream_results = detector.process_realtime_stream(
        sample_data[:8],       # First 8 points
        delay_seconds=0.5,     # Faster for the demo
        callback_function=anomaly_callback
    )
    return stream_results


def load_from_csv_example():
    """Example of loading data from CSV and converting it to a list of lists."""

    print("\n📁 Example: Loading from CSV")
    print("=" * 50)

    # Simulate CSV loading (in practice, use pd.read_csv('your_file.csv'))
    csv_data = """vehicle_id,lat,lng,azm,spd,alt
vehicle_001,55.7558,37.6176,90.0,45.0,156.0
vehicle_001,55.7559,37.6177,92.0,47.0,157.0
vehicle_002,55.7600,37.6200,180.0,125.0,150.0
vehicle_002,55.7601,37.6201,182.0,15.0,151.0"""

    # Parse the CSV text
    from io import StringIO
    df = pd.read_csv(StringIO(csv_data))

    # Convert the DataFrame to a list of lists
    data_as_lists = df.values.tolist()

    print(f"Loaded {len(data_as_lists)} rows from CSV")
    print(f"Column order: {df.columns.tolist()}")
    print(f"Sample data: {data_as_lists[0]}")

    # Process with the detector
    detector = BatchAnomalyDetector("/kaggle/working/anomaly_analysis_pytorch_fixed/models")
    results = detector.process_batch_list_of_lists(
        data_as_lists,
        column_order=df.columns.tolist()
    )

    print(f"Processing complete: {results['anomalies']} anomalies detected")


def large_dataset_example():
    """Example of processing large datasets efficiently."""

    print("\n🔢 Example: Large Dataset Processing")
    print("=" * 50)

    # Simulate a large dataset
    np.random.seed(42)
    large_data = []

    vehicles = [f"vehicle_{i:03d}" for i in range(1, 11)]  # 10 vehicles

    for vehicle in vehicles:
        for point in range(100):  # 100 points per vehicle
            lat = 55.7500 + np.random.uniform(-0.01, 0.01)
            lng = 37.6000 + np.random.uniform(-0.01, 0.01)
            azm = np.random.uniform(0, 360)
            # 10% of points simulate aggressive speeds
            spd = np.random.uniform(20, 80) if np.random.random() > 0.1 else np.random.uniform(90, 140)
            alt = 150 + np.random.uniform(-20, 20)

            large_data.append([vehicle, lat, lng, azm, spd, alt])

    print(f"Generated large dataset: {len(large_data)} points from {len(vehicles)} vehicles")

    # Process efficiently
    detector = BatchAnomalyDetector("/kaggle/working/anomaly_analysis_pytorch_fixed/models")

    # Process in chunks for memory efficiency
    chunk_size = 500
    total_anomalies = 0

    for i in range(0, len(large_data), chunk_size):
        chunk = large_data[i:i + chunk_size]
        print(f"Processing chunk {i//chunk_size + 1}: points {i+1}-{i+len(chunk)}")

        results = detector.process_batch_list_of_lists(chunk)
        total_anomalies += results['anomalies']

        print(f"   Chunk anomalies: {results['anomalies']}")

    print("\nLarge dataset complete:")
    print(f"   Total points: {len(large_data)}")
    print(f"   Total anomalies: {total_anomalies}")
    print(f"   Overall anomaly rate: {total_anomalies/len(large_data):.1%}")
gradio_app.py ADDED
import gradio as gr
import pandas as pd
import numpy as np
import json
import time
from datetime import datetime
from typing import Dict, List, Tuple, Any
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings("ignore")

# Import the ML solution
from batch_production_pred import BatchAnomalyDetector
from production_predictor import AnomalyResult


class AnomalyDetectionGradioApp:
    def __init__(self, model_dir: str = "./models"):
        """Initialize the Gradio app with ML models."""
        self.model_dir = model_dir
        self.detector = None
        self.load_models()

    def load_models(self):
        """Load the ML models."""
        try:
            self.detector = BatchAnomalyDetector(self.model_dir)
            print("✅ Models loaded successfully!")
        except Exception as e:
            print(f"❌ Error loading models: {e}")
            self.detector = None
    def validate_csv(self, file_path: str) -> Tuple[bool, str, pd.DataFrame]:
        """Validate an uploaded CSV file."""
        try:
            # Read the CSV
            df = pd.read_csv(file_path)

            # Check required columns
            required_cols = ['randomized_id', 'lat', 'lng', 'spd', 'azm', 'alt']
            missing_cols = [col for col in required_cols if col not in df.columns]

            if missing_cols:
                return False, f"❌ Missing required columns: {', '.join(missing_cols)}", None

            # Check the sample count
            if len(df) > 2000:
                return False, f"❌ Too many samples ({len(df)}). Maximum allowed: 2000", None

            if len(df) < 5:
                return False, f"❌ Too few samples ({len(df)}). Minimum required: 5", None

            # Check data types and ranges
            try:
                df['lat'] = pd.to_numeric(df['lat'])
                df['lng'] = pd.to_numeric(df['lng'])
                df['spd'] = pd.to_numeric(df['spd'])
                df['azm'] = pd.to_numeric(df['azm'])
                df['alt'] = pd.to_numeric(df['alt'])

                # Basic range validation
                if not df['lat'].between(-90, 90).all():
                    return False, "❌ Latitude values must be between -90 and 90", None
                if not df['lng'].between(-180, 180).all():
                    return False, "❌ Longitude values must be between -180 and 180", None
                if not df['spd'].between(0, 300).all():
                    return False, "❌ Speed values must be between 0 and 300 km/h", None
                if not df['azm'].between(0, 360).all():
                    return False, "❌ Azimuth values must be between 0 and 360 degrees", None

            except Exception as e:
                return False, f"❌ Data type error: {str(e)}", None

            return True, f"✅ Valid CSV: {len(df)} samples, {df['randomized_id'].nunique()} vehicles", df

        except Exception as e:
            return False, f"❌ Error reading CSV: {str(e)}", None
    def process_data(self, file_path: str, progress=gr.Progress()) -> Tuple[str, str, str, str]:
        """Process the uploaded CSV and return results."""
        if not self.detector:
            return "❌ Models not loaded", "", "", ""

        # Validate the CSV
        is_valid, message, df = self.validate_csv(file_path)
        if not is_valid:
            return message, "", "", ""

        progress(0.1, desc="Validating data...")

        try:
            # Convert the DataFrame to list-of-lists format
            data_list = df[['randomized_id', 'lat', 'lng', 'azm', 'spd', 'alt']].values.tolist()
            column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']

            progress(0.2, desc="Starting anomaly detection...")

            # Process the batch, vehicle by vehicle
            start_time = time.time()
            vehicle_results = self.detector.process_batch_by_vehicle(
                data_list,
                column_order=column_order
            )
            processing_time = time.time() - start_time

            progress(0.8, desc="Generating detailed results...")

            # Generate detailed output
            detailed_results = self.generate_detailed_results(vehicle_results, df)
            summary_stats = self.generate_summary_stats(vehicle_results, processing_time)
            visualization = self.create_visualization(vehicle_results, df)
            json_output = self.generate_json_output(vehicle_results)

            progress(1.0, desc="Complete!")

            return detailed_results, summary_stats, visualization, json_output

        except Exception as e:
            return f"❌ Processing error: {str(e)}", "", "", ""
    def generate_detailed_results(self, vehicle_results: Dict, original_df: pd.DataFrame) -> str:
        """Generate a detailed point-by-point analysis."""
        output_lines = ["# 🔍 Detailed Anomaly Detection Results\n"]

        total_points = 0
        total_anomalies = 0

        for vehicle_id, results in vehicle_results.items():
            if not results:
                continue

            output_lines.append(f"## 🚗 Vehicle: {vehicle_id}")
            output_lines.append(f"**Points analyzed:** {len(results)}\n")

            vehicle_anomalies = 0

            for i, result in enumerate(results, 1):
                total_points += 1

                if result.anomaly_detected:
                    total_anomalies += 1
                    vehicle_anomalies += 1

                    # Get the original data point (assumes one result per input row)
                    vehicle_data = original_df[original_df['randomized_id'] == vehicle_id].iloc[i-1]

                    # Anomaly details
                    output_lines.append(f"### 🚨 Point {i}: **ANOMALY DETECTED!**")
                    output_lines.append(f"- **Alert Level:** {result.alert_level}")
                    output_lines.append(f"- **Confidence:** {result.confidence:.3f}")
                    output_lines.append(f"- **Location:** ({vehicle_data['lat']:.6f}, {vehicle_data['lng']:.6f})")
                    output_lines.append(f"- **Speed:** {result.driving_metrics.get('speed', 0):.1f} km/h")
                    output_lines.append(f"- **Altitude:** {vehicle_data['alt']:.1f} m")
                    output_lines.append(f"- **Heading:** {vehicle_data['azm']:.1f}°")

                    # Risk factors
                    risk_factors = [k for k, v in result.risk_factors.items() if v]
                    if risk_factors:
                        output_lines.append(f"- **Risk Factors:** {', '.join(risk_factors)}")

                    # Model scores
                    output_lines.append("- **Model Scores:**")
                    for model, score in result.raw_scores.items():
                        output_lines.append(f"  - {model}: {score:.3f}")

                    output_lines.append("")
                else:
                    # Normal point (abbreviated): show the first 5 and every 10th
                    if i <= 5 or i % 10 == 0:
                        output_lines.append(f"**Point {i}:** ✅ Normal (confidence: {result.confidence:.3f})")

            # Vehicle summary
            detection_rate = vehicle_anomalies / len(results) if results else 0
            output_lines.append(f"\n**Vehicle Summary:** {vehicle_anomalies} anomalies out of "
                                f"{len(results)} points ({detection_rate:.1%})\n")
            output_lines.append("---\n")

        # Overall summary
        overall_rate = total_anomalies / total_points if total_points > 0 else 0
        output_lines.append("## 📊 Overall Summary")
        output_lines.append(f"- **Total Points:** {total_points}")
182
+ output_lines.append(f"- **Total Anomalies:** {total_anomalies}")
183
+ output_lines.append(f"- **Detection Rate:** {overall_rate:.1%}")
184
+
185
+ return "\n".join(output_lines)
186
+
187
+ def generate_summary_stats(self, vehicle_results: Dict, processing_time: float) -> str:
188
+ """Generate summary statistics"""
189
+ total_vehicles = len(vehicle_results)
190
+ total_points = sum(len(results) for results in vehicle_results.values())
191
+ total_anomalies = sum(sum(1 for r in results if r.anomaly_detected)
192
+ for results in vehicle_results.values())
193
+
194
+ # Alert level distribution
195
+ alert_levels = {}
196
+ for results in vehicle_results.values():
197
+ for result in results:
198
+ if result.anomaly_detected:
199
+ level = result.alert_level
200
+ alert_levels[level] = alert_levels.get(level, 0) + 1
201
+
202
+ # Risk factor analysis
203
+ risk_factors = {}
204
+ for results in vehicle_results.values():
205
+ for result in results:
206
+ if result.anomaly_detected:
207
+ for factor, present in result.risk_factors.items():
208
+ if present:
209
+ risk_factors[factor] = risk_factors.get(factor, 0) + 1
210
+
211
+ output = f"""
212
+ # 📈 Processing Summary
213
+
214
+ ## ⚡ Performance Metrics
215
+ - **Processing Time:** {processing_time:.2f} seconds
216
+ - **Points per Second:** {(total_points/processing_time if processing_time else 0):.1f}
+ - **Average Time per Point:** {(1000*processing_time/total_points if total_points else 0):.1f} ms
218
+
219
+ ## 📊 Detection Statistics
220
+ - **Total Vehicles:** {total_vehicles}
221
+ - **Total GPS Points:** {total_points}
222
+ - **Anomalies Detected:** {total_anomalies}
223
+ - **Overall Anomaly Rate:** {(100*total_anomalies/total_points if total_points else 0):.2f}%
224
+
225
+ ## 🚨 Alert Level Distribution
226
+ """
227
+
228
+ for level, count in sorted(alert_levels.items()):
229
+ percentage = 100 * count / total_anomalies if total_anomalies > 0 else 0
230
+ output += f"- **{level}:** {count} ({percentage:.1f}%)\n"
231
+
232
+ if risk_factors:
233
+ output += "\n## ⚠️ Top Risk Factors\n"
234
+ sorted_risks = sorted(risk_factors.items(), key=lambda x: x[1], reverse=True)[:5]
235
+ for factor, count in sorted_risks:
236
+ percentage = 100 * count / total_anomalies if total_anomalies > 0 else 0
237
+ output += f"- **{factor}:** {count} occurrences ({percentage:.1f}%)\n"
238
+
239
+ return output
240
+
241
+ def create_visualization(self, vehicle_results: Dict, original_df: pd.DataFrame) -> gr.Plot:
242
+ """Create interactive visualization"""
243
+ # Prepare data for plotting
244
+ plot_data = []
245
+
246
+ for vehicle_id, results in vehicle_results.items():
247
+ vehicle_df = original_df[original_df['randomized_id'] == vehicle_id].copy()
248
+
249
+ for i, result in enumerate(results):
250
+ if i < len(vehicle_df):
251
+ row = vehicle_df.iloc[i]
252
+ plot_data.append({
253
+ 'vehicle_id': vehicle_id,
254
+ 'lat': row['lat'],
255
+ 'lng': row['lng'],
256
+ 'spd': row['spd'],
257
+ 'alt': row['alt'],
258
+ 'azm': row['azm'],
259
+ 'anomaly': result.anomaly_detected,
260
+ 'confidence': result.confidence,
261
+ 'alert_level': result.alert_level if result.anomaly_detected else 'Normal'
262
+ })
263
+
264
+ plot_df = pd.DataFrame(plot_data)
265
+
266
+ if len(plot_df) == 0:
267
+ return gr.Plot(value=go.Figure().add_annotation(text="No data to plot"))
268
+
269
+ # Create subplots
270
+ fig = make_subplots(
271
+ rows=2, cols=2,
272
+ subplot_titles=('GPS Route with Anomalies', 'Speed Profile',
273
+ 'Altitude Profile', 'Confidence Distribution'),
274
+ specs=[[{"type": "scattermapbox"}, {"type": "scatter"}],
275
+ [{"type": "scatter"}, {"type": "histogram"}]]
276
+ )
277
+
278
+ # GPS Route Map
279
+ normal_points = plot_df[~plot_df['anomaly']]
280
+ anomaly_points = plot_df[plot_df['anomaly']]
281
+
282
+ if len(normal_points) > 0:
283
+ fig.add_trace(
284
+ go.Scattermapbox(
285
+ lat=normal_points['lat'],
286
+ lon=normal_points['lng'],
287
+ mode='markers',
288
+ marker=dict(size=8, color='green'),
289
+ text=normal_points['vehicle_id'],
290
+ name='Normal',
291
+ hovertemplate='<b>%{text}</b><br>Lat: %{lat}<br>Lon: %{lon}<extra></extra>'
292
+ ),
293
+ row=1, col=1
294
+ )
295
+
296
+ if len(anomaly_points) > 0:
297
+ fig.add_trace(
298
+ go.Scattermapbox(
299
+ lat=anomaly_points['lat'],
300
+ lon=anomaly_points['lng'],
301
+ mode='markers',
302
+ marker=dict(size=12, color='red'),  # scattermapbox markers don't support 'diamond'; size/color distinguish anomalies
303
+ text=anomaly_points['alert_level'],
304
+ name='Anomaly',
305
+ hovertemplate='<b>%{text}</b><br>Lat: %{lat}<br>Lon: %{lon}<extra></extra>'
306
+ ),
307
+ row=1, col=1
308
+ )
309
+
310
+ # Speed Profile
311
+ fig.add_trace(
312
+ go.Scatter(
313
+ x=list(range(len(plot_df))),
314
+ y=plot_df['spd'],
315
+ mode='lines+markers',
316
+ marker=dict(color=plot_df['anomaly'].map({True: 'red', False: 'blue'})),
317
+ name='Speed',
318
+ hovertemplate='Point: %{x}<br>Speed: %{y} km/h<extra></extra>'
319
+ ),
320
+ row=1, col=2
321
+ )
322
+
323
+ # Altitude Profile
324
+ fig.add_trace(
325
+ go.Scatter(
326
+ x=list(range(len(plot_df))),
327
+ y=plot_df['alt'],
328
+ mode='lines+markers',
329
+ marker=dict(color=plot_df['anomaly'].map({True: 'red', False: 'green'})),
330
+ name='Altitude',
331
+ hovertemplate='Point: %{x}<br>Altitude: %{y} m<extra></extra>'
332
+ ),
333
+ row=2, col=1
334
+ )
335
+
336
+ # Confidence Distribution
337
+ fig.add_trace(
338
+ go.Histogram(
339
+ x=plot_df['confidence'],
340
+ nbinsx=20,
341
+ name='Confidence',
342
+ marker_color='lightblue'
343
+ ),
344
+ row=2, col=2
345
+ )
346
+
347
+ # Update layout
348
+ fig.update_layout(
349
+ mapbox=dict(
350
+ style="open-street-map",
351
+ center=dict(lat=plot_df['lat'].mean(), lon=plot_df['lng'].mean()),
352
+ zoom=10
353
+ ),
354
+ height=800,
355
+ showlegend=True,
356
+ title_text="🛣️ Vehicle Anomaly Detection Analysis"
357
+ )
358
+
359
+ fig.update_xaxes(title_text="Point Index", row=1, col=2)
360
+ fig.update_yaxes(title_text="Speed (km/h)", row=1, col=2)
361
+ fig.update_xaxes(title_text="Point Index", row=2, col=1)
362
+ fig.update_yaxes(title_text="Altitude (m)", row=2, col=1)
363
+ fig.update_xaxes(title_text="Confidence Score", row=2, col=2)
364
+ fig.update_yaxes(title_text="Count", row=2, col=2)
365
+
366
+ return gr.Plot(value=fig)
367
+
368
+ def generate_json_output(self, vehicle_results: Dict) -> str:
369
+ """Generate JSON output of all results"""
370
+ json_data = {
371
+ "detection_results": {},
372
+ "summary": {
373
+ "total_vehicles": len(vehicle_results),
374
+ "total_points": sum(len(results) for results in vehicle_results.values()),
375
+ "total_anomalies": sum(sum(1 for r in results if r.anomaly_detected)
376
+ for results in vehicle_results.values()),
377
+ "timestamp": datetime.now().isoformat()
378
+ }
379
+ }
380
+
381
+ for vehicle_id, results in vehicle_results.items():
382
+ json_data["detection_results"][vehicle_id] = []
383
+
384
+ for i, result in enumerate(results, 1):
385
+ result_dict = {
386
+ "point_number": i,
387
+ "anomaly_detected": result.anomaly_detected,
388
+ "confidence": round(result.confidence, 4),
389
+ "alert_level": result.alert_level,
390
+ "timestamp": result.timestamp,
391
+ "driving_metrics": result.driving_metrics,
392
+ "risk_factors": result.risk_factors,
393
+ "raw_scores": {k: round(v, 4) for k, v in result.raw_scores.items()}
394
+ }
395
+ json_data["detection_results"][vehicle_id].append(result_dict)
396
+
397
+ return json.dumps(json_data, indent=2)
398
+
399
+ # Initialize the app
400
+ app = AnomalyDetectionGradioApp()
401
+
402
+ def process_csv_file(file):
403
+ """Wrapper function for Gradio interface"""
404
+ if file is None:
405
+ return "Please upload a CSV file", "", "", ""
406
+
407
+ return app.process_data(file.name)
408
+
409
+ # Create the Gradio interface
410
+ def create_interface():
411
+ with gr.Blocks(
412
+ theme=gr.themes.Soft(),
413
+ title="🛣️ Vehicle Anomaly Detection System",
414
+ css="""
415
+ .gradio-container {
416
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
417
+ }
418
+ .main-header {
419
+ text-align: center;
420
+ background: linear-gradient(45deg, #1e3c72, #2a5298);
421
+ color: white;
422
+ padding: 2rem;
423
+ border-radius: 10px;
424
+ margin-bottom: 2rem;
425
+ }
426
+ .upload-area {
427
+ border: 2px dashed #4CAF50;
428
+ border-radius: 10px;
429
+ padding: 2rem;
430
+ text-align: center;
431
+ background-color: #f8f9fa;
432
+ }
433
+ """
434
+ ) as demo:
435
+
436
+ # Header
437
+ gr.HTML("""
438
+ <div class="main-header">
439
+ <h1>🛣️ Vehicle Anomaly Detection System</h1>
440
+ <p>Advanced ML-powered anomaly detection for GPS tracking data</p>
441
+ <p><strong>Upload your CSV with columns:</strong> randomized_id, lat, lng, spd, azm, alt (max 2000 samples)</p>
442
+ </div>
443
+ """)
444
+
445
+ with gr.Row():
446
+ with gr.Column(scale=1):
447
+ # File upload
448
+ gr.HTML('<div class="upload-area">')
449
+ file_upload = gr.File(
450
+ label="📁 Upload GPS Data CSV",
451
+ file_types=[".csv"],
452
+ type="filepath"
453
+ )
454
+ gr.HTML('</div>')
455
+
456
+ # Process button
457
+ process_btn = gr.Button(
458
+ "🚀 Analyze Anomalies",
459
+ variant="primary",
460
+ size="lg"
461
+ )
462
+
463
+ # Sample data info
464
+ gr.HTML("""
465
+ <div style="margin-top: 1rem; padding: 1rem; background-color: #e8f4fd; border-radius: 5px;">
466
+ <h4>📋 Expected CSV Format:</h4>
467
+ <code>
468
+ randomized_id,lat,lng,spd,azm,alt<br>
469
+ VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0<br>
470
+ VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0<br>
471
+ ...
472
+ </code>
473
+ <ul style="margin-top: 1rem;">
474
+ <li><strong>randomized_id:</strong> Vehicle identifier</li>
475
+ <li><strong>lat:</strong> Latitude (-90 to 90)</li>
476
+ <li><strong>lng:</strong> Longitude (-180 to 180)</li>
477
+ <li><strong>spd:</strong> Speed in km/h (0-300)</li>
478
+ <li><strong>azm:</strong> Azimuth/heading (0-360°)</li>
479
+ <li><strong>alt:</strong> Altitude in meters</li>
480
+ </ul>
481
+ </div>
482
+ """)
483
+
484
+ # Results tabs
485
+ with gr.Tabs():
486
+ with gr.Tab("📋 Detailed Results"):
487
+ detailed_output = gr.Markdown(
488
+ value="Upload a CSV file and click 'Analyze Anomalies' to see detailed results...",
489
+ elem_classes=["detailed-results"]
490
+ )
491
+
492
+ with gr.Tab("📊 Summary & Stats"):
493
+ summary_output = gr.Markdown(
494
+ value="Processing summary will appear here...",
495
+ elem_classes=["summary-stats"]
496
+ )
497
+
498
+ with gr.Tab("📈 Visualizations"):
499
+ viz_output = gr.Plot(
500
+ label="Interactive Analysis Charts"
501
+ )
502
+
503
+ with gr.Tab("💾 JSON Export"):
504
+ json_output = gr.Code(
505
+ language="json",
506
+ label="Complete Results JSON",
507
+ value="JSON results will appear here..."
508
+ )
509
+
510
+ # Connect the processing
511
+ process_btn.click(
512
+ fn=process_csv_file,
513
+ inputs=[file_upload],
514
+ outputs=[detailed_output, summary_output, viz_output, json_output],
515
+ show_progress=True
516
+ )
517
+
518
+ # Footer
519
+ gr.HTML("""
520
+ <div style="text-align: center; margin-top: 2rem; padding: 1rem; background-color: #f1f3f4; border-radius: 5px;">
521
+ <p>🔬 <strong>ML Models:</strong> Isolation Forest + One-Class SVM + LSTM Autoencoder</p>
522
+ <p>⚡ <strong>Processing:</strong> ~45-90 seconds for 2000 samples on CPU</p>
523
+ <p>🛡️ <strong>Privacy:</strong> All processing happens locally - your data never leaves this environment</p>
524
+ </div>
525
+ """)
526
+
527
+ return demo
528
+
529
+ if __name__ == "__main__":
530
+ demo = create_interface()
531
+ demo.launch(
532
+ server_name="0.0.0.0",
533
+ server_port=7860,
534
+ share=True,
535
+ show_error=True
536
+ )
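The upload form above documents the expected CSV schema and value ranges (lat −90..90, lng −180..180, spd 0–300, azm 0–360, max 2000 rows). A minimal standalone sketch of those checks — `check_gps_csv` is an illustrative name, not the app's actual `validate_csv` method:

```python
# Hypothetical sketch of the CSV range checks described in the UI;
# mirrors the documented column constraints, not the real validate_csv.
import io
import pandas as pd

REQUIRED_COLUMNS = ['randomized_id', 'lat', 'lng', 'spd', 'azm', 'alt']
RANGES = {'lat': (-90, 90), 'lng': (-180, 180), 'spd': (0, 300), 'azm': (0, 360)}

def check_gps_csv(csv_text: str, max_rows: int = 2000):
    df = pd.read_csv(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return False, f"Missing columns: {missing}"
    if len(df) > max_rows:
        return False, f"Too many rows: {len(df)} > {max_rows}"
    for col, (lo, hi) in RANGES.items():
        bad = df[(df[col] < lo) | (df[col] > hi)]
        if len(bad):
            return False, f"{col} out of range [{lo}, {hi}] in {len(bad)} rows"
    return True, "OK"

sample = (
    "randomized_id,lat,lng,spd,azm,alt\n"
    "VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0\n"
    "VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0\n"
)
print(check_gps_csv(sample))  # → (True, 'OK')
```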
launch.bat ADDED
@@ -0,0 +1,12 @@
1
+ @echo off
+ rem Switch the console to UTF-8 so the emoji banners render correctly
+ chcp 65001 >nul
2
+ echo 🛣️ Vehicle Anomaly Detection System
3
+ echo =====================================
4
+ echo.
5
+ echo Installing dependencies...
6
+ pip install -r requirements.txt
7
+ echo.
8
+ echo Starting Gradio application...
9
+ echo Access the interface at: http://localhost:7860
10
+ echo.
11
+ python gradio_app.py
12
+ pause
models/feature_names.json ADDED
@@ -0,0 +1 @@
1
+ {"feature_names": ["spd", "acceleration", "jerk", "angular_velocity", "lateral_acceleration", "heading_change_rate", "curvature", "overall_risk", "speed_std_3", "speed_std_5", "speed_std_10", "accel_std_3", "accel_std_5", "accel_std_10", "acceleration_risk", "jerk_risk", "lateral_risk", "speed_risk"]}
models/isolation_forest.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b4ee70e6ebf4a9e9d9ecd6b2ce0897303f12513078ca4870030d554ab155fdd
3
+ size 1710078
models/lstm_autoencoder.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:69846219391308a4980bf5475668bb3e24e387e666522134a5de13f49f493398
3
+ size 500332
models/lstm_threshold.json ADDED
@@ -0,0 +1 @@
1
+ {"lstm_threshold": 2.9153685569763184}
models/manifest.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "short_name": "React App",
3
+ "name": "Create React App Sample",
4
+ "icons": [
5
+ {
6
+ "src": "favicon.ico",
7
+ "sizes": "64x64 32x32 24x24 16x16",
8
+ "type": "image/x-icon"
9
+ },
10
+ {
11
+ "src": "logo192.png",
12
+ "type": "image/png",
13
+ "sizes": "192x192"
14
+ },
15
+ {
16
+ "src": "logo512.png",
17
+ "type": "image/png",
18
+ "sizes": "512x512"
19
+ }
20
+ ],
21
+ "start_url": ".",
22
+ "display": "standalone",
23
+ "theme_color": "#000000",
24
+ "background_color": "#ffffff"
25
+ }
models/model_metadata.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "models_saved": [
3
+ "Isolation Forest",
4
+ "One-Class SVM",
5
+ "LSTM Autoencoder",
6
+ "LSTM Threshold",
7
+ "Scaler",
8
+ "Feature Names"
9
+ ],
10
+ "save_timestamp": "2025-09-13T15:15:49.010561",
11
+ "device_used": "cuda",
12
+ "total_samples": 118166
13
+ }
models/one_class_svm.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:89b7add1680af2a043dc510902dfb31c64cb74ece7bd4d08175edec9cb117161
3
+ size 412575
models/optimization_model.joblib ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab47250d98f86d183ed95f5b6aa8d4017597d0d510be8d4fb43abd623d4ae75c
3
+ size 409969
models/robots.txt ADDED
@@ -0,0 +1,3 @@
1
+ # https://www.robotstxt.org/robotstxt.html
2
+ User-agent: *
3
+ Disallow:
models/scaler.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a84f6f969094f40ff8b1c5b6cfb81dc63b3689d8f0902aed115b57f96e33f47
3
+ size 1319
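`production_predictor.py` below combines the three detectors with the weights from `_default_config` (0.35 / 0.30 / 0.35) after min–max normalizing the Isolation Forest and One-Class SVM scores with the stored training ranges. A hedged sketch of that combination — `combine` is illustrative, since `_calculate_ensemble_score` itself is not shown in this excerpt, and the `1 - …` inversion assumes the scikit-learn convention that lower `decision_function` scores mean more anomalous:

```python
# Sketch of the ensemble weighting implied by _default_config and the
# normalization parameters in production_predictor.py. combine() is an
# assumption about _calculate_ensemble_score, whose body is not in this diff.
IF_MIN, IF_MAX = -0.2400, 0.1680        # Isolation Forest score range (training)
SVM_MIN, SVM_MAX = -381.6356, 106.7346  # One-Class SVM score range (training)
WEIGHTS = {'isolation_forest': 0.35, 'one_class_svm': 0.30, 'lstm': 0.35}

def minmax(x: float, lo: float, hi: float) -> float:
    # Scale a raw score into [0, 1], clipping values outside the training range
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def combine(if_score: float, svm_score: float, lstm_score: float) -> float:
    # Lower decision_function scores are more anomalous, hence the inversion;
    # lstm_score is assumed already normalized to [0, 1] (higher = more anomalous).
    anomaly_if = 1.0 - minmax(if_score, IF_MIN, IF_MAX)
    anomaly_svm = 1.0 - minmax(svm_score, SVM_MIN, SVM_MAX)
    return (WEIGHTS['isolation_forest'] * anomaly_if
            + WEIGHTS['one_class_svm'] * anomaly_svm
            + WEIGHTS['lstm'] * lstm_score)
```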
production_predictor.py ADDED
@@ -0,0 +1,673 @@
1
+ import os
2
+ import json
3
+ import time
4
+ import logging
5
+ import numpy as np
6
+ import pandas as pd
7
+ import torch
8
+ import joblib
9
+ from datetime import datetime
10
+ from collections import deque
11
+ from typing import Dict, List, Optional, Any
12
+ import asyncio
13
+ import aiofiles
14
+ from dataclasses import dataclass, asdict
15
+ from pathlib import Path
16
+ from scipy.signal import savgol_filter
17
+
18
+ # Set up logging
19
+ logging.basicConfig(level=logging.INFO)
20
+ logger = logging.getLogger(__name__)
21
+
22
+ # Set device
23
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
24
+
25
+ @dataclass
26
+ class GPSPoint:
27
+ """GPS data point from tracker - matches your dataset structure"""
28
+ vehicle_id: str # This will be our randomized_id
29
+ lat: float
30
+ lng: float
31
+ alt: float
32
+ spd: float # speed in km/h
33
+ azm: float # azimuth/heading 0-360
34
+ timestamp: str = None # Added for real-time tracking
35
+
36
+ @classmethod
37
+ def from_tracker_data(cls, tracker_data: Dict) -> 'GPSPoint':
38
+ """Convert from real GPS tracker format to our dataset format"""
39
+ return cls(
40
+ vehicle_id=tracker_data.get('vehicle_id', tracker_data.get('device_id')),
41
+ lat=tracker_data['lat'],
42
+ lng=tracker_data['lng'],
43
+ alt=tracker_data.get('alt', tracker_data.get('altitude', 0.0)),
44
+ spd=tracker_data.get('spd', tracker_data.get('speed', 0.0)),
45
+ azm=tracker_data.get('azm', tracker_data.get('heading', 0.0)),
46
+ timestamp=tracker_data.get('timestamp', datetime.now().isoformat())
47
+ )
48
+
49
+ def to_dataset_format(self) -> Dict:
50
+ """Convert to the format expected by your trained model"""
51
+ return {
52
+ 'randomized_id': self.vehicle_id,
53
+ 'lat': self.lat,
54
+ 'lng': self.lng,
55
+ 'alt': self.alt,
56
+ 'spd': self.spd,
57
+ 'azm': self.azm
58
+ }
59
+
60
+ @dataclass
61
+ class AnomalyResult:
62
+ """Anomaly detection result"""
63
+ timestamp: str
64
+ vehicle_id: str
65
+ anomaly_detected: bool
66
+ confidence: float
67
+ alert_level: str
68
+ raw_scores: Dict[str, float]
69
+ driving_metrics: Dict[str, float]
70
+ risk_factors: Dict[str, bool]
71
+
72
+ def to_dict(self) -> Dict:
73
+ return asdict(self)
74
+
75
+ # Import the LSTM model from your training code
76
+ class LSTMAutoencoder(torch.nn.Module):
77
+ """LSTM Autoencoder - same as your training code"""
78
+
79
+ def __init__(self, input_dim, hidden_dim=64, latent_dim=10, num_layers=2, sequence_length=20):
80
+ super(LSTMAutoencoder, self).__init__()
81
+ self.input_dim = input_dim
82
+ self.hidden_dim = hidden_dim
83
+ self.latent_dim = latent_dim
84
+ self.num_layers = num_layers
85
+ self.sequence_length = sequence_length
86
+
87
+ # Encoder
88
+ self.encoder_lstm = torch.nn.LSTM(
89
+ input_dim, hidden_dim, num_layers,
90
+ batch_first=True, dropout=0.2 if num_layers > 1 else 0
91
+ )
92
+ self.encoder_fc = torch.nn.Linear(hidden_dim, latent_dim)
93
+
94
+ # Decoder
95
+ self.decoder_fc = torch.nn.Linear(latent_dim, hidden_dim)
96
+ self.decoder_lstm = torch.nn.LSTM(
97
+ hidden_dim, hidden_dim, num_layers,
98
+ batch_first=True, dropout=0.2 if num_layers > 1 else 0
99
+ )
100
+ self.output_projection = torch.nn.Linear(hidden_dim, input_dim)
101
+
102
+ self.dropout = torch.nn.Dropout(0.2)
103
+
104
+ def encode(self, x):
105
+ lstm_out, (hidden, cell) = self.encoder_lstm(x)
106
+ encoded = self.encoder_fc(hidden[-1])
107
+ return encoded
108
+
109
+ def decode(self, encoded):
110
+ batch_size = encoded.size(0)
111
+ decoded = self.decoder_fc(encoded)
112
+ decoded = decoded.unsqueeze(1).repeat(1, self.sequence_length, 1)
113
+ lstm_out, _ = self.decoder_lstm(decoded)
114
+ output = self.output_projection(lstm_out)
115
+ return output
116
+
117
+ def forward(self, x):
118
+ encoded = self.encode(x)
119
+ decoded = self.decode(encoded)
120
+ return decoded
121
+
122
+ class ProductionAnomalyDetector:
123
+ """
124
+ Production-ready driving anomaly detection system
125
+ Works with your exact dataset format: randomized_id,lat,lng,alt,spd,azm
126
+ """
127
+
128
+ def __init__(self, model_dir: str, config: Dict = None):
129
+ """
130
+ Initialize with pre-trained models
131
+ """
132
+ self.model_dir = Path(model_dir)
133
+ self.config = config or self._default_config()
134
+
135
+ # Model components
136
+ self.scaler = None
137
+ self.isolation_forest = None
138
+ self.one_class_svm = None
139
+ self.lstm_autoencoder = None
140
+ self.lstm_threshold = None
141
+
142
+ # Vehicle buffers for real-time processing
143
+ self.vehicle_buffers = {} # vehicle_id -> deque of GPS points
144
+ self.buffer_size = self.config['buffer_size']
145
+
146
+ # Normalization parameters
147
+ self.if_min = None
148
+ self.if_max = None
149
+ self.svm_min = None
150
+ self.svm_max = None
151
+
152
+ # Load models
153
+ self._load_models()
154
+
155
+ logger.info(f"ProductionAnomalyDetector initialized with models from {model_dir}")
156
+ logger.info(f"Using device: {device}")
157
+
158
+ def _default_config(self) -> Dict:
159
+ """Default configuration matching your training setup"""
160
+ return {
161
+ 'buffer_size': 20,
162
+ 'min_points_for_detection': 5,
163
+ 'lstm_sequence_length': 15,
164
+ 'alert_threshold': 0.3,
165
+ 'weights': {
166
+ 'isolation_forest': 0.35,
167
+ 'one_class_svm': 0.30,
168
+ 'lstm': 0.35
169
+ }
170
+ }
171
+
172
+ def _load_models(self):
173
+ """Load all pre-trained models"""
174
+ try:
175
+ # Load scaler (required)
176
+ scaler_path = self.model_dir / 'scaler.pkl'
177
+ if scaler_path.exists():
178
+ self.scaler = joblib.load(scaler_path)
179
+ logger.info("✓ Feature scaler loaded")
180
+ else:
181
+ raise FileNotFoundError(f"Feature scaler not found: {scaler_path}")
182
+
183
+ # Load Isolation Forest
184
+ if_path = self.model_dir / 'isolation_forest.pkl'
185
+ if if_path.exists():
186
+ self.isolation_forest = joblib.load(if_path)
187
+ logger.info("✓ Isolation Forest loaded")
188
+
189
+ # Load One-Class SVM
190
+ svm_path = self.model_dir / 'one_class_svm.pkl'
191
+ if svm_path.exists():
192
+ self.one_class_svm = joblib.load(svm_path)
193
+ logger.info("✓ One-Class SVM loaded")
194
+
195
+ # Load LSTM Autoencoder
196
+ lstm_path = self.model_dir / 'lstm_autoencoder.pth'
197
+ if lstm_path.exists():
198
+ checkpoint = torch.load(lstm_path, map_location=device)
199
+ lstm_config = checkpoint["model_config"]
200
+ self.lstm_autoencoder = LSTMAutoencoder(**lstm_config).to(device)
201
+
202
+ self.lstm_autoencoder.load_state_dict(checkpoint["model_state_dict"])
203
+ self.lstm_autoencoder.eval()
204
+ logger.info("✓ LSTM Autoencoder loaded")
205
+ # Prefer the stored threshold in lstm_threshold.json; fall back to the training value
+ threshold_path = self.model_dir / 'lstm_threshold.json'
+ if threshold_path.exists():
+ with open(threshold_path, 'r') as f:
+ self.lstm_threshold = json.load(f)['lstm_threshold']
+ else:
+ self.lstm_threshold = 2.9153685569763184 # fallback threshold
206
+ logger.info(f"✓ LSTM threshold: {self.lstm_threshold}")
207
+
208
+ # Load normalization parameters
209
+ norm_path = self.model_dir / 'normalization_params.json'
210
+ if norm_path.exists():
211
+ with open(norm_path, 'r') as f:
212
+ norm_params = json.load(f)
213
+ self.if_min = norm_params.get('if_min', -0.2400)
214
+ self.if_max = norm_params.get('if_max', 0.1680)
215
+ self.svm_min = norm_params.get('svm_min', -381.6356)
216
+ self.svm_max = norm_params.get('svm_max', 106.7346)
217
+ logger.info("✓ Normalization parameters loaded")
218
+ else:
219
+ # Use your actual training values
220
+ self.if_min, self.if_max = -0.2400, 0.1680
221
+ self.svm_min, self.svm_max = -381.6356, 106.7346
222
+ logger.info("Using training normalization parameters")
223
+
224
+ logger.info("All models loaded successfully!")
225
+
226
+ except Exception as e:
227
+ logger.error(f"Error loading models: {e}")
228
+ raise
229
+
230
+ def process_gps_point(self, gps_point: GPSPoint) -> Optional[AnomalyResult]:
231
+ """
232
+ Process a single GPS point - main entry point for real-time detection
233
+ """
234
+ vehicle_id = gps_point.vehicle_id
235
+
236
+ # Initialize vehicle buffer if needed
237
+ if vehicle_id not in self.vehicle_buffers:
238
+ self.vehicle_buffers[vehicle_id] = deque(maxlen=self.buffer_size)
239
+
240
+ # Add point to buffer
241
+ self.vehicle_buffers[vehicle_id].append(gps_point)
242
+ buffer = self.vehicle_buffers[vehicle_id]
243
+
244
+ # Need minimum points for detection
245
+ if len(buffer) < self.config['min_points_for_detection']:
246
+ return None
247
+
248
+ try:
249
+ # Convert buffer to DataFrame in your exact format
250
+ buffer_data = []
251
+ for point in buffer:
252
+ buffer_data.append(point.to_dataset_format())
253
+
254
+ df_buffer = pd.DataFrame(buffer_data)
255
+
256
+ # Calculate features using your exact feature engineering pipeline
257
+ features_df = self._calculate_features_exact_pipeline(df_buffer)
258
+
259
+ if len(features_df) == 0:
260
+ return None
261
+
262
+ # Get latest point features
263
+ latest_features = features_df.iloc[-1:].values
264
+ latest_scaled = self.scaler.transform(latest_features)
265
+
266
+ # Get anomaly scores
267
+ scores = self._get_anomaly_scores(features_df, latest_scaled)
268
+
269
+ # Calculate ensemble score
270
+ ensemble_score = self._calculate_ensemble_score(scores)
271
+
272
+ # Determine alert level
273
+ alert_level = self._get_alert_level(ensemble_score)
274
+
275
+ # Extract metrics from the processed features
276
+ latest_processed = features_df.iloc[-1]
277
+ driving_metrics = self._extract_driving_metrics_from_features(latest_processed)
278
+ risk_factors = self._extract_risk_factors_from_features(latest_processed)
279
+
280
+ return AnomalyResult(
281
+ timestamp=gps_point.timestamp or datetime.now().isoformat(),
282
+ vehicle_id=vehicle_id,
283
+ anomaly_detected=ensemble_score > self.config['alert_threshold'],
284
+ confidence=float(ensemble_score),
285
+ alert_level=alert_level,
286
+ raw_scores=scores,
287
+ driving_metrics=driving_metrics,
288
+ risk_factors=risk_factors
289
+ )
290
+
291
+ except Exception as e:
292
+ logger.error(f"Error processing GPS point for vehicle {vehicle_id}: {e}")
293
+ return None
294
+
295
+ def _calculate_features_exact_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:
296
+ """
297
+ Calculate features using EXACT same pipeline as your training code
298
+ Input: DataFrame with columns [randomized_id, lat, lng, alt, spd, azm]
299
+ Output: DataFrame with 18 features ready for ML models
300
+ """
301
+ # Apply the EXACT same feature engineering as your training
302
+ df_processed = self._apply_physics_calculations(df.copy())
303
+ df_processed = self._apply_anomaly_feature_engineering(df_processed)
304
+ features_df = self._prepare_ml_features_exact(df_processed)
305
+
306
+ return features_df
307
+
308
+ def _apply_physics_calculations(self, df: pd.DataFrame) -> pd.DataFrame:
309
+ """Apply exact physics calculations from your training code"""
310
+
311
+ # Sort within each trip and create a sequence index; the dataset has no
+ # timestamp column, so points are ordered by position, as in training
312
+ df = df.sort_values(['randomized_id', 'lat', 'lng'])
313
+ df['sequence'] = df.groupby('randomized_id').cumcount()
314
+ df['time_delta'] = 1.0 # 1 second intervals
315
+
316
+ def calculate_trip_features(group):
317
+ if len(group) < 3:
318
+ # Fill with safe defaults for short trips
319
+ group['distance'] = 0.0
320
+ group['speed_smooth'] = group['spd']
321
+ group['acceleration'] = 0.0
322
+ group['jerk'] = 0.0
323
+ group['angular_velocity'] = 0.0
324
+ group['lateral_acceleration'] = 0.0
325
+ group['heading_change_rate'] = 0.0
326
+ group['curvature'] = 0.0
327
+ return group
328
+
329
+ # Haversine distance calculation
330
+ def haversine_distance(lat1, lon1, lat2, lon2):
331
+ R = 6371000 # Earth radius in meters
332
+ lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
333
+ dlat = lat2 - lat1
334
+ dlon = lon2 - lon1
335
+ a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
336
+ c = 2 * np.arcsin(np.sqrt(np.clip(a, 0, 1)))
337
+ return R * c
338
+
339
+ # Calculate distances
340
+ distances = [0]
341
+ for i in range(1, len(group)):
342
+ try:
343
+ dist = haversine_distance(
344
+ group.iloc[i-1]['lat'], group.iloc[i-1]['lng'],
345
+ group.iloc[i]['lat'], group.iloc[i]['lng']
346
+ )
347
+ dist = min(dist, 1000) # Cap at 1km to avoid GPS errors
348
+ distances.append(dist)
349
+ except Exception:
350
+ distances.append(0)
351
+
352
+ group['distance'] = distances
353
+
354
+ # Smooth speed data
355
+ if len(group) >= 5:
356
+ try:
357
+ group['speed_smooth'] = savgol_filter(group['spd'], 5, 2)
358
+ except Exception:
359
+ group['speed_smooth'] = group['spd']
360
+ else:
361
+ group['speed_smooth'] = group['spd']
362
+
363
+ group['speed_smooth'] = np.maximum(group['speed_smooth'], 0)
364
+
365
+ # Calculate acceleration
366
+ speed_ms = group['speed_smooth'] / 3.6 # km/h to m/s
367
+ try:
368
+ acceleration = np.gradient(speed_ms, group['time_delta'])
369
+ acceleration = np.clip(acceleration, -15, 15)
370
+ except Exception:
371
+ acceleration = np.zeros(len(group))
372
+ group['acceleration'] = acceleration
373
+
374
+ # Calculate jerk
375
+ try:
376
+ jerk = np.gradient(acceleration, group['time_delta'])
377
+ jerk = np.clip(jerk, -20, 20)
378
+ except Exception:
379
+ jerk = np.zeros(len(group))
380
+ group['jerk'] = jerk
381
+
382
+ # Calculate angular velocity
383
+ try:
384
+ azimuth_rad = np.radians(group['azm'])
385
+ azimuth_unwrapped = np.unwrap(azimuth_rad)
386
+ angular_velocity = np.gradient(azimuth_unwrapped, group['time_delta'])
387
+ angular_velocity = np.clip(angular_velocity, -np.pi, np.pi)
388
+ except Exception:
389
+ angular_velocity = np.zeros(len(group))
390
+ group['angular_velocity'] = angular_velocity
391
+
392
+ # Calculate lateral acceleration
393
+ lateral_acceleration = speed_ms * angular_velocity
394
+ lateral_acceleration = np.clip(lateral_acceleration, -20, 20)
395
+ group['lateral_acceleration'] = lateral_acceleration
396
+
397
+ # Calculate heading change rate
398
+ group['heading_change_rate'] = np.abs(angular_velocity)
399
+
400
+ # Calculate curvature with safe division
401
+ denominator = speed_ms + 0.1
402
+ group['curvature'] = np.divide(
403
+ np.abs(angular_velocity),
404
+ denominator,
405
+ out=np.zeros_like(angular_velocity),
406
+ where=denominator!=0
407
+ )
408
+
409
+ return group
410
+
411
+ df = df.groupby('randomized_id').apply(calculate_trip_features)
412
+ df = df.reset_index(drop=True)
413
+
414
+ # Clean any remaining NaN/inf values
415
+ numeric_columns = ['distance', 'speed_smooth', 'acceleration', 'jerk',
416
+ 'angular_velocity', 'lateral_acceleration', 'heading_change_rate', 'curvature']
417
+
418
+ for col in numeric_columns:
419
+ if col in df.columns:
420
+ df[col] = df[col].fillna(0)
421
+ df[col] = df[col].replace([np.inf, -np.inf], 0)
422
+
423
+ return df
424
+
425
+     def _apply_anomaly_feature_engineering(self, df: pd.DataFrame) -> pd.DataFrame:
+         """Apply the exact anomaly feature engineering used in training"""
+
+         # Rolling window statistics
+         window_sizes = [3, 5, 10]
+
+         for window in window_sizes:
+             try:
+                 # Speed patterns
+                 df[f'speed_std_{window}'] = df.groupby('randomized_id')['spd'].rolling(
+                     window, center=True, min_periods=1).std().reset_index(0, drop=True).fillna(0)
+                 df[f'speed_max_{window}'] = df.groupby('randomized_id')['spd'].rolling(
+                     window, center=True, min_periods=1).max().reset_index(0, drop=True).fillna(0)
+                 df[f'speed_min_{window}'] = df.groupby('randomized_id')['spd'].rolling(
+                     window, center=True, min_periods=1).min().reset_index(0, drop=True).fillna(0)
+
+                 # Acceleration patterns
+                 df[f'accel_std_{window}'] = df.groupby('randomized_id')['acceleration'].rolling(
+                     window, center=True, min_periods=1).std().reset_index(0, drop=True).fillna(0)
+                 df[f'accel_max_{window}'] = df.groupby('randomized_id')['acceleration'].rolling(
+                     window, center=True, min_periods=1).max().reset_index(0, drop=True).fillna(0)
+                 df[f'accel_min_{window}'] = df.groupby('randomized_id')['acceleration'].rolling(
+                     window, center=True, min_periods=1).min().reset_index(0, drop=True).fillna(0)
+             except Exception:
+                 # Fallback values
+                 df[f'speed_std_{window}'] = 0
+                 df[f'speed_max_{window}'] = df['spd']
+                 df[f'speed_min_{window}'] = df['spd']
+                 df[f'accel_std_{window}'] = 0
+                 df[f'accel_max_{window}'] = df['acceleration']
+                 df[f'accel_min_{window}'] = df['acceleration']
+
+         # Extreme behavior indicators (exact thresholds from training)
+         df['hard_braking'] = (df['acceleration'] < -4.0).astype(int)
+         df['hard_acceleration'] = (df['acceleration'] > 3.0).astype(int)
+         df['excessive_speed'] = (df['spd'] > 80).astype(int)
+         df['sharp_turn'] = (np.abs(df['lateral_acceleration']) > 4.0).astype(int)
+         df['erratic_steering'] = (np.abs(df['heading_change_rate']) > 0.5).astype(int)
+
+         # Composite risk scores (same calculations as training)
+         df['acceleration_risk'] = np.clip(np.abs(df['acceleration']) / 10.0, 0, 1)
+         df['jerk_risk'] = np.clip(np.abs(df['jerk']) / 5.0, 0, 1)
+         df['lateral_risk'] = np.clip(np.abs(df['lateral_acceleration']) / 8.0, 0, 1)
+         df['speed_risk'] = np.clip(np.maximum(0, (df['spd'] - 60) / 40.0), 0, 1)
+
+         # Overall risk score (same weights as training)
+         df['overall_risk'] = (
+             df['acceleration_risk'] * 0.25 +
+             df['jerk_risk'] * 0.20 +
+             df['lateral_risk'] * 0.25 +
+             df['speed_risk'] * 0.15 +
+             (df['hard_braking'] + df['hard_acceleration'] +
+              df['sharp_turn'] + df['erratic_steering']) * 0.15 / 4
+         )
+
+         df['overall_risk'] = np.clip(df['overall_risk'], 0, 1)
+
+         return df
+
+     def _prepare_ml_features_exact(self, df: pd.DataFrame) -> pd.DataFrame:
+         """Prepare the exact same 18 features as in training"""
+
+         # Same feature columns, in the same order, as training
+         feature_columns = [
+             'spd', 'acceleration', 'jerk', 'angular_velocity', 'lateral_acceleration',
+             'heading_change_rate', 'curvature', 'overall_risk',
+             'speed_std_3', 'speed_std_5', 'speed_std_10',
+             'accel_std_3', 'accel_std_5', 'accel_std_10',
+             'acceleration_risk', 'jerk_risk', 'lateral_risk', 'speed_risk'
+         ]
+
+         features_df = df[feature_columns].copy()
+
+         # Clean any remaining issues
+         for col in feature_columns:
+             features_df[col] = features_df[col].fillna(0)
+             features_df[col] = features_df[col].replace([np.inf, -np.inf], 0)
+
+         return features_df
+
+     def _get_anomaly_scores(self, features_df: pd.DataFrame, latest_scaled: np.ndarray) -> Dict[str, float]:
+         """Get anomaly scores from all models"""
+         scores = {}
+
+         # Isolation Forest
+         if self.isolation_forest:
+             scores['isolation_forest'] = float(self.isolation_forest.decision_function(latest_scaled)[0])
+
+         # One-Class SVM
+         if self.one_class_svm:
+             scores['one_class_svm'] = float(self.one_class_svm.decision_function(latest_scaled)[0])
+
+         # LSTM Autoencoder
+         if self.lstm_autoencoder and len(features_df) >= self.config['lstm_sequence_length']:
+             try:
+                 sequence_length = self.config['lstm_sequence_length']
+                 sequence_features = features_df.iloc[-sequence_length:].values
+                 sequence_scaled = self.scaler.transform(sequence_features)
+                 sequence_tensor = torch.FloatTensor(sequence_scaled).unsqueeze(0).to(device)
+
+                 with torch.no_grad():
+                     reconstructed = self.lstm_autoencoder(sequence_tensor)
+                     reconstruction_error = torch.mean((sequence_tensor - reconstructed) ** 2).item()
+                 scores['lstm'] = float(reconstruction_error)
+             except Exception as e:
+                 logger.warning(f"LSTM inference error: {e}")
+                 scores['lstm'] = 0.0
+
+         return scores
+
+     def _calculate_ensemble_score(self, scores: Dict[str, float]) -> float:
+         """Calculate ensemble score using the same logic as training"""
+         ensemble_score = 0.0
+         weights = self.config['weights']
+
+         # Isolation Forest (lower = more anomalous)
+         if 'isolation_forest' in scores:
+             if_range = self.if_max - self.if_min
+             if if_range > 0:
+                 if_normalized = (scores['isolation_forest'] - self.if_min) / if_range
+                 if_anomaly_score = 1.0 - np.clip(if_normalized, 0, 1)
+             else:
+                 if_anomaly_score = 0.5
+             ensemble_score += weights['isolation_forest'] * if_anomaly_score
+
+         # SVM (negative = more anomalous)
+         if 'one_class_svm' in scores:
+             svm_range = self.svm_max - self.svm_min
+             if svm_range > 0:
+                 svm_normalized = (scores['one_class_svm'] - self.svm_min) / svm_range
+                 svm_anomaly_score = 1.0 - np.clip(svm_normalized, 0, 1)
+             else:
+                 svm_anomaly_score = 0.5
+             ensemble_score += weights['one_class_svm'] * svm_anomaly_score
+
+         # LSTM (higher reconstruction error = more anomalous)
+         if 'lstm' in scores and self.lstm_threshold:
+             lstm_anomaly_score = np.clip(scores['lstm'] / self.lstm_threshold, 0, 1)
+             ensemble_score += weights['lstm'] * lstm_anomaly_score
+
+         return np.clip(ensemble_score, 0, 1)
+
+     def _get_alert_level(self, confidence: float) -> str:
+         """Determine alert level"""
+         if confidence > 0.8:
+             return 'CRITICAL'
+         elif confidence > 0.6:
+             return 'HIGH'
+         elif confidence > 0.4:
+             return 'MEDIUM'
+         elif confidence > 0.2:
+             return 'LOW'
+         else:
+             return 'NORMAL'
+
+     def _extract_driving_metrics_from_features(self, features_row: pd.Series) -> Dict[str, float]:
+         """Extract driving metrics from processed features"""
+         return {
+             'speed': float(features_row['spd']),
+             'acceleration': float(features_row['acceleration']),
+             'lateral_acceleration': float(features_row['lateral_acceleration']),
+             'jerk': float(features_row['jerk']),
+             'heading_change_rate': float(features_row['heading_change_rate']),
+             'overall_risk': float(features_row['overall_risk'])
+         }
+
+     def _extract_risk_factors_from_features(self, features_row):
+         """Extract boolean risk factors from a row of driving features."""
+         return {
+             'hard_braking': bool(features_row['acceleration'] < -2.5),            # sudden deceleration (m/s^2)
+             'hard_acceleration': bool(features_row['acceleration'] > 2.5),        # sudden acceleration (m/s^2)
+             'excessive_speed': bool(features_row['spd'] > 120),                   # overspeeding (km/h)
+             'sharp_turn': bool(abs(features_row['lateral_acceleration']) > 3.0),  # strong lateral g-force
+             'erratic_steering': bool(abs(features_row['angular_velocity']) > 0.5)  # rapid heading change (rad/s)
+         }
+
+     def get_vehicle_status(self, vehicle_id: str) -> Dict[str, Any]:
+         """Get current status of a vehicle"""
+         if vehicle_id not in self.vehicle_buffers:
+             return {'vehicle_id': vehicle_id, 'status': 'no_data'}
+
+         buffer = self.vehicle_buffers[vehicle_id]
+         return {
+             'vehicle_id': vehicle_id,
+             'buffer_size': len(buffer),
+             'last_update': buffer[-1].timestamp if buffer else None,
+             'ready_for_detection': len(buffer) >= self.config['min_points_for_detection']
+         }
+
+ # API input model matching the dataset's column structure
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from typing import Optional
+
+ class GPSPointRequest(BaseModel):
+     """API request model matching the dataset columns"""
+     vehicle_id: str                    # maps to randomized_id
+     lat: float
+     lng: float
+     alt: float = 0.0
+     spd: float                         # speed in km/h
+     azm: float                         # azimuth/heading, 0-360 degrees
+     timestamp: Optional[str] = None
+
+ # Sample input/output for this exact data structure
+ sample_input_output = {
+     "input": {
+         "vehicle_id": "fleet_001",
+         "lat": 55.7558,
+         "lng": 37.6176,
+         "alt": 156.0,
+         "spd": 45.5,
+         "azm": 85.0,
+         "timestamp": "2025-09-13T10:31:18Z"
+     },
+     "output": {
+         "status": "detected",
+         "result": {
+             "timestamp": "2025-09-13T10:31:18Z",
+             "vehicle_id": "fleet_001",
+             "anomaly_detected": False,
+             "confidence": 0.156,
+             "alert_level": "NORMAL",
+             "raw_scores": {
+                 "isolation_forest": 0.045,
+                 "one_class_svm": 12.34,
+                 "lstm": 0.234
+             },
+             "driving_metrics": {
+                 "speed": 45.5,
+                 "acceleration": 0.12,
+                 "lateral_acceleration": 0.08,
+                 "jerk": 0.05,
+                 "heading_change_rate": 0.02,
+                 "overall_risk": 0.089
+             },
+             "risk_factors": {
+                 "hard_braking": False,
+                 "hard_acceleration": False,
+                 "excessive_speed": False,
+                 "sharp_turn": False,
+                 "erratic_steering": False
+             }
+         }
+     }
+ }
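The score fusion in `_calculate_ensemble_score` is min-max normalization followed by a weighted sum. A standalone sketch of that logic is below; the weights, calibration ranges, and LSTM threshold are illustrative placeholders, not the values shipped with the saved models:

```python
import numpy as np

def ensemble_score(scores, bounds, lstm_threshold, weights):
    """Blend raw detector outputs into one [0, 1] anomaly score.

    For Isolation Forest and One-Class SVM a LOWER decision-function value
    means more anomalous, so the min-max normalized value is inverted; the
    LSTM reconstruction error is simply scaled by its threshold.
    """
    total = 0.0
    for name in ('isolation_forest', 'one_class_svm'):
        if name in scores:
            lo, hi = bounds[name]
            norm = (scores[name] - lo) / (hi - lo) if hi > lo else 0.5
            total += weights[name] * (1.0 - np.clip(norm, 0, 1))
    if 'lstm' in scores and lstm_threshold:
        total += weights['lstm'] * np.clip(scores['lstm'] / lstm_threshold, 0, 1)
    return float(np.clip(total, 0, 1))

# Placeholder calibration values, for illustration only
weights = {'isolation_forest': 0.5, 'one_class_svm': 0.25, 'lstm': 0.25}
bounds = {'isolation_forest': (-0.2, 0.2), 'one_class_svm': (-5.0, 15.0)}

# Every detector at its most anomalous extreme -> maximum score
score = ensemble_score(
    {'isolation_forest': -0.2, 'one_class_svm': -5.0, 'lstm': 0.6},
    bounds, lstm_threshold=0.3, weights=weights,
)
print(score)  # -> 1.0
```

Because each per-model contribution is clipped to [0, 1] before weighting, a single wild raw score cannot push the ensemble past its weight's share.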
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ gradio>=4.0.0
+ pandas>=1.5.0
+ numpy>=1.21.0
+ torch>=1.12.0
+ scikit-learn>=1.1.0
+ plotly>=5.0.0
+ scipy>=1.9.0
+ joblib>=1.2.0
+ aiofiles>=22.0.0
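A quick way to confirm an environment satisfies these pins is to query installed package metadata. A small stdlib-only sketch (it parses the package name naively from `>=` pins, which is all this file uses):

```python
from importlib import metadata

def installed_versions(pins):
    """Map each '>='-pinned requirement to its installed version (None if absent)."""
    report = {}
    for line in pins:
        name = line.split('>=')[0].strip()
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report

pins = ['numpy>=1.21.0', 'scipy>=1.9.0']
print(installed_versions(pins))
```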
sample_data.csv ADDED
@@ -0,0 +1,31 @@
+ randomized_id,lat,lng,spd,azm,alt
+ VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0
+ VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0
+ VEHICLE001,40.7148,-74.0080,52.1,95.0,105.0
+ VEHICLE001,40.7158,-74.0090,85.3,98.0,108.0
+ VEHICLE001,40.7168,-74.0100,127.5,101.0,110.0
+ VEHICLE001,40.7178,-74.0110,156.2,105.0,112.0
+ VEHICLE001,40.7188,-74.0120,42.8,108.0,115.0
+ VEHICLE001,40.7198,-74.0130,38.5,110.0,118.0
+ VEHICLE002,40.7500,-73.9800,35.2,180.0,90.0
+ VEHICLE002,40.7510,-73.9810,38.1,182.0,92.0
+ VEHICLE002,40.7520,-73.9820,41.5,185.0,95.0
+ VEHICLE002,40.7530,-73.9830,165.8,188.0,98.0
+ VEHICLE002,40.7540,-73.9840,198.2,191.0,100.0
+ VEHICLE002,40.7550,-73.9850,43.7,195.0,102.0
+ VEHICLE002,40.7560,-73.9860,39.9,198.0,105.0
+ VEHICLE003,40.8000,-73.9500,55.0,270.0,200.0
+ VEHICLE003,40.8010,-73.9510,58.3,272.0,202.0
+ VEHICLE003,40.8020,-73.9520,62.1,275.0,205.0
+ VEHICLE003,40.8030,-73.9530,220.5,278.0,208.0
+ VEHICLE003,40.8040,-73.9540,245.8,281.0,210.0
+ VEHICLE003,40.8050,-73.9550,51.2,285.0,212.0
+ VEHICLE003,40.8060,-73.9560,48.7,288.0,215.0
+ VEHICLE004,40.6500,-74.1000,25.0,45.0,50.0
+ VEHICLE004,40.6510,-74.1010,28.5,47.0,52.0
+ VEHICLE004,40.6520,-74.1020,31.2,49.0,55.0
+ VEHICLE004,40.6530,-74.1030,34.8,52.0,58.0
+ VEHICLE004,40.6540,-74.1040,37.5,55.0,60.0
+ VEHICLE004,40.6550,-74.1050,40.1,58.0,62.0
+ VEHICLE004,40.6560,-74.1060,42.8,60.0,65.0
+ VEHICLE004,40.6570,-74.1070,45.5,62.0,68.0
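The sample file deliberately embeds speed spikes (127-245 km/h bursts inside otherwise 25-65 km/h traces) so each detector has something to flag. A minimal stdlib sketch of the simplest of those checks, the excessive-speed indicator, over rows in this format:

```python
import csv
import io

# Three rows in the sample_data.csv format (randomized_id,lat,lng,spd,azm,alt)
csv_text = """randomized_id,lat,lng,spd,azm,alt
VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0
VEHICLE002,40.7530,-73.9830,165.8,188.0,98.0
VEHICLE004,40.6500,-74.1000,25.0,45.0,50.0
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Same 120 km/h cutoff as the risk-factor extraction code
flagged = [row['randomized_id'] for row in rows if float(row['spd']) > 120]
print(flagged)  # -> ['VEHICLE002']
```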