Commit 7058515
Parent(s): init
- .gitattributes +3 -0
- README.md +183 -0
- __pycache__/batch_production_pred.cpython-311.pyc +0 -0
- __pycache__/gradio_app.cpython-311.pyc +0 -0
- __pycache__/production_predictor.cpython-311.pyc +0 -0
- app.py +10 -0
- batch_production_pred.py +484 -0
- gradio_app.py +536 -0
- launch.bat +12 -0
- models/feature_names.json +1 -0
- models/isolation_forest.pkl +3 -0
- models/lstm_autoencoder.pth +3 -0
- models/lstm_threshold.json +1 -0
- models/manifest.json +25 -0
- models/model_metadata.json +13 -0
- models/one_class_svm.pkl +3 -0
- models/optimization_model.joblib +3 -0
- models/robots.txt +3 -0
- models/scaler.pkl +3 -0
- production_predictor.py +673 -0
- requirements.txt +9 -0
- sample_data.csv +31 -0
.gitattributes
ADDED
@@ -0,0 +1,3 @@
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,183 @@
# 🛣️ Vehicle Anomaly Detection System

An advanced machine-learning-powered anomaly detection system for GPS tracking data with a beautiful Gradio interface.

## 🚀 Features

- **Multiple ML Models**: Ensemble of Isolation Forest, One-Class SVM, and LSTM Autoencoder
- **Beautiful UI**: Modern Gradio interface with interactive visualizations
- **Real-time Processing**: Handles up to 2000 GPS points with detailed analysis
- **Comprehensive Output**: Point-by-point analysis, risk factors, and JSON export
- **Interactive Maps**: GPS route visualization with anomaly highlighting
- **Performance Analytics**: Speed, altitude, and confidence distribution charts

## 📊 Processing Performance

- **CPU-only processing**: 45-90 seconds for 2000 samples
- **HuggingFace Spaces ready**: Optimized for cloud deployment
- **Memory efficient**: Handles large datasets with rolling-window processing

## 🔧 Installation

### Local Installation

```bash
# Clone or download the project
cd anomaly

# Install dependencies
pip install -r requirements.txt

# Run the Gradio app
python gradio_app.py
```

### HuggingFace Spaces Deployment

1. Create a new Space on HuggingFace
2. Upload all files, including the `models/` directory
3. Set `app_file` to `gradio_app.py`
4. The app will launch automatically

## 📁 Input Format

Your CSV file must contain these columns:

| Column | Description | Range |
|--------|-------------|-------|
| `randomized_id` | Vehicle identifier | Any string |
| `lat` | Latitude | -90 to 90 |
| `lng` | Longitude | -180 to 180 |
| `spd` | Speed (km/h) | 0 to 300 |
| `azm` | Azimuth/heading (degrees) | 0 to 360 |
| `alt` | Altitude (meters) | Any number |

### Sample Data

```csv
randomized_id,lat,lng,spd,azm,alt
VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0
VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0
VEHICLE002,40.7500,-73.9800,35.2,180.0,90.0
```

**Maximum**: 2000 samples per upload
**Minimum**: 5 samples required
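Before uploading, you can check a file against the column and range rules above. The following is a minimal, self-contained sketch (not part of the app; the column names and limits are taken from the table above, everything else is illustrative):

```python
import csv
import io

# Numeric range rules from the input-format table above
RANGES = {"lat": (-90, 90), "lng": (-180, 180), "spd": (0, 300), "azm": (0, 360)}
REQUIRED = ["randomized_id", "lat", "lng", "spd", "azm", "alt"]

def validate_gps_csv(text: str) -> list:
    """Return a list of problems found in CSV text; an empty list means OK."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    if rows:
        missing = [c for c in REQUIRED if c not in rows[0]]
        if missing:
            return [f"missing columns: {missing}"]
    if not 5 <= len(rows) <= 2000:
        problems.append(f"expected 5-2000 rows, got {len(rows)}")
    for i, row in enumerate(rows, start=2):  # row 1 is the header
        for col, (lo, hi) in RANGES.items():
            try:
                value = float(row[col])
            except ValueError:
                problems.append(f"row {i}: {col} is not numeric")
                continue
            if not lo <= value <= hi:
                problems.append(f"row {i}: {col}={value} outside [{lo}, {hi}]")
    return problems

sample = "randomized_id,lat,lng,spd,azm,alt\n" + \
         "\n".join(f"V1,40.7,-74.0,{spd},90.0,100.0" for spd in (45, 50, 55, 60, 350))
print(validate_gps_csv(sample))  # ['row 6: spd=350.0 outside [0, 300]']
```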
## 🎯 Anomaly Detection

The system detects several types of anomalies:

### Speed Anomalies
- Excessive speeding (>120 km/h)
- Sudden acceleration/deceleration
- Speed inconsistencies

### Movement Anomalies
- Erratic GPS patterns
- Sharp turns at high speed
- Altitude inconsistencies

### Behavioral Patterns
- Route deviations
- Stop-and-go patterns
- Unusual driving sequences
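Detecting sharp turns relies on heading deltas, and azimuth wraps at 0/360°, so a naive difference would misread 350° → 10° as a 340° turn. A minimal helper showing the wraparound-safe form (illustrative only; the app's actual feature pipeline lives in `production_predictor.py`):

```python
def heading_change(azm_prev: float, azm_curr: float) -> float:
    """Smallest signed heading change in degrees, in (-180, 180].

    Shifting by 180 before the modulo maps the wraparound at 0/360
    onto the range midpoint, so 350 -> 10 comes out as +20, not -340.
    """
    return (azm_curr - azm_prev + 180.0) % 360.0 - 180.0

print(heading_change(350.0, 10.0))  # 20.0
print(heading_change(10.0, 350.0))  # -20.0
```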
## 📈 Output Features

### 1. Detailed Results
- Point-by-point analysis
- Normal vs. anomaly classification
- Confidence scores and alert levels
- Risk-factor identification

### 2. Interactive Visualizations
- GPS route mapping with anomaly markers
- Speed and altitude profiles
- Confidence-score distributions
- Multi-panel analysis dashboard

### 3. Summary Statistics
- Processing performance metrics
- Overall anomaly rates
- Alert-level distributions
- Risk-factor rankings

### 4. JSON Export
Complete machine-readable results, including:
- All detection scores
- Driving metrics
- Risk assessments
- Timestamps and metadata

## 🔬 Technical Details

### ML Models Used
1. **Isolation Forest**: Tree-based anomaly detection
2. **One-Class SVM**: Support-vector-based outlier detection
3. **LSTM Autoencoder**: Deep-learning sequence anomaly detection
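The three models' scores are blended into a single ensemble confidence by `_calculate_ensemble_score` in `production_predictor.py`. The sketch below is a hypothetical stand-in, not the app's actual weighting: it assumes each raw score has been oriented so that higher means more anomalous, squashes each through a logistic, and takes a weighted average.

```python
import math

def ensemble_score(scores: dict, weights: dict = None) -> float:
    """Combine per-model anomaly scores into a single value in [0, 1].

    `scores` maps model name -> raw score, oriented so higher means
    more anomalous. Weights and the logistic squashing are illustrative
    assumptions, not the production implementation.
    """
    if weights is None:
        weights = {"isolation_forest": 0.4, "one_class_svm": 0.3, "lstm": 0.3}
    total, weight_sum = 0.0, 0.0
    for name, score in scores.items():
        w = weights.get(name, 0.0)
        total += w * (1.0 / (1.0 + math.exp(-score)))  # squash into (0, 1)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

Missing models simply drop out of the weighted average, which mirrors how the batch pipeline only adds an LSTM score once enough sequence data has accumulated.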
### Feature Engineering

- 18 engineered features, including:
  - Speed patterns and statistics
  - Acceleration and jerk calculations
  - Angular velocity and curvature
  - Rolling-window aggregations
  - Risk-scoring algorithms
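As an illustration of two of the listed features: acceleration is the first difference of the speed series and jerk is the second. A pure-Python sketch, assuming a fixed sampling interval `dt` (hypothetical helper, not the app's pipeline):

```python
def accel_and_jerk(speeds_kmh, dt=2.0):
    """Derive acceleration (m/s^2) and jerk (m/s^3) from a speed series
    sampled every `dt` seconds."""
    v = [s / 3.6 for s in speeds_kmh]  # km/h -> m/s
    accel = [(b - a) / dt for a, b in zip(v, v[1:])]
    jerk = [(b - a) / dt for a, b in zip(accel, accel[1:])]
    return accel, jerk

# 36 km/h -> 72 km/h over 2 s is 5 m/s^2, then steady speed
accel, jerk = accel_and_jerk([36.0, 72.0, 72.0], dt=2.0)
print(accel)  # [5.0, 0.0]
print(jerk)   # [-2.5]
```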
### Performance Optimization

- Efficient batch processing
- Memory-optimized feature calculation
- CPU-friendly model inference
- Progressive result streaming

## 🛡️ Privacy & Security

- **Local Processing**: All analysis happens in your environment
- **No Data Upload**: Your GPS data never leaves the system
- **Real-time Analysis**: No data storage or logging
- **Secure Processing**: Industry-standard ML pipeline

## 🚀 Deployment Options

### Local Development
```bash
python gradio_app.py
# Access at http://localhost:7860
```

### HuggingFace Spaces
- Well suited for sharing and collaboration
- No setup required
- Automatic scaling
- Public or private deployment

### Docker (Optional)
```dockerfile
FROM python:3.9-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "gradio_app.py"]
```

## 📞 Support

For issues or questions:
1. Check the sample data format
2. Ensure your CSV has all required columns
3. Verify the data is within the expected ranges
4. Check for missing values or invalid entries

## 🔮 Future Enhancements

- Real-time streaming support
- Custom alert thresholds
- Historical trend analysis
- Fleet management dashboard
- Advanced route optimization
- Multi-vehicle correlation analysis

---

**Made with ❤️ using Gradio, PyTorch, and Advanced ML**
__pycache__/batch_production_pred.cpython-311.pyc
ADDED
Binary file (26.4 kB)

__pycache__/gradio_app.cpython-311.pyc
ADDED
Binary file (30.1 kB)

__pycache__/production_predictor.cpython-311.pyc
ADDED
Binary file (34.8 kB)
app.py
ADDED
@@ -0,0 +1,10 @@
#!/usr/bin/env python3
"""
HuggingFace Spaces entry point for Vehicle Anomaly Detection System
"""

from gradio_app import create_interface

if __name__ == "__main__":
    demo = create_interface()
    demo.launch(share=True, server_name="0.0.0.0", server_port=7860, debug=True)
batch_production_pred.py
ADDED
|
@@ -0,0 +1,484 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import numpy as np
|
| 2 |
+
import pandas as pd
|
| 3 |
+
from typing import List, Dict, Optional, Tuple, Any
|
| 4 |
+
from datetime import datetime, timedelta
|
| 5 |
+
import logging
|
| 6 |
+
from production_predictor import ProductionAnomalyDetector, AnomalyResult, GPSPoint
|
| 7 |
+
import torch
|
| 8 |
+
logger = logging.getLogger(__name__)
|
| 9 |
+
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 10 |
+
class BatchAnomalyDetector(ProductionAnomalyDetector):
|
| 11 |
+
"""
|
| 12 |
+
Extended ProductionAnomalyDetector with batch processing capabilities
|
| 13 |
+
Processes data as list of lists: [[id, lat, lng, azm, spd, alt], ...]
|
| 14 |
+
"""
|
| 15 |
+
|
| 16 |
+
def __init__(self, model_dir: str, config: Dict = None):
|
| 17 |
+
super().__init__(model_dir, config)
|
| 18 |
+
self.batch_results = []
|
| 19 |
+
|
| 20 |
+
def process_batch_list_of_lists(self,
|
| 21 |
+
data: List[List],
|
| 22 |
+
column_order: List[str] = None,
|
| 23 |
+
sort_by_vehicle: bool = True,
|
| 24 |
+
generate_timestamps: bool = True) -> Dict[str, Any]:
|
| 25 |
+
"""
|
| 26 |
+
Process batch data as list of lists
|
| 27 |
+
|
| 28 |
+
Args:
|
| 29 |
+
data: List of lists in format [[id, lat, lng, azm, spd, alt], ...]
|
| 30 |
+
column_order: Order of columns if different from default
|
| 31 |
+
sort_by_vehicle: Whether to sort by vehicle_id for proper sequence
|
| 32 |
+
generate_timestamps: Whether to generate timestamps automatically
|
| 33 |
+
|
| 34 |
+
Returns:
|
| 35 |
+
Dictionary with batch processing results
|
| 36 |
+
"""
|
| 37 |
+
|
| 38 |
+
if column_order is None:
|
| 39 |
+
column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']
|
| 40 |
+
|
| 41 |
+
print(f"🔄 Processing batch of {len(data)} GPS points...")
|
| 42 |
+
|
| 43 |
+
# Convert list of lists to DataFrame
|
| 44 |
+
df = pd.DataFrame(data, columns=column_order)
|
| 45 |
+
|
| 46 |
+
# Rename to match your training format
|
| 47 |
+
column_mapping = {
|
| 48 |
+
'vehicle_id': 'randomized_id',
|
| 49 |
+
'azm': 'azm',
|
| 50 |
+
'spd': 'spd',
|
| 51 |
+
'alt': 'alt',
|
| 52 |
+
'lat': 'lat',
|
| 53 |
+
'lng': 'lng'
|
| 54 |
+
}
|
| 55 |
+
|
| 56 |
+
# Apply column mapping if needed
|
| 57 |
+
for old_col, new_col in column_mapping.items():
|
| 58 |
+
if old_col in df.columns and old_col != new_col:
|
| 59 |
+
df = df.rename(columns={old_col: new_col})
|
| 60 |
+
|
| 61 |
+
# Ensure we have the right columns
|
| 62 |
+
required_columns = ['randomized_id', 'lat', 'lng', 'alt', 'spd', 'azm']
|
| 63 |
+
missing_columns = [col for col in required_columns if col not in df.columns]
|
| 64 |
+
|
| 65 |
+
if missing_columns:
|
| 66 |
+
raise ValueError(f"Missing required columns: {missing_columns}")
|
| 67 |
+
|
| 68 |
+
# Sort by vehicle and add sequence if requested
|
| 69 |
+
if sort_by_vehicle:
|
| 70 |
+
df = df.sort_values(['randomized_id', 'lat', 'lng']).reset_index(drop=True)
|
| 71 |
+
|
| 72 |
+
# Generate timestamps if requested
|
| 73 |
+
if generate_timestamps:
|
| 74 |
+
df['timestamp'] = self._generate_timestamps(df)
|
| 75 |
+
|
| 76 |
+
# Process batch
|
| 77 |
+
return self._process_dataframe_batch(df)
|
| 78 |
+
|
| 79 |
+
def process_batch_by_vehicle(self,
|
| 80 |
+
data: List[List],
|
| 81 |
+
column_order: List[str] = None,
|
| 82 |
+
time_interval_seconds: int = 2) -> Dict[str, List[AnomalyResult]]:
|
| 83 |
+
"""
|
| 84 |
+
Process batch data vehicle by vehicle to maintain proper sequence
|
| 85 |
+
|
| 86 |
+
Args:
|
| 87 |
+
data: List of lists format
|
| 88 |
+
column_order: Column order specification
|
| 89 |
+
time_interval_seconds: Time interval between GPS points
|
| 90 |
+
|
| 91 |
+
Returns:
|
| 92 |
+
Dictionary with vehicle_id as key and list of results as value
|
| 93 |
+
"""
|
| 94 |
+
|
| 95 |
+
if column_order is None:
|
| 96 |
+
column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']
|
| 97 |
+
|
| 98 |
+
# Convert to DataFrame
|
| 99 |
+
df = pd.DataFrame(data, columns=column_order)
|
| 100 |
+
|
| 101 |
+
# Group by vehicle
|
| 102 |
+
vehicle_results = {}
|
| 103 |
+
total_anomalies = 0
|
| 104 |
+
|
| 105 |
+
print(f"🚛 Processing {df['vehicle_id'].nunique()} vehicles with {len(df)} total points...")
|
| 106 |
+
|
| 107 |
+
for vehicle_id in df['vehicle_id'].unique():
|
| 108 |
+
vehicle_data = df[df['vehicle_id'] == vehicle_id].copy()
|
| 109 |
+
vehicle_data = vehicle_data.sort_values(['lat', 'lng']).reset_index(drop=True)
|
| 110 |
+
|
| 111 |
+
print(f"\n📍 Processing vehicle: {vehicle_id} ({len(vehicle_data)} points)")
|
| 112 |
+
|
| 113 |
+
# Clear vehicle buffer to start fresh
|
| 114 |
+
if vehicle_id in self.vehicle_buffers:
|
| 115 |
+
del self.vehicle_buffers[vehicle_id]
|
| 116 |
+
|
| 117 |
+
vehicle_results[vehicle_id] = []
|
| 118 |
+
vehicle_anomalies = 0
|
| 119 |
+
|
| 120 |
+
# Process points sequentially for this vehicle
|
| 121 |
+
for idx, row in vehicle_data.iterrows():
|
| 122 |
+
timestamp = datetime.now() + timedelta(seconds=idx * time_interval_seconds)
|
| 123 |
+
|
| 124 |
+
gps_point = GPSPoint(
|
| 125 |
+
vehicle_id=vehicle_id,
|
| 126 |
+
lat=row['lat'],
|
| 127 |
+
lng=row['lng'],
|
| 128 |
+
alt=row.get('alt', 0.0),
|
| 129 |
+
spd=row.get('spd', 0.0),
|
| 130 |
+
azm=row.get('azm', 0.0),
|
| 131 |
+
timestamp=timestamp.isoformat()
|
| 132 |
+
)
|
| 133 |
+
|
| 134 |
+
result = self.process_gps_point(gps_point)
|
| 135 |
+
|
| 136 |
+
if result:
|
| 137 |
+
vehicle_results[vehicle_id].append(result)
|
| 138 |
+
if result.anomaly_detected:
|
| 139 |
+
vehicle_anomalies += 1
|
| 140 |
+
total_anomalies += 1
|
| 141 |
+
|
| 142 |
+
# Print anomaly details
|
| 143 |
+
print(f" 🚨 Point {idx+1}: {result.alert_level} "
|
| 144 |
+
f"(Speed: {result.driving_metrics['speed']:.1f} km/h, "
|
| 145 |
+
f"Conf: {result.confidence:.3f})")
|
| 146 |
+
print(f" Risk factors: {result.risk_factors}")
|
| 147 |
+
|
| 148 |
+
detection_rate = vehicle_anomalies / len(vehicle_results[vehicle_id]) if vehicle_results[vehicle_id] else 0
|
| 149 |
+
print(f" 📊 Vehicle summary: {vehicle_anomalies} anomalies out of {len(vehicle_results[vehicle_id])} detections ({detection_rate:.1%})")
|
| 150 |
+
|
| 151 |
+
print(f"\n🎯 Batch Summary:")
|
| 152 |
+
print(f" Total vehicles: {len(vehicle_results)}")
|
| 153 |
+
print(f" Total points processed: {len(df)}")
|
| 154 |
+
print(f" Total anomalies detected: {total_anomalies}")
|
| 155 |
+
print(f" Overall anomaly rate: {total_anomalies/len(df):.1%}")
|
| 156 |
+
|
| 157 |
+
return vehicle_results
|
| 158 |
+
|
| 159 |
+
def process_realtime_stream(self, data_stream: List[List],
|
| 160 |
+
column_order: List[str] = None,
|
| 161 |
+
delay_seconds: float = 2.0,
|
| 162 |
+
callback_function = None) -> List[AnomalyResult]:
|
| 163 |
+
"""
|
| 164 |
+
Simulate real-time processing of list-of-lists data
|
| 165 |
+
|
| 166 |
+
Args:
|
| 167 |
+
data_stream: List of lists to process as real-time stream
|
| 168 |
+
column_order: Column order
|
| 169 |
+
delay_seconds: Delay between processing points (simulate real-time)
|
| 170 |
+
callback_function: Function to call when anomaly is detected
|
| 171 |
+
|
| 172 |
+
Returns:
|
| 173 |
+
List of all detection results
|
| 174 |
+
"""
|
| 175 |
+
|
| 176 |
+
import time
|
| 177 |
+
|
| 178 |
+
if column_order is None:
|
| 179 |
+
column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']
|
| 180 |
+
|
| 181 |
+
print(f"🔴 Starting real-time stream simulation with {len(data_stream)} points...")
|
| 182 |
+
print(f"⏱️ Processing delay: {delay_seconds} seconds between points")
|
| 183 |
+
|
| 184 |
+
all_results = []
|
| 185 |
+
anomaly_count = 0
|
| 186 |
+
|
| 187 |
+
for i, point_data in enumerate(data_stream):
|
| 188 |
+
# Convert list to GPSPoint
|
| 189 |
+
point_dict = dict(zip(column_order, point_data))
|
| 190 |
+
|
| 191 |
+
gps_point = GPSPoint(
|
| 192 |
+
vehicle_id=point_dict['vehicle_id'],
|
| 193 |
+
lat=point_dict['lat'],
|
| 194 |
+
lng=point_dict['lng'],
|
| 195 |
+
alt=point_dict.get('alt', 0.0),
|
| 196 |
+
spd=point_dict.get('spd', 0.0),
|
| 197 |
+
azm=point_dict.get('azm', 0.0),
|
| 198 |
+
timestamp=datetime.now().isoformat()
|
| 199 |
+
)
|
| 200 |
+
|
| 201 |
+
# Process point
|
| 202 |
+
result = self.process_gps_point(gps_point)
|
| 203 |
+
|
| 204 |
+
if result:
|
| 205 |
+
all_results.append(result)
|
| 206 |
+
|
| 207 |
+
# Print status
|
| 208 |
+
status_icon = "🟢" if result.alert_level == "NORMAL" else "🟡" if result.alert_level in ["LOW", "MEDIUM"] else "🔴"
|
| 209 |
+
print(f"{status_icon} Point {i+1:3d}: {result.vehicle_id:12s} | "
|
| 210 |
+
f"{result.alert_level:8s} | Speed: {result.driving_metrics['speed']:5.1f} km/h | "
|
| 211 |
+
f"Conf: {result.confidence:.3f}")
|
| 212 |
+
|
| 213 |
+
if result.anomaly_detected:
|
| 214 |
+
anomaly_count += 1
|
| 215 |
+
print(f" 🚨 ANOMALY DETECTED! {result.risk_factors}")
|
| 216 |
+
|
| 217 |
+
# Call callback function if provided
|
| 218 |
+
if callback_function:
|
| 219 |
+
callback_function(result, gps_point)
|
| 220 |
+
else:
|
| 221 |
+
print(f"⏳ Point {i+1:3d}: {point_dict['vehicle_id']:12s} | Building buffer...")
|
| 222 |
+
|
| 223 |
+
# Simulate real-time delay
|
| 224 |
+
if i < len(data_stream) - 1: # Don't delay after last point
|
| 225 |
+
time.sleep(delay_seconds)
|
| 226 |
+
|
| 227 |
+
print(f"\n📊 Stream Complete:")
|
| 228 |
+
print(f" Points processed: {len(data_stream)}")
|
| 229 |
+
print(f" Detections made: {len(all_results)}")
|
| 230 |
+
print(f" Anomalies found: {anomaly_count}")
|
| 231 |
+
print(f" Anomaly rate: {anomaly_count/len(all_results)*100:.1f}%" if all_results else " No detections made")
|
| 232 |
+
|
| 233 |
+
return all_results
|
| 234 |
+
|
| 235 |
+
def _generate_timestamps(self, df: pd.DataFrame) -> List[str]:
|
| 236 |
+
"""Generate realistic timestamps for GPS data"""
|
| 237 |
+
base_time = datetime.now()
|
| 238 |
+
timestamps = []
|
| 239 |
+
|
| 240 |
+
for vehicle_id in df['randomized_id'].unique():
|
| 241 |
+
vehicle_mask = df['randomized_id'] == vehicle_id
|
| 242 |
+
vehicle_count = vehicle_mask.sum()
|
| 243 |
+
|
| 244 |
+
# Generate timestamps for this vehicle (2-second intervals)
|
| 245 |
+
for i in range(vehicle_count):
|
| 246 |
+
timestamp = base_time + timedelta(seconds=i * 2)
|
| 247 |
+
timestamps.append(timestamp.isoformat())
|
| 248 |
+
|
| 249 |
+
return timestamps
|
| 250 |
+
|
| 251 |
+
def _process_dataframe_batch(self, df: pd.DataFrame) -> Dict[str, Any]:
|
| 252 |
+
"""Process DataFrame using the existing feature pipeline"""
|
| 253 |
+
|
| 254 |
+
# Use your exact feature engineering pipeline
|
| 255 |
+
features_df = self._calculate_features_exact_pipeline(df)
|
| 256 |
+
|
| 257 |
+
if len(features_df) == 0:
|
| 258 |
+
return {
|
| 259 |
+
"status": "error",
|
| 260 |
+
"message": "No features could be calculated",
|
| 261 |
+
"processed": 0,
|
| 262 |
+
"anomalies": 0
|
| 263 |
+
}
|
| 264 |
+
|
| 265 |
+
# Scale features
|
| 266 |
+
features_scaled = self.scaler.transform(features_df)
|
| 267 |
+
|
| 268 |
+
# Get anomaly scores for all points
|
| 269 |
+
anomaly_results = []
|
| 270 |
+
|
| 271 |
+
print("🔍 Running anomaly detection on all points...")
|
| 272 |
+
|
| 273 |
+
for i in range(len(features_scaled)):
|
| 274 |
+
point_scaled = features_scaled[i:i+1]
|
| 275 |
+
|
| 276 |
+
# Get scores from all models
|
| 277 |
+
scores = {}
|
| 278 |
+
|
| 279 |
+
# Isolation Forest
|
| 280 |
+
if self.isolation_forest:
|
| 281 |
+
scores['isolation_forest'] = float(self.isolation_forest.decision_function(point_scaled)[0])
|
| 282 |
+
|
| 283 |
+
# One-Class SVM
|
| 284 |
+
if self.one_class_svm:
|
| 285 |
+
scores['one_class_svm'] = float(self.one_class_svm.decision_function(point_scaled)[0])
|
| 286 |
+
|
| 287 |
+
# LSTM (only if we have enough sequence data)
|
| 288 |
+
if self.lstm_autoencoder and i >= self.config['lstm_sequence_length'] - 1:
|
| 289 |
+
try:
|
| 290 |
+
sequence_start = max(0, i - self.config['lstm_sequence_length'] + 1)
|
| 291 |
+
sequence_features = features_scaled[sequence_start:i+1]
|
| 292 |
+
|
| 293 |
+
if len(sequence_features) == self.config['lstm_sequence_length']:
|
| 294 |
+
sequence_tensor = torch.FloatTensor(sequence_features).unsqueeze(0).to(device)
|
| 295 |
+
|
| 296 |
+
with torch.no_grad():
|
| 297 |
+
reconstructed = self.lstm_autoencoder(sequence_tensor)
|
| 298 |
+
reconstruction_error = torch.mean((sequence_tensor - reconstructed) ** 2).item()
|
| 299 |
+
scores['lstm'] = float(reconstruction_error)
|
| 300 |
+
except:
|
| 301 |
+
scores['lstm'] = 0.0
|
| 302 |
+
|
| 303 |
+
# Calculate ensemble score
|
| 304 |
+
ensemble_score = self._calculate_ensemble_score(scores)
|
| 305 |
+
alert_level = self._get_alert_level(ensemble_score)
|
| 306 |
+
|
| 307 |
+
# Extract metrics
|
| 308 |
+
feature_row = features_df.iloc[i]
|
| 309 |
+
driving_metrics = self._extract_driving_metrics_from_features(feature_row)
|
| 310 |
+
risk_factors = self._extract_risk_factors_from_features(feature_row)
|
| 311 |
+
|
| 312 |
+
anomaly_results.append({
|
| 313 |
+
'index': i,
|
| 314 |
+
'vehicle_id': df.iloc[i]['randomized_id'],
|
| 315 |
+
'anomaly_detected': ensemble_score > self.config['alert_threshold'],
|
| 316 |
+
'confidence': ensemble_score,
|
| 317 |
+
'alert_level': alert_level,
|
| 318 |
+
'raw_scores': scores,
|
| 319 |
+
'driving_metrics': driving_metrics,
|
| 320 |
+
'risk_factors': risk_factors
|
| 321 |
+
})
|
| 322 |
+
|
| 323 |
+
# Generate summary
|
| 324 |
+
total_anomalies = sum(1 for r in anomaly_results if r['anomaly_detected'])
|
| 325 |
+
|
| 326 |
+
return {
|
| 327 |
+
"status": "completed",
|
| 328 |
+
"processed": len(anomaly_results),
|
| 329 |
+
"anomalies": total_anomalies,
|
| 330 |
+
"anomaly_rate": total_anomalies / len(anomaly_results) if anomaly_results else 0,
|
| 331 |
+
"results": anomaly_results,
|
| 332 |
+
"summary": {
|
| 333 |
+
"total_vehicles": df['randomized_id'].nunique(),
|
| 334 |
+
"total_points": len(df),
|
| 335 |
+
"detection_ready_points": len(anomaly_results),
|
| 336 |
+
"anomalies_by_level": {
|
| 337 |
+
level: sum(1 for r in anomaly_results if r['alert_level'] == level)
|
| 338 |
+
for level in ['NORMAL', 'LOW', 'MEDIUM', 'HIGH', 'CRITICAL']
|
| 339 |
+
}
|
| 340 |
+
}
|
| 341 |
+
}
|
| 342 |
+
|
| 343 |
+
# Example usage functions
|
| 344 |
+
def example_list_of_lists_usage():
|
| 345 |
+
"""Example of how to use the batch processor with list of lists"""
|
| 346 |
+
|
| 347 |
+
print("🔄 Example: Processing List of Lists Data")
|
| 348 |
+
print("=" * 50)
|
| 349 |
+
|
| 350 |
+
# Initialize batch detector
|
| 351 |
+
detector = BatchAnomalyDetector("/kaggle/working/anomaly_analysis_pytorch_fixed/models")
|
| 352 |
+
|
| 353 |
+
# Sample data as list of lists: [vehicle_id, lat, lng, azm, spd, alt]
|
| 354 |
+
sample_data = [
|
| 355 |
+
# Normal driving for vehicle_001
|
| 356 |
+
["vehicle_001", 55.7558, 37.6176, 90.0, 45.0, 156.0],
|
| 357 |
+
["vehicle_001", 55.7559, 37.6177, 92.0, 47.0, 157.0],
|
| 358 |
+
["vehicle_001", 55.7560, 37.6178, 94.0, 46.0, 158.0],
|
| 359 |
+
["vehicle_001", 55.7561, 37.6179, 96.0, 48.0, 159.0],
|
| 360 |
+
["vehicle_001", 55.7562, 37.6180, 98.0, 49.0, 160.0],
|
| 361 |
+
|
| 362 |
+
# Aggressive driving for vehicle_002
|
| 363 |
+
["vehicle_002", 55.7600, 37.6200, 180.0, 70.0, 150.0],
|
| 364 |
+
["vehicle_002", 55.7601, 37.6201, 182.0, 125.0, 151.0], # Speeding
|
| 365 |
+
["vehicle_002", 55.7602, 37.6202, 184.0, 15.0, 152.0], # Hard braking
|
| 366 |
+
["vehicle_002", 55.7603, 37.6203, 250.0, 55.0, 153.0], # Sharp turn
|
| 367 |
+
|
| 368 |
+
# Mixed behavior for vehicle_003
|
| 369 |
+
["vehicle_003", 55.7700, 37.6300, 45.0, 40.0, 145.0],
|
| 370 |
+
["vehicle_003", 55.7701, 37.6301, 47.0, 42.0, 146.0],
|
| 371 |
+
["vehicle_003", 55.7702, 37.6302, 49.0, 110.0, 147.0], # Speed violation
|
| 372 |
+
["vehicle_003", 55.7703, 37.6303, 51.0, 43.0, 148.0],
|
| 373 |
+
]
|
| 374 |
+
|
| 375 |
+
print(f"Processing {len(sample_data)} GPS points from {len(set(row[0] for row in sample_data))} vehicles...")
|
| 376 |
+
|
| 377 |
+
# Method 1: Process as batch
|
| 378 |
+
print("\n📊 Method 1: Batch Processing")
|
| 379 |
+
batch_results = detector.process_batch_list_of_lists(sample_data)
|
| 380 |
+
|
| 381 |
+
print(f"Batch Results:")
|
| 382 |
+
print(f" Status: {batch_results['status']}")
|
| 383 |
+
print(f" Points processed: {batch_results['processed']}")
|
| 384 |
+
print(f" Anomalies detected: {batch_results['anomalies']}")
|
| 385 |
+
print(f" Anomaly rate: {batch_results['anomaly_rate']:.1%}")
|
| 386 |
+
|
| 387 |
+
# Method 2: Process by vehicle
|
| 388 |
+
print("\n🚛 Method 2: Vehicle-by-Vehicle Processing")
|
| 389 |
+
vehicle_results = detector.process_batch_by_vehicle(sample_data)
|
| 390 |
+
|
| 391 |
+
for vehicle_id, results in vehicle_results.items():
|
| 392 |
+
anomaly_count = sum(1 for r in results if r.anomaly_detected)
|
| 393 |
+
print(f" {vehicle_id}: {anomaly_count} anomalies out of {len(results)} detections")
|
| 394 |
+
|
| 395 |
+
# Method 3: Real-time simulation
|
| 396 |
+
print("\n🔴 Method 3: Real-time Stream Simulation (first 8 points)")
|
| 397 |
+
|
| 398 |
+
def anomaly_callback(result, gps_point):
|
| 399 |
+
"""Callback function for when anomaly is detected"""
|
| 400 |
+
print(f" 📧 ALERT SENT: {result.vehicle_id} - {result.alert_level}")
|
| 401 |
+
|
| 402 |
+
stream_results = detector.process_realtime_stream(
|
| 403 |
        sample_data[:8],  # First 8 points
        delay_seconds=0.5,  # Faster for demo
        callback_function=anomaly_callback
    )

def load_from_csv_example():
    """Example of loading data from CSV and converting to list of lists"""

    print("\n📁 Example: Loading from CSV")
    print("=" * 50)

    # Simulate CSV loading (you would use pd.read_csv('your_file.csv'))
    csv_data = """vehicle_id,lat,lng,azm,spd,alt
vehicle_001,55.7558,37.6176,90.0,45.0,156.0
vehicle_001,55.7559,37.6177,92.0,47.0,157.0
vehicle_002,55.7600,37.6200,180.0,125.0,150.0
vehicle_002,55.7601,37.6201,182.0,15.0,151.0"""

    # Convert CSV to list of lists
    from io import StringIO
    df = pd.read_csv(StringIO(csv_data))

    # Convert DataFrame to list of lists
    data_as_lists = df.values.tolist()

    print(f"Loaded {len(data_as_lists)} rows from CSV")
    print(f"Column order: {df.columns.tolist()}")
    print(f"Sample data: {data_as_lists[0]}")

    # Process with detector
    detector = BatchAnomalyDetector("/kaggle/working/anomaly_analysis_pytorch_fixed/models")
    results = detector.process_batch_list_of_lists(
        data_as_lists,
        column_order=df.columns.tolist()
    )

    print(f"Processing complete: {results['anomalies']} anomalies detected")

def large_dataset_example():
    """Example for processing large datasets efficiently"""

    print("\n🔢 Example: Large Dataset Processing")
    print("=" * 50)

    # Simulate large dataset
    np.random.seed(42)
    large_data = []

    vehicles = [f"vehicle_{i:03d}" for i in range(1, 11)]  # 10 vehicles

    for vehicle in vehicles:
        for point in range(100):  # 100 points per vehicle
            lat = 55.7500 + np.random.uniform(-0.01, 0.01)
            lng = 37.6000 + np.random.uniform(-0.01, 0.01)
            azm = np.random.uniform(0, 360)
            spd = np.random.uniform(20, 80) if np.random.random() > 0.1 else np.random.uniform(90, 140)  # 10% aggressive
            alt = 150 + np.random.uniform(-20, 20)

            large_data.append([vehicle, lat, lng, azm, spd, alt])

    print(f"Generated large dataset: {len(large_data)} points from {len(vehicles)} vehicles")

    # Process efficiently
    detector = BatchAnomalyDetector("/kaggle/working/anomaly_analysis_pytorch_fixed/models")

    # Process in chunks for memory efficiency
    chunk_size = 500
    total_anomalies = 0

    for i in range(0, len(large_data), chunk_size):
        chunk = large_data[i:i + chunk_size]
        print(f"Processing chunk {i//chunk_size + 1}: points {i+1}-{i+len(chunk)}")

        results = detector.process_batch_list_of_lists(chunk)
        total_anomalies += results['anomalies']

        print(f"  Chunk anomalies: {results['anomalies']}")

    print(f"\nLarge dataset complete:")
    print(f"  Total points: {len(large_data)}")
    print(f"  Total anomalies: {total_anomalies}")
    print(f"  Overall anomaly rate: {total_anomalies/len(large_data):.1%}")
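The chunking pattern used in `large_dataset_example` is independent of the detector itself; a minimal sketch of the same slicing logic (the `iter_chunks` helper is illustrative, not part of the repo):

```python
def iter_chunks(rows, chunk_size):
    """Yield successive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# 1050 synthetic rows with chunk_size=500 -> chunk sizes 500, 500, 50
sizes = [len(chunk) for chunk in iter_chunks(list(range(1050)), 500)]
print(sizes)  # [500, 500, 50]
```

Because slicing never copies beyond the chunk boundary, peak memory per iteration stays proportional to `chunk_size` rather than to the full dataset.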
gradio_app.py
ADDED
|
@@ -0,0 +1,536 @@
import gradio as gr
import pandas as pd
import numpy as np
import json
import time
from datetime import datetime
from typing import Dict, List, Tuple, Any
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings("ignore")

# Import your ML solution
from batch_production_pred import BatchAnomalyDetector
from production_predictor import AnomalyResult

class AnomalyDetectionGradioApp:
    def __init__(self, model_dir: str = "./models"):
        """Initialize the Gradio app with ML models"""
        self.model_dir = model_dir
        self.detector = None
        self.load_models()

    def load_models(self):
        """Load the ML models"""
        try:
            self.detector = BatchAnomalyDetector(self.model_dir)
            print("✅ Models loaded successfully!")
        except Exception as e:
            print(f"❌ Error loading models: {e}")
            self.detector = None

    def validate_csv(self, file_path: str) -> Tuple[bool, str, pd.DataFrame]:
        """Validate uploaded CSV file"""
        try:
            # Read CSV
            df = pd.read_csv(file_path)

            # Check required columns
            required_cols = ['randomized_id', 'lat', 'lng', 'spd', 'azm', 'alt']
            missing_cols = [col for col in required_cols if col not in df.columns]

            if missing_cols:
                return False, f"❌ Missing required columns: {', '.join(missing_cols)}", None

            # Check sample count
            if len(df) > 2000:
                return False, f"❌ Too many samples ({len(df)}). Maximum allowed: 2000", None

            if len(df) < 5:
                return False, f"❌ Too few samples ({len(df)}). Minimum required: 5", None

            # Check data types and ranges
            try:
                df['lat'] = pd.to_numeric(df['lat'])
                df['lng'] = pd.to_numeric(df['lng'])
                df['spd'] = pd.to_numeric(df['spd'])
                df['azm'] = pd.to_numeric(df['azm'])
                df['alt'] = pd.to_numeric(df['alt'])

                # Basic range validation
                if not df['lat'].between(-90, 90).all():
                    return False, "❌ Latitude values must be between -90 and 90", None
                if not df['lng'].between(-180, 180).all():
                    return False, "❌ Longitude values must be between -180 and 180", None
                if not df['spd'].between(0, 300).all():
                    return False, "❌ Speed values must be between 0 and 300 km/h", None
                if not df['azm'].between(0, 360).all():
                    return False, "❌ Azimuth values must be between 0 and 360 degrees", None

            except Exception as e:
                return False, f"❌ Data type error: {str(e)}", None

            return True, f"✅ Valid CSV: {len(df)} samples, {df['randomized_id'].nunique()} vehicles", df

        except Exception as e:
            return False, f"❌ Error reading CSV: {str(e)}", None

    def process_data(self, file_path: str, progress=gr.Progress()) -> Tuple[str, str, str, str]:
        """Process the uploaded CSV and return results"""
        if not self.detector:
            return "❌ Models not loaded", "", "", ""

        # Validate CSV
        is_valid, message, df = self.validate_csv(file_path)
        if not is_valid:
            return message, "", "", ""

        progress(0.1, desc="Validating data...")

        try:
            # Convert DataFrame to list of lists format
            data_list = df[['randomized_id', 'lat', 'lng', 'azm', 'spd', 'alt']].values.tolist()
            column_order = ['vehicle_id', 'lat', 'lng', 'azm', 'spd', 'alt']

            progress(0.2, desc="Starting anomaly detection...")

            # Process batch by vehicle
            start_time = time.time()
            vehicle_results = self.detector.process_batch_by_vehicle(
                data_list,
                column_order=column_order
            )
            processing_time = time.time() - start_time

            progress(0.8, desc="Generating detailed results...")

            # Generate detailed output
            detailed_results = self.generate_detailed_results(vehicle_results, df)
            summary_stats = self.generate_summary_stats(vehicle_results, processing_time)
            visualization = self.create_visualization(vehicle_results, df)
            json_output = self.generate_json_output(vehicle_results)

            progress(1.0, desc="Complete!")

            return detailed_results, summary_stats, visualization, json_output

        except Exception as e:
            return f"❌ Processing error: {str(e)}", "", "", ""

    def generate_detailed_results(self, vehicle_results: Dict, original_df: pd.DataFrame) -> str:
        """Generate detailed point-by-point analysis"""
        output_lines = ["# 🔍 Detailed Anomaly Detection Results\n"]

        total_points = 0
        total_anomalies = 0

        for vehicle_id, results in vehicle_results.items():
            if not results:
                continue

            output_lines.append(f"## 🚗 Vehicle: {vehicle_id}")
            output_lines.append(f"**Points analyzed:** {len(results)}\n")

            vehicle_anomalies = 0

            for i, result in enumerate(results, 1):
                total_points += 1

                if result.anomaly_detected:
                    total_anomalies += 1
                    vehicle_anomalies += 1

                    # Get original data point
                    vehicle_data = original_df[original_df['randomized_id'] == vehicle_id].iloc[i-1]

                    # Anomaly details
                    output_lines.append(f"### 🚨 Point {i}: **ANOMALY DETECTED!**")
                    output_lines.append(f"- **Alert Level:** {result.alert_level}")
                    output_lines.append(f"- **Confidence:** {result.confidence:.3f}")
                    output_lines.append(f"- **Location:** ({vehicle_data['lat']:.6f}, {vehicle_data['lng']:.6f})")
                    output_lines.append(f"- **Speed:** {result.driving_metrics.get('speed', 0):.1f} km/h")
                    output_lines.append(f"- **Altitude:** {vehicle_data['alt']:.1f} m")
                    output_lines.append(f"- **Heading:** {vehicle_data['azm']:.1f}°")

                    # Risk factors
                    risk_factors = [k for k, v in result.risk_factors.items() if v]
                    if risk_factors:
                        output_lines.append(f"- **Risk Factors:** {', '.join(risk_factors)}")

                    # Model scores
                    output_lines.append(f"- **Model Scores:**")
                    for model, score in result.raw_scores.items():
                        output_lines.append(f"  - {model}: {score:.3f}")

                    output_lines.append("")
                else:
                    # Normal point (abbreviated)
                    if i <= 5 or i % 10 == 0:  # Show first 5 and every 10th normal point
                        output_lines.append(f"**Point {i}:** ✅ Normal (confidence: {result.confidence:.3f})")

            # Vehicle summary
            detection_rate = vehicle_anomalies / len(results) if results else 0
            output_lines.append(f"\n**Vehicle Summary:** {vehicle_anomalies} anomalies out of {len(results)} points ({detection_rate:.1%})\n")
            output_lines.append("---\n")

        # Overall summary
        overall_rate = total_anomalies / total_points if total_points > 0 else 0
        output_lines.append(f"## 📊 Overall Summary")
        output_lines.append(f"- **Total Points:** {total_points}")
        output_lines.append(f"- **Total Anomalies:** {total_anomalies}")
        output_lines.append(f"- **Detection Rate:** {overall_rate:.1%}")

        return "\n".join(output_lines)

    def generate_summary_stats(self, vehicle_results: Dict, processing_time: float) -> str:
        """Generate summary statistics"""
        total_vehicles = len(vehicle_results)
        total_points = sum(len(results) for results in vehicle_results.values())
        total_anomalies = sum(sum(1 for r in results if r.anomaly_detected)
                              for results in vehicle_results.values())

        # Alert level distribution
        alert_levels = {}
        for results in vehicle_results.values():
            for result in results:
                if result.anomaly_detected:
                    level = result.alert_level
                    alert_levels[level] = alert_levels.get(level, 0) + 1

        # Risk factor analysis
        risk_factors = {}
        for results in vehicle_results.values():
            for result in results:
                if result.anomaly_detected:
                    for factor, present in result.risk_factors.items():
                        if present:
                            risk_factors[factor] = risk_factors.get(factor, 0) + 1

        output = f"""
# 📈 Processing Summary

## ⚡ Performance Metrics
- **Processing Time:** {processing_time:.2f} seconds
- **Points per Second:** {total_points/processing_time:.1f}
- **Average Time per Point:** {1000*processing_time/total_points:.1f} ms

## 📊 Detection Statistics
- **Total Vehicles:** {total_vehicles}
- **Total GPS Points:** {total_points}
- **Anomalies Detected:** {total_anomalies}
- **Overall Anomaly Rate:** {100*total_anomalies/total_points:.2f}%

## 🚨 Alert Level Distribution
"""

        for level, count in sorted(alert_levels.items()):
            percentage = 100 * count / total_anomalies if total_anomalies > 0 else 0
            output += f"- **{level}:** {count} ({percentage:.1f}%)\n"

        if risk_factors:
            output += "\n## ⚠️ Top Risk Factors\n"
            sorted_risks = sorted(risk_factors.items(), key=lambda x: x[1], reverse=True)[:5]
            for factor, count in sorted_risks:
                percentage = 100 * count / total_anomalies if total_anomalies > 0 else 0
                output += f"- **{factor}:** {count} occurrences ({percentage:.1f}%)\n"

        return output

    def create_visualization(self, vehicle_results: Dict, original_df: pd.DataFrame) -> gr.Plot:
        """Create interactive visualization"""
        # Prepare data for plotting
        plot_data = []

        for vehicle_id, results in vehicle_results.items():
            vehicle_df = original_df[original_df['randomized_id'] == vehicle_id].copy()

            for i, result in enumerate(results):
                if i < len(vehicle_df):
                    row = vehicle_df.iloc[i]
                    plot_data.append({
                        'vehicle_id': vehicle_id,
                        'lat': row['lat'],
                        'lng': row['lng'],
                        'spd': row['spd'],
                        'alt': row['alt'],
                        'azm': row['azm'],
                        'anomaly': result.anomaly_detected,
                        'confidence': result.confidence,
                        'alert_level': result.alert_level if result.anomaly_detected else 'Normal'
                    })

        plot_df = pd.DataFrame(plot_data)

        if len(plot_df) == 0:
            return gr.Plot(value=go.Figure().add_annotation(text="No data to plot"))

        # Create subplots
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('GPS Route with Anomalies', 'Speed Profile',
                            'Altitude Profile', 'Confidence Distribution'),
            specs=[[{"type": "scattermapbox"}, {"type": "scatter"}],
                   [{"type": "scatter"}, {"type": "histogram"}]]
        )

        # GPS Route Map
        normal_points = plot_df[~plot_df['anomaly']]
        anomaly_points = plot_df[plot_df['anomaly']]

        if len(normal_points) > 0:
            fig.add_trace(
                go.Scattermapbox(
                    lat=normal_points['lat'],
                    lon=normal_points['lng'],
                    mode='markers',
                    marker=dict(size=8, color='green'),
                    text=normal_points['vehicle_id'],
                    name='Normal',
                    hovertemplate='<b>%{text}</b><br>Lat: %{lat}<br>Lon: %{lon}<extra></extra>'
                ),
                row=1, col=1
            )

        if len(anomaly_points) > 0:
            fig.add_trace(
                go.Scattermapbox(
                    lat=anomaly_points['lat'],
                    lon=anomaly_points['lng'],
                    mode='markers',
                    marker=dict(size=12, color='red', symbol='diamond'),
                    text=anomaly_points['alert_level'],
                    name='Anomaly',
                    hovertemplate='<b>%{text}</b><br>Lat: %{lat}<br>Lon: %{lon}<extra></extra>'
                ),
                row=1, col=1
            )

        # Speed Profile
        fig.add_trace(
            go.Scatter(
                x=list(range(len(plot_df))),
                y=plot_df['spd'],
                mode='lines+markers',
                marker=dict(color=plot_df['anomaly'].map({True: 'red', False: 'blue'})),
                name='Speed',
                hovertemplate='Point: %{x}<br>Speed: %{y} km/h<extra></extra>'
            ),
            row=1, col=2
        )

        # Altitude Profile
        fig.add_trace(
            go.Scatter(
                x=list(range(len(plot_df))),
                y=plot_df['alt'],
                mode='lines+markers',
                marker=dict(color=plot_df['anomaly'].map({True: 'red', False: 'green'})),
                name='Altitude',
                hovertemplate='Point: %{x}<br>Altitude: %{y} m<extra></extra>'
            ),
            row=2, col=1
        )

        # Confidence Distribution
        fig.add_trace(
            go.Histogram(
                x=plot_df['confidence'],
                nbinsx=20,
                name='Confidence',
                marker_color='lightblue'
            ),
            row=2, col=2
        )

        # Update layout
        fig.update_layout(
            mapbox=dict(
                style="open-street-map",
                center=dict(lat=plot_df['lat'].mean(), lon=plot_df['lng'].mean()),
                zoom=10
            ),
            height=800,
            showlegend=True,
            title_text="🛣️ Vehicle Anomaly Detection Analysis"
        )

        fig.update_xaxes(title_text="Point Index", row=1, col=2)
        fig.update_yaxes(title_text="Speed (km/h)", row=1, col=2)
        fig.update_xaxes(title_text="Point Index", row=2, col=1)
        fig.update_yaxes(title_text="Altitude (m)", row=2, col=1)
        fig.update_xaxes(title_text="Confidence Score", row=2, col=2)
        fig.update_yaxes(title_text="Count", row=2, col=2)

        return gr.Plot(value=fig)

    def generate_json_output(self, vehicle_results: Dict) -> str:
        """Generate JSON output of all results"""
        json_data = {
            "detection_results": {},
            "summary": {
                "total_vehicles": len(vehicle_results),
                "total_points": sum(len(results) for results in vehicle_results.values()),
                "total_anomalies": sum(sum(1 for r in results if r.anomaly_detected)
                                       for results in vehicle_results.values()),
                "timestamp": datetime.now().isoformat()
            }
        }

        for vehicle_id, results in vehicle_results.items():
            json_data["detection_results"][vehicle_id] = []

            for i, result in enumerate(results, 1):
                result_dict = {
                    "point_number": i,
                    "anomaly_detected": result.anomaly_detected,
                    "confidence": round(result.confidence, 4),
                    "alert_level": result.alert_level,
                    "timestamp": result.timestamp,
                    "driving_metrics": result.driving_metrics,
                    "risk_factors": result.risk_factors,
                    "raw_scores": {k: round(v, 4) for k, v in result.raw_scores.items()}
                }
                json_data["detection_results"][vehicle_id].append(result_dict)

        return json.dumps(json_data, indent=2)

# Initialize the app
app = AnomalyDetectionGradioApp()

def process_csv_file(file):
    """Wrapper function for Gradio interface"""
    if file is None:
        return "Please upload a CSV file", "", "", ""

    return app.process_data(file.name)

# Create the Gradio interface
def create_interface():
    with gr.Blocks(
        theme=gr.themes.Soft(),
        title="🛣️ Vehicle Anomaly Detection System",
        css="""
        .gradio-container {
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
        }
        .main-header {
            text-align: center;
            background: linear-gradient(45deg, #1e3c72, #2a5298);
            color: white;
            padding: 2rem;
            border-radius: 10px;
            margin-bottom: 2rem;
        }
        .upload-area {
            border: 2px dashed #4CAF50;
            border-radius: 10px;
            padding: 2rem;
            text-align: center;
            background-color: #f8f9fa;
        }
        """
    ) as demo:

        # Header
        gr.HTML("""
        <div class="main-header">
            <h1>🛣️ Vehicle Anomaly Detection System</h1>
            <p>Advanced ML-powered anomaly detection for GPS tracking data</p>
            <p><strong>Upload your CSV with columns:</strong> randomized_id, lat, lng, spd, azm, alt (max 2000 samples)</p>
        </div>
        """)

        with gr.Row():
            with gr.Column(scale=1):
                # File upload
                gr.HTML('<div class="upload-area">')
                file_upload = gr.File(
                    label="📁 Upload GPS Data CSV",
                    file_types=[".csv"],
                    type="filepath"
                )
                gr.HTML('</div>')

                # Process button
                process_btn = gr.Button(
                    "🚀 Analyze Anomalies",
                    variant="primary",
                    size="lg"
                )

                # Sample data info
                gr.HTML("""
                <div style="margin-top: 1rem; padding: 1rem; background-color: #e8f4fd; border-radius: 5px;">
                    <h4>📋 Expected CSV Format:</h4>
                    <code>
                    randomized_id,lat,lng,spd,azm,alt<br>
                    VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0<br>
                    VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0<br>
                    ...
                    </code>
                    <ul style="margin-top: 1rem;">
                        <li><strong>randomized_id:</strong> Vehicle identifier</li>
                        <li><strong>lat:</strong> Latitude (-90 to 90)</li>
                        <li><strong>lng:</strong> Longitude (-180 to 180)</li>
                        <li><strong>spd:</strong> Speed in km/h (0-300)</li>
                        <li><strong>azm:</strong> Azimuth/heading (0-360°)</li>
                        <li><strong>alt:</strong> Altitude in meters</li>
                    </ul>
                </div>
                """)

        # Results tabs
        with gr.Tabs():
            with gr.Tab("📋 Detailed Results"):
                detailed_output = gr.Markdown(
                    value="Upload a CSV file and click 'Analyze Anomalies' to see detailed results...",
                    elem_classes=["detailed-results"]
                )

            with gr.Tab("📊 Summary & Stats"):
                summary_output = gr.Markdown(
                    value="Processing summary will appear here...",
                    elem_classes=["summary-stats"]
                )

            with gr.Tab("📈 Visualizations"):
                viz_output = gr.Plot(
                    label="Interactive Analysis Charts"
                )

            with gr.Tab("💾 JSON Export"):
                json_output = gr.Code(
                    language="json",
                    label="Complete Results JSON",
                    value="JSON results will appear here..."
                )

        # Connect the processing
        process_btn.click(
            fn=process_csv_file,
            inputs=[file_upload],
            outputs=[detailed_output, summary_output, viz_output, json_output],
            show_progress=True
        )

        # Footer
        gr.HTML("""
        <div style="text-align: center; margin-top: 2rem; padding: 1rem; background-color: #f1f3f4; border-radius: 5px;">
            <p>🔬 <strong>ML Models:</strong> Isolation Forest + One-Class SVM + LSTM Autoencoder</p>
            <p>⚡ <strong>Processing:</strong> ~45-90 seconds for 2000 samples on CPU</p>
            <p>🛡️ <strong>Privacy:</strong> All processing happens locally - your data never leaves this environment</p>
        </div>
        """)

    return demo

if __name__ == "__main__":
    demo = create_interface()
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=True,
        show_error=True
    )
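The range checks in `validate_csv` rely on pandas' `Series.between`, which is inclusive at both bounds by default. A standalone sketch of the same pattern on a toy frame (column values here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "lat": [40.7128, 40.7138],
    "spd": [45.5, 301.0],  # second speed exceeds the 300 km/h limit
})

# between(lo, hi) is inclusive, so exactly 300.0 would still pass
checks = {
    "lat": bool(df["lat"].between(-90, 90).all()),
    "spd": bool(df["spd"].between(0, 300).all()),
}
print(checks)  # {'lat': True, 'spd': False}
```

Collecting the per-column booleans first, as above, makes it easy to report every failing column at once instead of stopping at the first violation.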
launch.bat
ADDED
|
@@ -0,0 +1,12 @@
@echo off
echo 🛣️ Vehicle Anomaly Detection System
echo =====================================
echo.
echo Installing dependencies...
pip install -r requirements.txt
echo.
echo Starting Gradio application...
echo Access the interface at: http://localhost:7860
echo.
python gradio_app.py
pause
models/feature_names.json
ADDED
|
@@ -0,0 +1 @@
{"feature_names": ["spd", "acceleration", "jerk", "angular_velocity", "lateral_acceleration", "heading_change_rate", "curvature", "overall_risk", "speed_std_3", "speed_std_5", "speed_std_10", "accel_std_3", "accel_std_5", "accel_std_10", "acceleration_risk", "jerk_risk", "lateral_risk", "speed_risk"]}
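The predictor presumably reads this file so that incoming feature vectors match the column order the scaler and detectors were fitted on. A minimal sketch of parsing it (the inline string below is a truncated illustrative sample, not the full 18-feature list):

```python
import json

# In the repo this would be: json.load(open("models/feature_names.json"))
raw = '{"feature_names": ["spd", "acceleration", "jerk", "angular_velocity"]}'
feature_names = json.loads(raw)["feature_names"]

# Order matters: a fitted StandardScaler has no column labels, only positions.
print(len(feature_names), feature_names[0])  # 4 spd
```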
models/isolation_forest.pkl
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b4ee70e6ebf4a9e9d9ecd6b2ce0897303f12513078ca4870030d554ab155fdd
size 1710078
models/lstm_autoencoder.pth
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:69846219391308a4980bf5475668bb3e24e387e666522134a5de13f49f493398
size 500332
models/lstm_threshold.json
ADDED
|
@@ -0,0 +1 @@
{"lstm_threshold": 2.9153685569763184}
models/manifest.json
ADDED
|
@@ -0,0 +1,25 @@
{
  "short_name": "React App",
  "name": "Create React App Sample",
  "icons": [
    {
      "src": "favicon.ico",
      "sizes": "64x64 32x32 24x24 16x16",
      "type": "image/x-icon"
    },
    {
      "src": "logo192.png",
      "type": "image/png",
      "sizes": "192x192"
    },
    {
      "src": "logo512.png",
      "type": "image/png",
      "sizes": "512x512"
    }
  ],
  "start_url": ".",
  "display": "standalone",
  "theme_color": "#000000",
  "background_color": "#ffffff"
}
models/model_metadata.json
ADDED
|
@@ -0,0 +1,13 @@
{
  "models_saved": [
    "Isolation Forest",
    "One-Class SVM",
    "LSTM Autoencoder",
    "LSTM Threshold",
    "Scaler",
    "Feature Names"
  ],
  "save_timestamp": "2025-09-13T15:15:49.010561",
  "device_used": "cuda",
  "total_samples": 118166
}
models/one_class_svm.pkl
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89b7add1680af2a043dc510902dfb31c64cb74ece7bd4d08175edec9cb117161
size 412575
models/optimization_model.joblib
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ab47250d98f86d183ed95f5b6aa8d4017597d0d510be8d4fb43abd623d4ae75c
size 409969
models/robots.txt
ADDED
|
@@ -0,0 +1,3 @@
# https://www.robotstxt.org/robotstxt.html
User-agent: *
Disallow:
models/scaler.pkl
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a84f6f969094f40ff8b1c5b6cfb81dc63b3689d8f0902aed115b57f96e33f47
size 1319
production_predictor.py
ADDED
|
@@ -0,0 +1,673 @@
import os
import json
import time
import logging
import numpy as np
import pandas as pd
import torch
import joblib
from datetime import datetime
from collections import deque
from typing import Dict, List, Optional, Any
import asyncio
import aiofiles
from dataclasses import dataclass, asdict
from pathlib import Path
from scipy.signal import savgol_filter

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

@dataclass
class GPSPoint:
    """GPS data point from tracker - matches your dataset structure"""
    vehicle_id: str  # This will be our randomized_id
    lat: float
    lng: float
    alt: float
    spd: float  # speed in km/h
    azm: float  # azimuth/heading 0-360
    timestamp: str = None  # Added for real-time tracking

    @classmethod
    def from_tracker_data(cls, tracker_data: Dict) -> 'GPSPoint':
        """Convert from real GPS tracker format to our dataset format"""
        return cls(
            vehicle_id=tracker_data.get('vehicle_id', tracker_data.get('device_id')),
            lat=tracker_data['lat'],
            lng=tracker_data['lng'],
            alt=tracker_data.get('alt', tracker_data.get('altitude', 0.0)),
            spd=tracker_data.get('spd', tracker_data.get('speed', 0.0)),
            azm=tracker_data.get('azm', tracker_data.get('heading', 0.0)),
            timestamp=tracker_data.get('timestamp', datetime.now().isoformat())
        )

    def to_dataset_format(self) -> Dict:
        """Convert to the format expected by your trained model"""
        return {
            'randomized_id': self.vehicle_id,
            'lat': self.lat,
            'lng': self.lng,
            'alt': self.alt,
            'spd': self.spd,
            'azm': self.azm
        }
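The two converters above are just a field mapping between tracker payloads and dataset rows. A standalone sketch of the same mapping, independent of the dataclass (the alternate keys `device_id`, `speed`, `heading` are the ones the converter accepts):

```python
# Sketch of the tracker -> dataset field mapping used by GPSPoint.
def tracker_to_dataset_row(payload: dict) -> dict:
    """Map a raw tracker payload to a randomized_id/lat/lng/alt/spd/azm row."""
    return {
        'randomized_id': payload.get('vehicle_id', payload.get('device_id')),
        'lat': payload['lat'],
        'lng': payload['lng'],
        'alt': payload.get('alt', payload.get('altitude', 0.0)),
        'spd': payload.get('spd', payload.get('speed', 0.0)),
        'azm': payload.get('azm', payload.get('heading', 0.0)),
    }

# Hypothetical payload using the alternate key spellings.
row = tracker_to_dataset_row(
    {'device_id': 'fleet_001', 'lat': 55.75, 'lng': 37.61, 'speed': 45.5, 'heading': 85.0}
)
```

Missing optional fields fall back to 0.0, so partial payloads still produce a complete row.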

@dataclass
class AnomalyResult:
    """Anomaly detection result"""
    timestamp: str
    vehicle_id: str
    anomaly_detected: bool
    confidence: float
    alert_level: str
    raw_scores: Dict[str, float]
    driving_metrics: Dict[str, float]
    risk_factors: Dict[str, bool]

    def to_dict(self) -> Dict:
        return asdict(self)

# Import the LSTM model from your training code
class LSTMAutoencoder(torch.nn.Module):
    """LSTM Autoencoder - same as your training code"""

    def __init__(self, input_dim, hidden_dim=64, latent_dim=10, num_layers=2, sequence_length=20):
        super(LSTMAutoencoder, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers
        self.sequence_length = sequence_length

        # Encoder
        self.encoder_lstm = torch.nn.LSTM(
            input_dim, hidden_dim, num_layers,
            batch_first=True, dropout=0.2 if num_layers > 1 else 0
        )
        self.encoder_fc = torch.nn.Linear(hidden_dim, latent_dim)

        # Decoder
        self.decoder_fc = torch.nn.Linear(latent_dim, hidden_dim)
        self.decoder_lstm = torch.nn.LSTM(
            hidden_dim, hidden_dim, num_layers,
            batch_first=True, dropout=0.2 if num_layers > 1 else 0
        )
        self.output_projection = torch.nn.Linear(hidden_dim, input_dim)

        self.dropout = torch.nn.Dropout(0.2)

    def encode(self, x):
        lstm_out, (hidden, cell) = self.encoder_lstm(x)
        encoded = self.encoder_fc(hidden[-1])
        return encoded

    def decode(self, encoded):
        batch_size = encoded.size(0)
        decoded = self.decoder_fc(encoded)
        decoded = decoded.unsqueeze(1).repeat(1, self.sequence_length, 1)
        lstm_out, _ = self.decoder_lstm(decoded)
        output = self.output_projection(lstm_out)
        return output

    def forward(self, x):
        encoded = self.encode(x)
        decoded = self.decode(encoded)
        return decoded
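Downstream, the autoencoder's output is scored by mean squared reconstruction error over the whole window, then compared against a fixed threshold. A numpy sketch with made-up arrays and a placeholder threshold (no torch required):

```python
import numpy as np

def reconstruction_score(sequence: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared error over a (seq_len, n_features) window."""
    return float(np.mean((sequence - reconstructed) ** 2))

# Hypothetical 4-step, 2-feature window with a uniform residual of 0.1.
seq = np.array([[0.0, 1.0], [0.5, 1.0], [1.0, 1.0], [1.5, 1.0]])
recon = seq + 0.1
error = reconstruction_score(seq, recon)      # ~0.1**2 = 0.01
is_anomalous = error > 0.5                    # placeholder threshold, not the trained one
```

The real detector uses the trained threshold stored alongside the checkpoint; only the MSE arithmetic is shown here.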

class ProductionAnomalyDetector:
    """
    Production-ready driving anomaly detection system
    Works with your exact dataset format: randomized_id,lat,lng,alt,spd,azm
    """

    def __init__(self, model_dir: str, config: Dict = None):
        """
        Initialize with pre-trained models
        """
        self.model_dir = Path(model_dir)
        self.config = config or self._default_config()

        # Model components
        self.scaler = None
        self.isolation_forest = None
        self.one_class_svm = None
        self.lstm_autoencoder = None
        self.lstm_threshold = None

        # Vehicle buffers for real-time processing
        self.vehicle_buffers = {}  # vehicle_id -> deque of GPS points
        self.buffer_size = self.config['buffer_size']

        # Normalization parameters
        self.if_min = None
        self.if_max = None
        self.svm_min = None
        self.svm_max = None

        # Load models
        self._load_models()

        logger.info(f"ProductionAnomalyDetector initialized with models from {model_dir}")
        logger.info(f"Using device: {device}")

    def _default_config(self) -> Dict:
        """Default configuration matching your training setup"""
        return {
            'buffer_size': 20,
            'min_points_for_detection': 5,
            'lstm_sequence_length': 15,
            'alert_threshold': 0.3,
            'weights': {
                'isolation_forest': 0.35,
                'one_class_svm': 0.30,
                'lstm': 0.35
            }
        }

    def _load_models(self):
        """Load all pre-trained models"""
        try:
            # Load scaler (required)
            scaler_path = self.model_dir / 'scaler.pkl'
            if scaler_path.exists():
                self.scaler = joblib.load(scaler_path)
                logger.info("✓ Feature scaler loaded")
            else:
                raise FileNotFoundError(f"Feature scaler not found: {scaler_path}")

            # Load Isolation Forest
            if_path = self.model_dir / 'isolation_forest.pkl'
            if if_path.exists():
                self.isolation_forest = joblib.load(if_path)
                logger.info("✓ Isolation Forest loaded")

            # Load One-Class SVM
            svm_path = self.model_dir / 'one_class_svm.pkl'
            if svm_path.exists():
                self.one_class_svm = joblib.load(svm_path)
                logger.info("✓ One-Class SVM loaded")

            # Load LSTM Autoencoder
            lstm_path = self.model_dir / 'lstm_autoencoder.pth'
            if lstm_path.exists():
                checkpoint = torch.load(lstm_path, map_location=device)
                lstm_config = checkpoint["model_config"]
                self.lstm_autoencoder = LSTMAutoencoder(**lstm_config).to(device)

                self.lstm_autoencoder.load_state_dict(checkpoint["model_state_dict"])
                self.lstm_autoencoder.eval()
                logger.info("✓ LSTM Autoencoder loaded")
                self.lstm_threshold = 2.9153685569763184  # fallback threshold
                logger.info(f"✓ LSTM threshold: {self.lstm_threshold}")

            # Load normalization parameters
            norm_path = self.model_dir / 'normalization_params.json'
            if norm_path.exists():
                with open(norm_path, 'r') as f:
                    norm_params = json.load(f)
                self.if_min = norm_params.get('if_min', -0.2400)
                self.if_max = norm_params.get('if_max', 0.1680)
                self.svm_min = norm_params.get('svm_min', -381.6356)
                self.svm_max = norm_params.get('svm_max', 106.7346)
                logger.info("✓ Normalization parameters loaded")
            else:
                # Use your actual training values
                self.if_min, self.if_max = -0.2400, 0.1680
                self.svm_min, self.svm_max = -381.6356, 106.7346
                logger.info("Using training normalization parameters")

            logger.info("All models loaded successfully!")

        except Exception as e:
            logger.error(f"Error loading models: {e}")
            raise
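The per-vehicle buffers used below are plain `collections.deque` ring buffers: `maxlen` bounds memory per vehicle no matter how long it streams, because appending to a full deque silently drops the oldest point. A standalone sketch:

```python
from collections import deque

buffer = deque(maxlen=3)  # the production config uses buffer_size = 20

for point in [1, 2, 3, 4, 5]:
    buffer.append(point)  # once full, the oldest point falls off automatically

latest = list(buffer)  # only the 3 most recent points survive
```

This is why `process_gps_point` never needs to trim buffers explicitly.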
    def process_gps_point(self, gps_point: GPSPoint) -> Optional[AnomalyResult]:
        """
        Process a single GPS point - main entry point for real-time detection
        """
        vehicle_id = gps_point.vehicle_id

        # Initialize vehicle buffer if needed
        if vehicle_id not in self.vehicle_buffers:
            self.vehicle_buffers[vehicle_id] = deque(maxlen=self.buffer_size)

        # Add point to buffer
        self.vehicle_buffers[vehicle_id].append(gps_point)
        buffer = self.vehicle_buffers[vehicle_id]

        # Need minimum points for detection
        if len(buffer) < self.config['min_points_for_detection']:
            return None

        try:
            # Convert buffer to DataFrame in your exact format
            buffer_data = []
            for point in buffer:
                buffer_data.append(point.to_dataset_format())

            df_buffer = pd.DataFrame(buffer_data)

            # Calculate features using your exact feature engineering pipeline
            features_df = self._calculate_features_exact_pipeline(df_buffer)

            if len(features_df) == 0:
                return None

            # Get latest point features
            latest_features = features_df.iloc[-1:].values
            latest_scaled = self.scaler.transform(latest_features)

            # Get anomaly scores
            scores = self._get_anomaly_scores(features_df, latest_scaled)

            # Calculate ensemble score
            ensemble_score = self._calculate_ensemble_score(scores)

            # Determine alert level
            alert_level = self._get_alert_level(ensemble_score)

            # Extract metrics from the processed features
            latest_processed = features_df.iloc[-1]
            driving_metrics = self._extract_driving_metrics_from_features(latest_processed)
            risk_factors = self._extract_risk_factors_from_features(latest_processed)

            return AnomalyResult(
                timestamp=gps_point.timestamp or datetime.now().isoformat(),
                vehicle_id=vehicle_id,
                anomaly_detected=ensemble_score > self.config['alert_threshold'],
                confidence=float(ensemble_score),
                alert_level=alert_level,
                raw_scores=scores,
                driving_metrics=driving_metrics,
                risk_factors=risk_factors
            )

        except Exception as e:
            logger.error(f"Error processing GPS point for vehicle {vehicle_id}: {e}")
            return None
    def _calculate_features_exact_pipeline(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Calculate features using EXACT same pipeline as your training code
        Input: DataFrame with columns [randomized_id, lat, lng, alt, spd, azm]
        Output: DataFrame with 18 features ready for ML models
        """
        # Apply the EXACT same feature engineering as your training
        df_processed = self._apply_physics_calculations(df.copy())
        df_processed = self._apply_anomaly_feature_engineering(df_processed)
        features_df = self._prepare_ml_features_exact(df_processed)

        return features_df

    def _apply_physics_calculations(self, df: pd.DataFrame) -> pd.DataFrame:
        """Apply exact physics calculations from your training code"""

        # Sort by trip and create sequence
        df = df.sort_values(['randomized_id', 'lat', 'lng'])
        df['sequence'] = df.groupby('randomized_id').cumcount()
        df['time_delta'] = 1.0  # 1 second intervals

        def calculate_trip_features(group):
            if len(group) < 3:
                # Fill with safe defaults for short trips
                group['distance'] = 0.0
                group['speed_smooth'] = group['spd']
                group['acceleration'] = 0.0
                group['jerk'] = 0.0
                group['angular_velocity'] = 0.0
                group['lateral_acceleration'] = 0.0
                group['heading_change_rate'] = 0.0
                group['curvature'] = 0.0
                return group

            # Haversine distance calculation
            def haversine_distance(lat1, lon1, lat2, lon2):
                R = 6371000  # Earth radius in meters
                lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
                dlat = lat2 - lat1
                dlon = lon2 - lon1
                a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
                c = 2 * np.arcsin(np.sqrt(np.clip(a, 0, 1)))
                return R * c

            # Calculate distances
            distances = [0]
            for i in range(1, len(group)):
                try:
                    dist = haversine_distance(
                        group.iloc[i-1]['lat'], group.iloc[i-1]['lng'],
                        group.iloc[i]['lat'], group.iloc[i]['lng']
                    )
                    dist = min(dist, 1000)  # Cap at 1km to avoid GPS errors
                    distances.append(dist)
                except Exception:
                    distances.append(0)

            group['distance'] = distances

            # Smooth speed data
            if len(group) >= 5:
                try:
                    group['speed_smooth'] = savgol_filter(group['spd'], 5, 2)
                except Exception:
                    group['speed_smooth'] = group['spd']
            else:
                group['speed_smooth'] = group['spd']

            group['speed_smooth'] = np.maximum(group['speed_smooth'], 0)

            # Calculate acceleration
            speed_ms = group['speed_smooth'] / 3.6  # km/h to m/s
            try:
                acceleration = np.gradient(speed_ms, group['time_delta'])
                acceleration = np.clip(acceleration, -15, 15)
            except Exception:
                acceleration = np.zeros(len(group))
            group['acceleration'] = acceleration

            # Calculate jerk
            try:
                jerk = np.gradient(acceleration, group['time_delta'])
                jerk = np.clip(jerk, -20, 20)
            except Exception:
                jerk = np.zeros(len(group))
            group['jerk'] = jerk

            # Calculate angular velocity
            try:
                azimuth_rad = np.radians(group['azm'])
                azimuth_unwrapped = np.unwrap(azimuth_rad)
                angular_velocity = np.gradient(azimuth_unwrapped, group['time_delta'])
                angular_velocity = np.clip(angular_velocity, -np.pi, np.pi)
            except Exception:
                angular_velocity = np.zeros(len(group))
            group['angular_velocity'] = angular_velocity

            # Calculate lateral acceleration
            lateral_acceleration = speed_ms * angular_velocity
            lateral_acceleration = np.clip(lateral_acceleration, -20, 20)
            group['lateral_acceleration'] = lateral_acceleration

            # Calculate heading change rate
            group['heading_change_rate'] = np.abs(angular_velocity)

            # Calculate curvature with safe division
            denominator = speed_ms + 0.1
            group['curvature'] = np.divide(
                np.abs(angular_velocity),
                denominator,
                out=np.zeros_like(angular_velocity),
                where=denominator != 0
            )

            return group

        df = df.groupby('randomized_id').apply(calculate_trip_features)
        df = df.reset_index(drop=True)

        # Clean any remaining NaN/inf values
        numeric_columns = ['distance', 'speed_smooth', 'acceleration', 'jerk',
                           'angular_velocity', 'lateral_acceleration', 'heading_change_rate', 'curvature']

        for col in numeric_columns:
            if col in df.columns:
                df[col] = df[col].fillna(0)
                df[col] = df[col].replace([np.inf, -np.inf], 0)

        return df
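The haversine helper inside `_apply_physics_calculations` can be exercised on its own; a self-contained numpy version with a sanity check (one degree of latitude is roughly 111 km):

```python
import numpy as np

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    R = 6371000  # mean Earth radius in meters
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # clip guards against tiny negative values from floating-point roundoff
    return R * 2 * np.arcsin(np.sqrt(np.clip(a, 0, 1)))

d = haversine_distance(40.0, -74.0, 41.0, -74.0)  # one degree of latitude
```

In the pipeline each step's distance is additionally capped at 1 km, so a single bad GPS fix cannot inject an absurd jump.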
    def _apply_anomaly_feature_engineering(self, df: pd.DataFrame) -> pd.DataFrame:
        """Apply exact anomaly feature engineering from your training code"""

        # Rolling window statistics
        window_sizes = [3, 5, 10]

        for window in window_sizes:
            try:
                # Speed patterns
                df[f'speed_std_{window}'] = df.groupby('randomized_id')['spd'].rolling(
                    window, center=True, min_periods=1).std().reset_index(0, drop=True).fillna(0)
                df[f'speed_max_{window}'] = df.groupby('randomized_id')['spd'].rolling(
                    window, center=True, min_periods=1).max().reset_index(0, drop=True).fillna(0)
                df[f'speed_min_{window}'] = df.groupby('randomized_id')['spd'].rolling(
                    window, center=True, min_periods=1).min().reset_index(0, drop=True).fillna(0)

                # Acceleration patterns
                df[f'accel_std_{window}'] = df.groupby('randomized_id')['acceleration'].rolling(
                    window, center=True, min_periods=1).std().reset_index(0, drop=True).fillna(0)
                df[f'accel_max_{window}'] = df.groupby('randomized_id')['acceleration'].rolling(
                    window, center=True, min_periods=1).max().reset_index(0, drop=True).fillna(0)
                df[f'accel_min_{window}'] = df.groupby('randomized_id')['acceleration'].rolling(
                    window, center=True, min_periods=1).min().reset_index(0, drop=True).fillna(0)
            except Exception:
                # Fallback values
                df[f'speed_std_{window}'] = 0
                df[f'speed_max_{window}'] = df['spd']
                df[f'speed_min_{window}'] = df['spd']
                df[f'accel_std_{window}'] = 0
                df[f'accel_max_{window}'] = df['acceleration']
                df[f'accel_min_{window}'] = df['acceleration']

        # Extreme behavior indicators (exact thresholds from training)
        df['hard_braking'] = (df['acceleration'] < -4.0).astype(int)
        df['hard_acceleration'] = (df['acceleration'] > 3.0).astype(int)
        df['excessive_speed'] = (df['spd'] > 80).astype(int)
        df['sharp_turn'] = (np.abs(df['lateral_acceleration']) > 4.0).astype(int)
        df['erratic_steering'] = (np.abs(df['heading_change_rate']) > 0.5).astype(int)

        # Composite risk scores (exact same calculations)
        df['acceleration_risk'] = np.clip(np.abs(df['acceleration']) / 10.0, 0, 1)
        df['jerk_risk'] = np.clip(np.abs(df['jerk']) / 5.0, 0, 1)
        df['lateral_risk'] = np.clip(np.abs(df['lateral_acceleration']) / 8.0, 0, 1)
        df['speed_risk'] = np.clip(np.maximum(0, (df['spd'] - 60) / 40.0), 0, 1)

        # Overall risk score (exact same weights)
        df['overall_risk'] = (
            df['acceleration_risk'] * 0.25 +
            df['jerk_risk'] * 0.20 +
            df['lateral_risk'] * 0.25 +
            df['speed_risk'] * 0.15 +
            (df['hard_braking'] + df['hard_acceleration'] +
             df['sharp_turn'] + df['erratic_steering']) * 0.15 / 4
        )

        df['overall_risk'] = np.clip(df['overall_risk'], 0, 1)

        return df
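The rolling statistics above reduce to a centered sliding-window computation. For odd windows, pandas' `rolling(window, center=True, min_periods=1).std()` uses the sample standard deviation (ddof=1), which is NaN for a single point and is then filled with 0. A dependency-light sketch of the same calculation:

```python
import numpy as np

def centered_rolling_std(values, window):
    """Centered rolling sample std (ddof=1) for an odd window, with the
    single-point NaN filled as 0 -- mirroring
    rolling(window, center=True, min_periods=1).std().fillna(0)."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        out.append(0.0 if len(chunk) < 2 else float(np.std(chunk, ddof=1)))
    return out

stds = centered_rolling_std([1.0, 2.0, 3.0], window=3)
```

The per-group `groupby('randomized_id')` wrapper simply applies this independently to each vehicle's track.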
    def _prepare_ml_features_exact(self, df: pd.DataFrame) -> pd.DataFrame:
        """Prepare exact same 18 features as in training"""

        # Exact same feature columns as your training
        feature_columns = [
            'spd', 'acceleration', 'jerk', 'angular_velocity', 'lateral_acceleration',
            'heading_change_rate', 'curvature', 'overall_risk',
            'speed_std_3', 'speed_std_5', 'speed_std_10',
            'accel_std_3', 'accel_std_5', 'accel_std_10',
            'acceleration_risk', 'jerk_risk', 'lateral_risk', 'speed_risk'
        ]

        features_df = df[feature_columns].copy()

        # Clean any remaining issues
        for col in feature_columns:
            features_df[col] = features_df[col].fillna(0)
            features_df[col] = features_df[col].replace([np.inf, -np.inf], 0)

        return features_df

    def _get_anomaly_scores(self, features_df: pd.DataFrame, latest_scaled: np.ndarray) -> Dict[str, float]:
        """Get anomaly scores from all models"""
        scores = {}

        # Isolation Forest
        if self.isolation_forest:
            scores['isolation_forest'] = float(self.isolation_forest.decision_function(latest_scaled)[0])

        # One-Class SVM
        if self.one_class_svm:
            scores['one_class_svm'] = float(self.one_class_svm.decision_function(latest_scaled)[0])

        # LSTM Autoencoder
        if self.lstm_autoencoder and len(features_df) >= self.config['lstm_sequence_length']:
            try:
                sequence_length = self.config['lstm_sequence_length']
                sequence_features = features_df.iloc[-sequence_length:].values
                sequence_scaled = self.scaler.transform(sequence_features)
                sequence_tensor = torch.FloatTensor(sequence_scaled).unsqueeze(0).to(device)

                with torch.no_grad():
                    reconstructed = self.lstm_autoencoder(sequence_tensor)
                    reconstruction_error = torch.mean((sequence_tensor - reconstructed) ** 2).item()
                scores['lstm'] = float(reconstruction_error)
            except Exception as e:
                logger.warning(f"LSTM inference error: {e}")
                scores['lstm'] = 0.0

        return scores
    def _calculate_ensemble_score(self, scores: Dict[str, float]) -> float:
        """Calculate ensemble score using exact same logic as training"""
        ensemble_score = 0.0
        weights = self.config['weights']

        # Isolation Forest (lower = more anomalous)
        if 'isolation_forest' in scores:
            if_range = self.if_max - self.if_min
            if if_range > 0:
                if_normalized = (scores['isolation_forest'] - self.if_min) / if_range
                if_anomaly_score = 1.0 - np.clip(if_normalized, 0, 1)
            else:
                if_anomaly_score = 0.5
            ensemble_score += weights['isolation_forest'] * if_anomaly_score

        # SVM (negative = more anomalous)
        if 'one_class_svm' in scores:
            svm_range = self.svm_max - self.svm_min
            if svm_range > 0:
                svm_normalized = (scores['one_class_svm'] - self.svm_min) / svm_range
                svm_anomaly_score = 1.0 - np.clip(svm_normalized, 0, 1)
            else:
                svm_anomaly_score = 0.5
            ensemble_score += weights['one_class_svm'] * svm_anomaly_score

        # LSTM (higher reconstruction error = more anomalous)
        if 'lstm' in scores and self.lstm_threshold:
            lstm_anomaly_score = np.clip(scores['lstm'] / self.lstm_threshold, 0, 1)
            ensemble_score += weights['lstm'] * lstm_anomaly_score

        return np.clip(ensemble_score, 0, 1)
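Stripped of the class plumbing, the ensemble is min-max normalization per detector (inverted where lower raw scores mean more anomalous) followed by a weighted sum. A sketch with the config-default weights and ranges from above, and made-up raw scores:

```python
import numpy as np

def ensemble(scores, weights,
             if_range=(-0.2400, 0.1680),
             svm_range=(-381.6356, 106.7346),
             lstm_threshold=2.9153685569763184):
    total = 0.0
    # Isolation Forest / SVM: lower raw score = more anomalous, so invert after min-max.
    for name, (lo, hi) in [('isolation_forest', if_range), ('one_class_svm', svm_range)]:
        norm = (scores[name] - lo) / (hi - lo)
        total += weights[name] * (1.0 - np.clip(norm, 0, 1))
    # LSTM: higher reconstruction error = more anomalous.
    total += weights['lstm'] * np.clip(scores['lstm'] / lstm_threshold, 0, 1)
    return float(np.clip(total, 0, 1))

w = {'isolation_forest': 0.35, 'one_class_svm': 0.30, 'lstm': 0.35}
benign = ensemble({'isolation_forest': 0.1680, 'one_class_svm': 106.7346, 'lstm': 0.0}, w)
worst = ensemble({'isolation_forest': -0.2400, 'one_class_svm': -381.6356, 'lstm': 10.0}, w)
```

Because the weights sum to 1.0 and each component is clipped to [0, 1], the ensemble score is already a confidence in [0, 1] without further scaling.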
    def _get_alert_level(self, confidence: float) -> str:
        """Determine alert level"""
        if confidence > 0.8:
            return 'CRITICAL'
        elif confidence > 0.6:
            return 'HIGH'
        elif confidence > 0.4:
            return 'MEDIUM'
        elif confidence > 0.2:
            return 'LOW'
        else:
            return 'NORMAL'

    def _extract_driving_metrics_from_features(self, features_row: pd.Series) -> Dict[str, float]:
        """Extract driving metrics from processed features"""
        return {
            'speed': float(features_row['spd']),
            'acceleration': float(features_row['acceleration']),
            'lateral_acceleration': float(features_row['lateral_acceleration']),
            'jerk': float(features_row['jerk']),
            'heading_change_rate': float(features_row['heading_change_rate']),
            'overall_risk': float(features_row['overall_risk'])
        }

    def _extract_risk_factors_from_features(self, features_row: pd.Series) -> Dict[str, bool]:
        """Extract boolean risk factors from a row of driving features."""
        return {
            'hard_braking': bool(features_row['acceleration'] < -2.5),        # sudden deceleration (m/s^2)
            'hard_acceleration': bool(features_row['acceleration'] > 2.5),    # sudden acceleration (m/s^2)
            'excessive_speed': bool(features_row['spd'] > 120),               # overspeeding (km/h)
            'sharp_turn': bool(abs(features_row['lateral_acceleration']) > 3.0),  # strong lateral g-force
            # angular_velocity is clipped to [-pi, pi] rad/s upstream, so the previous
            # threshold of 30 could never fire; 0.5 matches the training-time
            # erratic_steering threshold on heading_change_rate.
            'erratic_steering': bool(abs(features_row['angular_velocity']) > 0.5)  # quick steering change (rad/s)
        }
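The alert banding is a plain threshold ladder over the ensemble confidence, with strict greater-than at each boundary. Standalone:

```python
def alert_level(confidence: float) -> str:
    """Map an ensemble confidence in [0, 1] to the alert bands used above."""
    for threshold, level in [(0.8, 'CRITICAL'), (0.6, 'HIGH'),
                             (0.4, 'MEDIUM'), (0.2, 'LOW')]:
        if confidence > threshold:
            return level
    return 'NORMAL'
```

Note the boundaries are exclusive: a confidence of exactly 0.2 is still NORMAL.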
    def get_vehicle_status(self, vehicle_id: str) -> Dict[str, Any]:
        """Get current status of a vehicle"""
        if vehicle_id not in self.vehicle_buffers:
            return {'vehicle_id': vehicle_id, 'status': 'no_data'}

        buffer = self.vehicle_buffers[vehicle_id]
        return {
            'vehicle_id': vehicle_id,
            'buffer_size': len(buffer),
            'last_update': buffer[-1].timestamp if buffer else None,
            'ready_for_detection': len(buffer) >= self.config['min_points_for_detection']
        }

# Updated API input model to match your data structure
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional

class GPSPointRequest(BaseModel):
    """API request model matching your dataset columns"""
    vehicle_id: str  # maps to randomized_id
    lat: float
    lng: float
    alt: float = 0.0
    spd: float  # speed in km/h
    azm: float  # azimuth/heading 0-360
    timestamp: Optional[str] = None

# Updated sample input/output for your exact data structure
sample_input_output = {
    "input": {
        "vehicle_id": "fleet_001",
        "lat": 55.7558,
        "lng": 37.6176,
        "alt": 156.0,
        "spd": 45.5,
        "azm": 85.0,
        "timestamp": "2025-09-13T10:31:18Z"
    },
    "output": {
        "status": "detected",
        "result": {
            "timestamp": "2025-09-13T10:31:18Z",
            "vehicle_id": "fleet_001",
            "anomaly_detected": False,
            "confidence": 0.156,
            "alert_level": "NORMAL",
            "raw_scores": {
                "isolation_forest": 0.045,
                "one_class_svm": 12.34,
                "lstm": 0.234
            },
            "driving_metrics": {
                "speed": 45.5,
                "acceleration": 0.12,
                "lateral_acceleration": 0.08,
                "jerk": 0.05,
                "heading_change_rate": 0.02,
                "overall_risk": 0.089
            },
            "risk_factors": {
                "hard_braking": False,
                "hard_acceleration": False,
                "excessive_speed": False,
                "sharp_turn": False,
                "erratic_steering": False
            }
        }
    }
}
requirements.txt
ADDED
@@ -0,0 +1,9 @@
gradio>=4.0.0
pandas>=1.5.0
numpy>=1.21.0
torch>=1.12.0
scikit-learn>=1.1.0
plotly>=5.0.0
scipy>=1.9.0
joblib>=1.2.0
aiofiles>=22.0.0
sample_data.csv
ADDED
@@ -0,0 +1,31 @@
randomized_id,lat,lng,spd,azm,alt
VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0
VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0
VEHICLE001,40.7148,-74.0080,52.1,95.0,105.0
VEHICLE001,40.7158,-74.0090,85.3,98.0,108.0
VEHICLE001,40.7168,-74.0100,127.5,101.0,110.0
VEHICLE001,40.7178,-74.0110,156.2,105.0,112.0
VEHICLE001,40.7188,-74.0120,42.8,108.0,115.0
VEHICLE001,40.7198,-74.0130,38.5,110.0,118.0
VEHICLE002,40.7500,-73.9800,35.2,180.0,90.0
VEHICLE002,40.7510,-73.9810,38.1,182.0,92.0
VEHICLE002,40.7520,-73.9820,41.5,185.0,95.0
VEHICLE002,40.7530,-73.9830,165.8,188.0,98.0
VEHICLE002,40.7540,-73.9840,198.2,191.0,100.0
VEHICLE002,40.7550,-73.9850,43.7,195.0,102.0
VEHICLE002,40.7560,-73.9860,39.9,198.0,105.0
VEHICLE003,40.8000,-73.9500,55.0,270.0,200.0
VEHICLE003,40.8010,-73.9510,58.3,272.0,202.0
VEHICLE003,40.8020,-73.9520,62.1,275.0,205.0
VEHICLE003,40.8030,-73.9530,220.5,278.0,208.0
VEHICLE003,40.8040,-73.9540,245.8,281.0,210.0
VEHICLE003,40.8050,-73.9550,51.2,285.0,212.0
VEHICLE003,40.8060,-73.9560,48.7,288.0,215.0
VEHICLE004,40.6500,-74.1000,25.0,45.0,50.0
VEHICLE004,40.6510,-74.1010,28.5,47.0,52.0
VEHICLE004,40.6520,-74.1020,31.2,49.0,55.0
VEHICLE004,40.6530,-74.1030,34.8,52.0,58.0
VEHICLE004,40.6540,-74.1040,37.5,55.0,60.0
VEHICLE004,40.6550,-74.1050,40.1,58.0,62.0
VEHICLE004,40.6560,-74.1060,42.8,60.0,65.0
VEHICLE004,40.6570,-74.1070,45.5,62.0,68.0
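Rows in this shape are what gets fed to `process_gps_point` one at a time. A standalone sketch parsing the same format with the stdlib `csv` module, using an in-memory sample so no file is needed:

```python
import csv
import io

# Two rows in the sample_data.csv column order.
SAMPLE = """randomized_id,lat,lng,spd,azm,alt
VEHICLE001,40.7128,-74.0060,45.5,90.0,100.0
VEHICLE001,40.7138,-74.0070,48.2,92.0,102.0
"""

points = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    points.append({
        'vehicle_id': row['randomized_id'],  # randomized_id maps to vehicle_id
        'lat': float(row['lat']),
        'lng': float(row['lng']),
        'spd': float(row['spd']),
        'azm': float(row['azm']),
        'alt': float(row['alt']),
    })
```

In the real app the batch path reads the uploaded CSV with pandas instead, but the per-row mapping is the same.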