---
title: UIDAI Project Sentinel
emoji: π
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Data-Driven Innovation for Aadhaar
---
# Project Sentinel: AI-Powered Fraud Detection for UIDAI
[Hugging Face Space](https://huggingface.co/spaces/lovnishverma/UIDAI)
[Python 3.8+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**
> Team ID: UIDAI_4571 | Theme: Data-Driven Innovation for Aadhaar
---
## Quick Links
- **Live Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
- **Dashboard Demo**: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
- **Documentation**: See `/docs` folder
- **Source Code**: Available in this repository
---
## Overview
Project Sentinel is a fraud detection system for UIDAI Aadhaar enrolment centers. Instead of applying a single global threshold, Sentinel uses **context-aware machine learning** with district-level normalization, so it can identify fraudulent patterns while accounting for India's demographic diversity.
### The Problem We Solve
India's demographic diversity creates a unique challenge:
- Activity that is normal in Mumbai may be suspicious in a tribal village, and vice versa
- Global thresholds either miss fraud or create false positives
- Need: Regional baselines that adapt to local patterns
### Our Innovation
**District Normalization**: Each enrolment center is compared to its local district baseline, not a national average.
**Example**: In a tribal district where adults average 40% of enrolments, a center with a 90% adult ratio is flagged for its deviation, even if its absolute numbers are lower than those of urban centers.
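
The district comparison can be sketched in a few lines. This is a minimal illustration, not the project's actual ContextEngine: the field names (`center_id`, `district`, `adult_ratio`) and the sample values are assumptions chosen to reproduce the 40% vs. 90% example, and plain Python is used to keep it dependency-free.

```python
from collections import defaultdict
from statistics import mean


def ratio_deviation(centers):
    """Return each center's adult ratio minus its district's mean adult ratio.

    `centers` is a list of dicts with hypothetical keys
    'center_id', 'district', and 'adult_ratio' (a fraction in [0, 1]).
    """
    by_district = defaultdict(list)
    for c in centers:
        by_district[c["district"]].append(c["adult_ratio"])
    baseline = {d: mean(values) for d, values in by_district.items()}
    return {c["center_id"]: c["adult_ratio"] - baseline[c["district"]] for c in centers}


# Illustrative data only: a tribal district averaging 40% and an urban one averaging 90%.
centers = [
    {"center_id": "T1", "district": "tribal", "adult_ratio": 0.25},
    {"center_id": "T2", "district": "tribal", "adult_ratio": 0.25},
    {"center_id": "T3", "district": "tribal", "adult_ratio": 0.30},
    {"center_id": "T4", "district": "tribal", "adult_ratio": 0.30},
    {"center_id": "T5", "district": "tribal", "adult_ratio": 0.90},
    {"center_id": "U1", "district": "urban", "adult_ratio": 0.85},
    {"center_id": "U2", "district": "urban", "adult_ratio": 0.95},
]
deviations = ratio_deviation(centers)
```

Here center `T5` deviates by about +0.50 from its district baseline, while the urban center `U2` deviates by only about +0.05 despite its higher absolute ratio. The same logic maps naturally onto a pandas `groupby` over the district column.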
---
## Key Features
### Machine Learning Engine
- **Algorithm**: Isolation Forest (Unsupervised Learning)
- **Core Innovation**: Context-aware features with district baselines
- **Detection**: Ghost IDs, weekend fraud, data manipulation, coordinated operations
### Interactive Dashboard
- **Real-time KPIs**: 6 comprehensive metrics with trend indicators
- **Geographic Heatmap**: Risk visualization across India
- **Pattern Analysis**: Scatter plots, histograms, time series
- **Advanced Analytics**: Feature importance, correlation matrix, performance gauges
### Smart Filtering
- Date range selection for temporal analysis
- Multi-select risk categories (Low/Medium/High/Critical)
- Dynamic state → district cascading
- Weekend-only anomaly toggle
### Multiple Export Formats
- **CSV**: Field team verification lists
- **JSON**: API integration
- **TXT**: Investigation reports for management
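
As a rough sketch of how the three export formats might be produced from the same list of flagged cases, the example below uses only the standard library. The column names and sample rows are hypothetical placeholders, and the dashboard's real export code may differ.

```python
import csv
import io
import json


def export_cases(cases):
    """Serialize flagged cases to CSV, JSON, and plain-text report strings."""
    fields = ["pincode", "district", "risk_score"]  # hypothetical columns
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(cases)
    csv_text = buf.getvalue()

    json_text = json.dumps(cases, indent=2)

    lines = [f"{c['district']} ({c['pincode']}): risk {c['risk_score']:.1f}" for c in cases]
    txt_text = "Investigation report\n" + "\n".join(lines) + "\n"
    return csv_text, json_text, txt_text


# Placeholder rows, not real data.
cases = [
    {"pincode": "000001", "district": "District A", "risk_score": 91.2},
    {"pincode": "000002", "district": "District B", "risk_score": 72.5},
]
csv_text, json_text, txt_text = export_cases(cases)
```

In a Streamlit app, strings like these could be handed to `st.download_button` as the `data` argument, one button per format.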
---
## Quick Start
### **Option 1: Google Colab (Fastest)**
Run the complete analysis in your browser without any setup:
[Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
Click the link above to open the notebook, then run all cells to generate the analyzed data.
### **Option 2: Local Setup**
### Prerequisites
```bash
Python 3.8+
pip (Python package manager)
```
### Installation
1. **Clone the repository**
```bash
git clone https://huggingface.co/spaces/lovnishverma/UIDAI
cd UIDAI
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Run the Jupyter Notebook** (Data Processing)
```bash
jupyter notebook project_sentinel_notebook.ipynb
```
This generates `analyzed_aadhaar_data.csv`
4. **Launch the Dashboard**
```bash
streamlit run app.py
```
5. **Access the application**
```
http://localhost:8501
```
---
## Project Structure
```
UIDAI/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── Dockerfile                         # Docker configuration
├── project_sentinel_notebook.ipynb    # ML model & data processing
├── app.py                             # Streamlit dashboard
├── analyzed_aadhaar_data.csv          # Processed data (generated in Colab)
├── docs/
│   ├── Project_Sentinel_Analysis.docx
│   ├── Sentinel_Dashboard_Documentation.docx
│   └── Dashboard_Enhancements_Guide.docx
└── assets/
    └── screenshots/                   # Dashboard screenshots
```
---
## Technical Architecture
### Data Pipeline
```
Raw Data (Biometric + Demographic + Enrolment)
        ↓
SmartLoader (Chunked CSV ingestion)
        ↓
Master Merge (Outer joins on date/state/district/pincode)
        ↓
ContextEngine (District normalization)
        ↓
Feature Engineering (4 context-aware features)
        ↓
Isolation Forest (Anomaly detection)
        ↓
Risk Scoring (0-100 scale)
        ↓
Dashboard Visualization
```
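
The Master Merge step can be illustrated with a small outer join keyed on (date, state, district, pincode). This is only a sketch under assumptions: the function name `outer_merge`, the column names, and the sample rows are hypothetical, and the notebook itself presumably performs the join with pandas.

```python
def outer_merge(**sources):
    """Outer-join record lists on (date, state, district, pincode).

    Each keyword argument maps a hypothetical column name to a list of
    (date, state, district, pincode, count) tuples. Keys missing from a
    source get a count of 0, so no location-day is dropped.
    """
    merged = {}
    for column, rows in sources.items():
        for date, state, district, pincode, count in rows:
            key = (date, state, district, pincode)
            row = merged.setdefault(key, {name: 0 for name in sources})
            row[column] += count
    return merged


# Placeholder rows, not real data.
merged = outer_merge(
    bio_updates=[("2025-01-04", "S1", "D1", "000001", 20)],
    demo_updates=[
        ("2025-01-04", "S1", "D1", "000001", 100),
        ("2025-01-05", "S1", "D1", "000002", 7),
    ],
    enrolments=[("2025-01-05", "S1", "D1", "000002", 3)],
)
```

An outer join matters here because a location that appears in only one of the three datasets is itself a potential signal; an inner join would silently discard it.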
### Core Features (ML Model)
| Feature | Description | Importance |
|---------|-------------|------------|
| **ratio_deviation** | Deviation from district avg adult ratio | 45% |
| **weekend_spike_score** | Activity spike on weekends/holidays | 25% |
| **mismatch_score** | Discrepancy between bio/demo updates | 20% |
| **total_activity** | Overall transaction volume | 10% |
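
To show how four features like these might feed the detector, here is a minimal sketch using scikit-learn's `IsolationForest` on synthetic data. The feature matrix is random stand-in data, and the min-max rescaling to a 0-100 risk score is an assumption; the project's actual scoring rule is not specified in this README.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for the four features: 200 ordinary rows plus one extreme row.
X = rng.normal(size=(200, 4))
X = np.vstack([X, [8.0, 8.0, 8.0, 8.0]])

model = IsolationForest(contamination=0.05, random_state=42)
labels = model.fit_predict(X)      # -1 = anomaly, 1 = normal
raw = -model.decision_function(X)  # negated so that higher = more anomalous

# Assumed rescaling: min-max map the anomaly scores onto a 0-100 risk scale.
risk = 100 * (raw - raw.min()) / (raw.max() - raw.min())
```

The injected extreme row (the last one) is labelled `-1` and lands near the top of the risk scale, while roughly 5% of all rows are flagged, matching the `contamination` setting.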
### Technology Stack
- **Backend**: Python 3.8+, Pandas, NumPy, Scikit-learn
- **ML**: Isolation Forest (Unsupervised Anomaly Detection)
- **Frontend**: Streamlit (Web Framework)
- **Visualization**: Plotly Express, Plotly Graph Objects
- **Deployment**: Docker, Hugging Face Spaces
---
## Dashboard Overview
### Tab 1: Geographic Analysis
- **Interactive Map**: Risk heatmap with circle size = volume, color = risk
- **Top 5 Hotspots**: Color-coded cards showing riskiest locations
- **Risk Distribution**: Donut chart breakdown by category
### Tab 2: Pattern Analysis
- **Ghost ID Indicator**: Scatter plot with deviation thresholds
- **Risk Histogram**: Distribution concentration analysis
- **Time Series**: Dual-axis chart showing trends over time
- **Statistics**: Mean, median, std dev, 95th percentile
### Tab 3: Priority Cases
- **Adjustable Threshold**: Slider to filter by minimum risk score
- **Action Status**: Workflow tracking (Pending/Investigation/Resolved)
- **Enhanced Table**: Progress bars, formatted columns
- **Export Options**: CSV, JSON, TXT formats
### Tab 4: Advanced Analytics
- **Feature Importance**: Bar chart showing ML contributions
- **Performance Gauge**: Speedometer-style model accuracy
- **Correlation Heatmap**: Feature relationship matrix
- **Key Insights**: Contextual intelligence cards
---
## Visual Design
### Professional Styling
- **Gradients**: Purple/blue for government portal aesthetic
- **Animations**: Pulsing alerts for critical cases
- **Typography**: Google Fonts (Inter) for modern look
- **Color Coding**: Risk levels with emoji indicators (🔴🟠🟡🟢)
### Responsive Layout
- **Wide Mode**: Maximum data density
- **Tabbed Interface**: Organized content reduces cognitive load
- **Adaptive Visualizations**: Charts adjust to filter context
---
## Configuration
### Model Parameters
```python
class Config:
    """Model settings used by the anomaly detector."""

    ML_FEATURES = [
        'ratio_deviation',      # Primary fraud indicator
        'weekend_spike_score',  # Unauthorized operations
        'mismatch_score',       # Data manipulation
        'total_activity',       # Volume context
    ]
    CONTAMINATION = 0.05  # 5% expected anomaly rate
    RANDOM_STATE = 42     # Reproducibility
```
### Risk Thresholds
```python
RISK_CATEGORIES = {
    'Low': [0, 50],
    'Medium': [50, 70],
    'High': [70, 85],
    'Critical': [85, 100],
}
```
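
Because adjacent ranges share their boundary values (50, 70, 85), a lookup has to pick a convention. The sketch below assumes each range includes its lower bound and excludes its upper bound, with 100 treated as Critical; the helper name `categorize` is hypothetical and the dashboard may resolve boundaries differently.

```python
RISK_CATEGORIES = {
    "Low": [0, 50],
    "Medium": [50, 70],
    "High": [70, 85],
    "Critical": [85, 100],
}


def categorize(score):
    """Map a 0-100 risk score to a category name.

    Assumption: ranges are half-open [low, high), except that a score
    of exactly 100 counts as Critical.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for name, (low, high) in RISK_CATEGORIES.items():
        if low <= score < high:
            return name
    return "Critical"  # only reached when score == 100
```

Under this convention a score of 50 is Medium, 85 is Critical, and 84.9 is High.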
---
## Use Cases
### 1. Ghost Identity Creation
**Pattern**: Abnormally high adult enrolment ratio
**Detection**: High positive ratio_deviation
**Example**: District avg 40%, center reports 90% → FLAGGED
### 2. Weekend/Holiday Fraud
**Pattern**: Activity spikes when centers should be closed
**Detection**: High weekend_spike_score
**Example**: 5x normal activity on Sunday → FLAGGED
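
One plausible way to quantify a weekend spike is the ratio of mean weekend activity to mean weekday activity. The formula below is an assumption for illustration, not the project's documented definition of `weekend_spike_score`, and it ignores public holidays.

```python
from datetime import date
from statistics import mean


def weekend_spike_score(daily_counts):
    """Ratio of mean weekend activity to mean weekday activity.

    `daily_counts` maps datetime.date -> transaction count. Saturday and
    Sunday are treated as the weekend; public holidays are not handled.
    """
    weekend = [n for d, n in daily_counts.items() if d.weekday() >= 5]
    weekday = [n for d, n in daily_counts.items() if d.weekday() < 5]
    if not weekend or not weekday or mean(weekday) == 0:
        return 0.0
    return mean(weekend) / mean(weekday)


# Illustrative week: 40 transactions on each weekday, 200 on Sunday.
week = {date(2025, 1, 6 + i): 40 for i in range(5)}  # Monday to Friday
week[date(2025, 1, 12)] = 200                        # Sunday
score = weekend_spike_score(week)
```

With these numbers the score is 5.0, matching the "5x normal activity on Sunday" example.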
### 3. Data Manipulation
**Pattern**: Discrepancies between biometric and demographic updates
**Detection**: High mismatch_score
**Example**: 100 demo updates, 20 bio updates → FLAGGED
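
A mismatch between the two update types can be expressed as a relative gap. The formula here is a hypothetical choice for illustration; the README does not state how `mismatch_score` is actually computed.

```python
def mismatch_score(demo_updates, bio_updates):
    """Relative gap between demographic and biometric update counts, in [0, 1]."""
    total = demo_updates + bio_updates
    if total == 0:
        return 0.0
    return abs(demo_updates - bio_updates) / total


score = mismatch_score(demo_updates=100, bio_updates=20)
```

For 100 demographic and 20 biometric updates this gives 80 / 120, about 0.67, whereas balanced counts give 0.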
---
## Deployment
### Docker Deployment
```bash
# Build image
docker build -t sentinel-dashboard .
# Run container
docker run -p 8501:8501 sentinel-dashboard
```
### Hugging Face Spaces
The app is automatically deployed when you push to the main branch.
### Environment Variables
```bash
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
```
---
## Performance Metrics
### Model Performance (Simulated)
- **Precision**: 89%
- **Recall**: 85%
- **F1-Score**: 87%
- **Accuracy**: 88%
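
As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall. The helper below is a generic sketch, not code from the project.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.89, 0.85)
```

With precision 0.89 and recall 0.85 this evaluates to roughly 0.87, in line with the reported F1-score.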
### System Performance
- **Data Points Processed**: 500K+ records
- **Processing Time**: <1 second (cached)
- **Dashboard Load Time**: ~2 seconds
- **Visualization Rendering**: <500ms per chart
---
## Security Considerations
### Current Implementation
- ✅ Data caching for performance
- ✅ Input validation on filters
- ✅ Error handling for missing data
- ⚠️ Simulated coordinates (demo only)
### Production Requirements
- SSO/OAuth authentication
- Role-based access control (RBAC)
- Audit logging for all actions
- Data encryption (at rest & in transit)
- Real geocoding with pincode master DB
---
## Future Enhancements
### Short-term (1-3 months)
- [ ] Real geocoding integration
- [ ] SHAP values for explainability
- [ ] Feedback loop for model refinement
- [ ] PDF report generation
- [ ] Email/SMS alert system
### Long-term (3-6 months)
- [ ] Multi-level baselines (state, district, pincode)
- [ ] Network analysis for coordinated fraud
- [ ] Real-time streaming pipeline (Kafka)
- [ ] Ensemble methods (LOF + One-Class SVM)
- [ ] Mobile app for field officers
---
## Team
**Team ID**: UIDAI_4571
**Theme**: Data-Driven Innovation for Aadhaar
**Competition**: UIDAI Hackathon 2026
---
## Documentation
Comprehensive documentation available in `/docs`:
- **Project_Sentinel_Analysis.docx**: Technical analysis & code review
- **Sentinel_Dashboard_Documentation.docx**: Dashboard user guide
- **Dashboard_Enhancements_Guide.docx**: Enhancement details
---
## Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
---
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## Acknowledgments
- **UIDAI** for the hackathon opportunity and dataset
- **Anthropic** for AI assistance in development
- **Streamlit** for the amazing web framework
- **Plotly** for interactive visualizations
---
## Contact
For questions or support, please contact:
- **Email**: sentinel-support@example.com
- **Issues**: [GitHub Issues](https://github.com/lovnishverma/UIDAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/lovnishverma/UIDAI/discussions)
---
## Star History
If you find this project useful, please consider giving it a ⭐!
---
<div align="center">
<strong>Built with ❤️ for a safer Aadhaar ecosystem</strong>
<br>
<sub>© 2026 Project Sentinel. All rights reserved.</sub>
</div>