---
title: UIDAI Project Sentinel
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Data-Driven Innovation for Aadhaar
---
# 🛡️ Project Sentinel: AI-Powered Fraud Detection for UIDAI
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/lovnishverma/UIDAI)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**
>
> Team ID: UIDAI_4571 | Theme: Data-Driven Innovation for Aadhaar
---
## 🎯 Quick Links
- **📊 Live Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
- **🚀 Dashboard Demo**: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
- **📖 Documentation**: See `/docs` folder
- **💻 Source Code**: Available in this repository
---
## 🎯 Overview
Project Sentinel is an innovative fraud detection system designed specifically for UIDAI Aadhaar enrolment centers. Unlike traditional global threshold-based systems, Sentinel uses **context-aware machine learning** with district-level normalization to identify fraudulent patterns while accounting for India's demographic diversity.
### The Problem We Solve
India's demographic diversity creates a unique challenge:
- 📊 Activity that is normal in Mumbai may be suspicious in a tribal village (and vice versa)
- ⚖️ Global thresholds either miss fraud or create false positives
- 🎯 Need: Regional baselines that adapt to local patterns
### Our Innovation
**District Normalization**: Each enrolment center is compared to its local district baseline, not a national average.
**Example**: In a tribal district where the average adult enrolment ratio is 40%, a center with a 90% adult ratio is flagged for its deviation, even if its absolute numbers are lower than those of urban centers.

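
The district normalization idea can be sketched in plain Python. This is a minimal illustration, not the notebook's `ContextEngine`: the record keys (`center_id`, `district`, `adult`, `total`) are hypothetical, and `ratio_deviation` is assumed to be a center's adult ratio minus its district's mean adult ratio.

```python
from collections import defaultdict


def ratio_deviation(centers):
    """Return {center_id: center adult ratio - district mean adult ratio}.

    `centers` is a list of dicts with the hypothetical keys
    'center_id', 'district', 'adult' (adult enrolments) and 'total'.
    """
    # Adult enrolment ratio per center (0.0 for a center with no enrolments).
    ratio = {c["center_id"]: (c["adult"] / c["total"] if c["total"] else 0.0)
             for c in centers}

    # District baseline: mean adult ratio of all centers in that district.
    per_district = defaultdict(list)
    for c in centers:
        per_district[c["district"]].append(ratio[c["center_id"]])
    baseline = {d: sum(r) / len(r) for d, r in per_district.items()}

    # Positive deviation = more adults than is typical for this district.
    return {c["center_id"]: ratio[c["center_id"]] - baseline[c["district"]]
            for c in centers}


# Illustrative data: the same 90% adult ratio in two very different districts.
centers = [
    {"center_id": "T1", "district": "TribalDistrict", "adult": 40, "total": 100},
    {"center_id": "T2", "district": "TribalDistrict", "adult": 40, "total": 100},
    {"center_id": "T3", "district": "TribalDistrict", "adult": 90, "total": 100},
    {"center_id": "U1", "district": "UrbanDistrict", "adult": 850, "total": 1000},
    {"center_id": "U2", "district": "UrbanDistrict", "adult": 900, "total": 1000},
]
dev = ratio_deviation(centers)
```

T3 and U2 both report a 90% adult ratio, but only T3 stands out (about +0.33 against its district baseline, versus about +0.025 for U2), even though T3 handles far fewer enrolments. Because each center is included in its own district mean, a leave-one-out baseline would be a possible refinement.
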
---
## ✨ Key Features
### 🤖 Machine Learning Engine
- **Algorithm**: Isolation Forest (Unsupervised Learning)
- **Core Innovation**: Context-aware features with district baselines
- **Detection**: Ghost IDs, weekend fraud, data manipulation, coordinated operations
### 📊 Interactive Dashboard
- **Real-time KPIs**: 6 comprehensive metrics with trend indicators
- **Geographic Heatmap**: Risk visualization across India
- **Pattern Analysis**: Scatter plots, histograms, time series
- **Advanced Analytics**: Feature importance, correlation matrix, performance gauges
### 🔍 Smart Filtering
- Date range selection for temporal analysis
- Multi-select risk categories (Low/Medium/High/Critical)
- Dynamic state → district cascading
- Weekend-only anomaly toggle
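
A stdlib-only sketch of how these filters could compose. The field names (`date`, `state`, `district`, `risk_category`) and the helpers `districts_for` / `apply_filters` are hypothetical, and "weekend" is assumed to mean Saturday or Sunday (holidays would need a separate calendar).

```python
from datetime import date


def districts_for(records, states):
    """Cascading filter: offer only districts that belong to the selected states."""
    return sorted({r["district"] for r in records if r["state"] in states})


def apply_filters(records, start, end, categories, states,
                  districts=None, weekend_only=False):
    """Return the records that pass every active filter (hypothetical fields)."""
    kept = []
    for r in records:
        if not start <= r["date"] <= end:
            continue  # outside the selected date range
        if r["risk_category"] not in categories or r["state"] not in states:
            continue
        if districts and r["district"] not in districts:
            continue  # empty/None means "all districts"
        if weekend_only and r["date"].weekday() < 5:
            continue  # weekday(): Mon=0 ... Sat=5, Sun=6
        kept.append(r)
    return kept


records = [
    {"date": date(2024, 1, 6), "state": "Maharashtra", "district": "Mumbai", "risk_category": "Critical"},  # Saturday
    {"date": date(2024, 1, 8), "state": "Maharashtra", "district": "Pune", "risk_category": "High"},        # Monday
    {"date": date(2024, 1, 7), "state": "Odisha", "district": "Koraput", "risk_category": "Critical"},      # Sunday
]
```

Computing the district options from the selected states first keeps the district picker from offering combinations that can never match.
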
### 📥 Multiple Export Formats
- **CSV**: Field team verification lists
- **JSON**: API integration
- **TXT**: Investigation reports for management
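
A minimal sketch of producing the three export formats with the standard library. The column names (`center_id`, `district`, `risk_score`) and the report layout are hypothetical; in a Streamlit app the resulting strings would typically be handed to `st.download_button`.

```python
import csv
import io
import json


def to_csv(rows):
    """CSV verification list for field teams (header taken from the first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """JSON array for API integration."""
    return json.dumps(rows, indent=2)


def to_txt(rows):
    """Plain-text investigation report, highest risk first."""
    lines = ["Project Sentinel - Investigation Report", ""]
    ranked = sorted(rows, key=lambda row: row["risk_score"], reverse=True)
    for i, r in enumerate(ranked, 1):
        lines.append(f"{i}. {r['center_id']} ({r['district']}): risk score {r['risk_score']}")
    return "\n".join(lines)


# Hypothetical priority-case rows.
rows = [
    {"center_id": "C1", "district": "Pune", "risk_score": 62},
    {"center_id": "C2", "district": "Koraput", "risk_score": 91},
]
```
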
---
## 🚀 Quick Start
### **Option 1: Google Colab (Fastest)**
Run the complete analysis in your browser without any setup:
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
Click the badge above to open the notebook and run all cells to generate the analyzed data.
### **Option 2: Local Setup**
#### Prerequisites
- Python 3.8+
- pip (Python package manager)
#### Installation
1. **Clone the repository**
```bash
git clone https://huggingface.co/spaces/lovnishverma/UIDAI
cd UIDAI
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Run the Jupyter Notebook** (Data Processing)
```bash
jupyter notebook project_sentinel_notebook.ipynb
```
This generates `analyzed_aadhaar_data.csv`
4. **Launch the Dashboard**
```bash
streamlit run app.py
```
5. **Access the application**
```
http://localhost:8501
```
---
## 📁 Project Structure
```
UIDAI/
├── README.md                        # This file
├── requirements.txt                 # Python dependencies
├── Dockerfile                       # Docker configuration
├── project_sentinel_notebook.ipynb  # ML model & data processing
├── app.py                           # Streamlit dashboard
├── analyzed_aadhaar_data.csv        # Processed data (generated in Colab)
├── docs/
│   ├── Project_Sentinel_Analysis.docx
│   ├── Sentinel_Dashboard_Documentation.docx
│   └── Dashboard_Enhancements_Guide.docx
└── assets/
    └── screenshots/                 # Dashboard screenshots
```
---
## 🧠 Technical Architecture
### Data Pipeline
```
Raw Data (Biometric + Demographic + Enrolment)
↓
SmartLoader (Chunked CSV ingestion)
↓
Master Merge (Outer joins on date/state/district/pincode)
↓
ContextEngine (District normalization)
↓
Feature Engineering (4 context-aware features)
↓
Isolation Forest (Anomaly detection)
↓
Risk Scoring (0-100 scale)
↓
Dashboard Visualization
```
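
The SmartLoader and Master Merge stages can be illustrated with the standard library. This sketch is not the project's `SmartLoader` API: `read_in_chunks` and `outer_merge` are hypothetical helpers, the value columns (`bio_updates`, `demo_updates`) are made up, and summing rows that share a key is an assumption. In Pandas the same steps would typically be `pd.read_csv(..., chunksize=...)` followed by `DataFrame.merge(how="outer")`.

```python
import csv
import io
from itertools import islice

KEYS = ("date", "state", "district", "pincode")  # join keys from the pipeline


def read_in_chunks(f, chunk_size=50_000):
    """Yield lists of at most `chunk_size` rows so large CSVs stream through memory."""
    reader = csv.DictReader(f)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def outer_merge(files, chunk_size=50_000):
    """Outer-join CSV sources on KEYS; counts missing from a source become 0.

    Assumes every non-key column holds an integer count.
    """
    merged, columns = {}, set()
    for f in files:
        for chunk in read_in_chunks(f, chunk_size):
            for row in chunk:
                rec = merged.setdefault(tuple(row[k] for k in KEYS), {})
                for col, val in row.items():
                    if col not in KEYS:
                        columns.add(col)
                        rec[col] = rec.get(col, 0) + int(val)  # sum duplicate keys
    return {key: {c: rec.get(c, 0) for c in sorted(columns)}
            for key, rec in merged.items()}


# Hypothetical biometric and demographic extracts.
bio = io.StringIO("date,state,district,pincode,bio_updates\n"
                  "2024-01-06,StateA,DistrictA,100001,20\n")
demo = io.StringIO("date,state,district,pincode,demo_updates\n"
                   "2024-01-06,StateA,DistrictA,100001,100\n"
                   "2024-01-07,StateA,DistrictA,100001,5\n")
master = outer_merge([bio, demo])
```

An outer join matters here: a date that appears only in the demographic feed (2024-01-07 above) is kept with `bio_updates` set to 0 instead of being silently dropped.
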
### Core Features (ML Model)
| Feature | Description | Importance |
|---------|-------------|------------|
| **ratio_deviation** | Deviation from district avg adult ratio | 45% |
| **weekend_spike_score** | Activity spike on weekends/holidays | 25% |
| **mismatch_score** | Discrepancy between bio/demo updates | 20% |
| **total_activity** | Overall transaction volume | 10% |
### Technology Stack
- **Backend**: Python 3.8+, Pandas, NumPy, Scikit-learn
- **ML**: Isolation Forest (Unsupervised Anomaly Detection)
- **Frontend**: Streamlit (Web Framework)
- **Visualization**: Plotly Express, Plotly Graph Objects
- **Deployment**: Docker, Hugging Face Spaces
---
## 📊 Dashboard Overview
### Tab 1: Geographic Analysis
- **Interactive Map**: Risk heatmap with circle size = volume, color = risk
- **Top 5 Hotspots**: Color-coded cards showing riskiest locations
- **Risk Distribution**: Donut chart breakdown by category
### Tab 2: Pattern Analysis
- **Ghost ID Indicator**: Scatter plot with deviation thresholds
- **Risk Histogram**: Distribution concentration analysis
- **Time Series**: Dual-axis chart showing trends over time
- **Statistics**: Mean, median, std dev, 95th percentile
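
The Tab 2 summary statistics can be reproduced with Python's `statistics` module; `risk_summary` is a hypothetical helper. If the dashboard uses Pandas instead, `Series.std()` (sample standard deviation) and `Series.quantile(0.95)` (linear interpolation) should give the same values.

```python
import statistics


def risk_summary(scores):
    """Summary statistics shown alongside the risk histogram."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std_dev": statistics.stdev(scores),  # sample standard deviation (n - 1)
        # 19 cut points at 5%, 10%, ..., 95%; the last one is the 95th percentile.
        # method="inclusive" interpolates like NumPy's default percentile.
        "p95": statistics.quantiles(scores, n=20, method="inclusive")[-1],
    }


summary = risk_summary([10, 20, 30, 40, 50])
```
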
### Tab 3: Priority Cases
- **Adjustable Threshold**: Slider to filter by minimum risk score
- **Action Status**: Workflow tracking (Pending/Investigation/Resolved)
- **Enhanced Table**: Progress bars, formatted columns
- **Export Options**: CSV, JSON, TXT formats
### Tab 4: Advanced Analytics
- **Feature Importance**: Bar chart showing ML contributions
- **Performance Gauge**: Speedometer-style model accuracy
- **Correlation Heatmap**: Feature relationship matrix
- **Key Insights**: Contextual intelligence cards
---
## 🎨 Visual Design
### Professional Styling
- **Gradients**: Purple/blue for government portal aesthetic
- **Animations**: Pulsing alerts for critical cases
- **Typography**: Google Fonts (Inter) for modern look
- **Color Coding**: Risk levels with emoji indicators (🔴🟠🟡🟢)
### Responsive Layout
- **Wide Mode**: Maximum data density
- **Tabbed Interface**: Organized content reduces cognitive load
- **Adaptive Visualizations**: Charts adjust to filter context
---
## πŸ”§ Configuration
### Model Parameters
```python
class Config:
    # Context-aware features fed to the Isolation Forest
    ML_FEATURES = [
        'ratio_deviation',      # Primary fraud indicator
        'weekend_spike_score',  # Unauthorized operations
        'mismatch_score',       # Data manipulation
        'total_activity',       # Volume context
    ]
    CONTAMINATION = 0.05  # 5% expected anomaly rate
    RANDOM_STATE = 42     # Reproducibility
```
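
A minimal sketch of how these parameters could drive scikit-learn's `IsolationForest` and produce the 0-100 risk scale from the pipeline. `score_centers` is a hypothetical helper, and min-max rescaling of the negated `score_samples` output is an assumption; the notebook's actual Risk Scoring step may differ.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Mirrors the Config values above.
ML_FEATURES = ["ratio_deviation", "weekend_spike_score", "mismatch_score", "total_activity"]
CONTAMINATION = 0.05
RANDOM_STATE = 42


def score_centers(X):
    """Return (risk scores on a 0-100 scale, boolean anomaly flags) for each row of X."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] != len(ML_FEATURES):
        raise ValueError("columns must follow the ML_FEATURES order")
    model = IsolationForest(contamination=CONTAMINATION, random_state=RANDOM_STATE)
    labels = model.fit_predict(X)   # -1 = anomaly, 1 = normal
    raw = -model.score_samples(X)   # higher = easier to isolate = more anomalous
    # Assumed risk scale: min-max rescale the anomaly scores to 0-100.
    span = raw.max() - raw.min()
    risk = 100 * (raw - raw.min()) / span if span > 0 else np.zeros_like(raw)
    return risk, labels == -1


# Synthetic demo data: 200 ordinary centers plus one extreme outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(ML_FEATURES)))
X[0] = 8.0
risk, flagged = score_centers(X)
```

`contamination` only sets the cut-off between the two `fit_predict` labels (roughly the 5% most isolated rows are labelled -1); the continuous risk score is computed independently of it.
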
### Risk Thresholds
```python
RISK_CATEGORIES = {
'Low': [0, 50],
'Medium': [50, 70],
'High': [70, 85],
'Critical': [85, 100]
}
```
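
Adjacent ranges in `RISK_CATEGORIES` share their boundaries (50, 70 and 85 each appear twice), so a lookup has to pick a side. This sketch assumes half-open `[lower, upper)` intervals with 100 counted as Critical, and assumes the 🔴🟠🟡🟢 indicators run from most to least severe; `categorize` and `RISK_EMOJI` are hypothetical names.

```python
RISK_CATEGORIES = {
    'Low': [0, 50],
    'Medium': [50, 70],
    'High': [70, 85],
    'Critical': [85, 100],
}

# Assumed mapping of the dashboard's emoji indicators to levels.
RISK_EMOJI = {'Critical': '🔴', 'High': '🟠', 'Medium': '🟡', 'Low': '🟢'}


def categorize(score):
    """Map a 0-100 risk score to a category using [lower, upper) intervals."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    for name, (lower, upper) in RISK_CATEGORIES.items():
        if lower <= score < upper:
            return name
    return 'Critical'  # only reached for score == 100


print(categorize(50), RISK_EMOJI[categorize(50)])  # Medium 🟡
```
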
---
## 📈 Use Cases
### 1. Ghost Identity Creation
- **Pattern**: Abnormally high adult enrolment ratio
- **Detection**: High positive `ratio_deviation`
- **Example**: District avg 40%, center reports 90% → FLAGGED
### 2. Weekend/Holiday Fraud
- **Pattern**: Activity spikes when centers should be closed
- **Detection**: High `weekend_spike_score`
- **Example**: 5x normal activity on a Sunday → FLAGGED
### 3. Data Manipulation
- **Pattern**: Discrepancies between biometric and demographic updates
- **Detection**: High `mismatch_score`
- **Example**: 100 demographic updates vs. 20 biometric updates → FLAGGED
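
The README does not define `weekend_spike_score` or `mismatch_score` exactly, so the formulas below are assumptions chosen to reproduce the two examples: weekend activity as a multiple of mean weekday activity, and the demographic/biometric gap relative to their total. The flagging itself is left to the Isolation Forest; these functions only compute the scores.

```python
def weekend_spike_score(weekend_count, weekday_counts):
    """Assumed formula: weekend/holiday activity as a multiple of mean weekday activity."""
    # Floor the baseline at 1 so an otherwise idle center does not divide by zero.
    baseline = max(sum(weekday_counts) / len(weekday_counts), 1.0)
    return weekend_count / baseline


def mismatch_score(demo_updates, bio_updates):
    """Assumed formula: relative gap between demographic and biometric updates, 0 to 1."""
    total = demo_updates + bio_updates
    return abs(demo_updates - bio_updates) / total if total else 0.0


print(weekend_spike_score(100, [20, 22, 18, 20, 20]))  # 5.0  (the "5x on a Sunday" case)
print(round(mismatch_score(100, 20), 2))                # 0.67 (100 demo vs. 20 bio)
```

Normalizing the mismatch by the total keeps the score comparable between small and large centers, which matters because `total_activity` is already a separate feature.
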
---
## 🚒 Deployment
### Docker Deployment
```bash
# Build image
docker build -t sentinel-dashboard .
# Run container
docker run -p 8501:8501 sentinel-dashboard
```
### Hugging Face Spaces
The app is automatically deployed when you push to the main branch.
### Environment Variables
```bash
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
```
---
## 📊 Performance Metrics
### Model Performance (Simulated)
- **Precision**: 89%
- **Recall**: 85%
- **F1-Score**: 87%
- **Accuracy**: 88%
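
As a consistency check, the F1-score is the harmonic mean of precision $P$ and recall $R$, so the simulated figures above agree with each other:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.89 \times 0.85}{0.89 + 0.85} = \frac{1.513}{1.74} \approx 0.87
$$
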
### System Performance
- **Data Points Processed**: 500K+ records
- **Processing Time**: <1 second (cached)
- **Dashboard Load Time**: ~2 seconds
- **Visualization Rendering**: <500ms per chart
---
## 🔒 Security Considerations
### Current Implementation
- ✅ Data caching for performance
- ✅ Input validation on filters
- ✅ Error handling for missing data
- ⚠️ Simulated coordinates (demo only)
### Production Requirements
- 🔐 SSO/OAuth authentication
- 🔐 Role-based access control (RBAC)
- 🔐 Audit logging for all actions
- 🔐 Data encryption (at rest & in transit)
- 🔐 Real geocoding with pincode master DB
---
## 🎯 Future Enhancements
### Short-term (1-3 months)
- [ ] Real geocoding integration
- [ ] SHAP values for explainability
- [ ] Feedback loop for model refinement
- [ ] PDF report generation
- [ ] Email/SMS alert system
### Long-term (3-6 months)
- [ ] Multi-level baselines (state, district, pincode)
- [ ] Network analysis for coordinated fraud
- [ ] Real-time streaming pipeline (Kafka)
- [ ] Ensemble methods (LOF + One-Class SVM)
- [ ] Mobile app for field officers
---
## 👥 Team
- **Team ID**: UIDAI_4571
- **Theme**: Data-Driven Innovation for Aadhaar
- **Competition**: UIDAI Hackathon 2026
---
## 📄 Documentation
Comprehensive documentation available in `/docs`:
- **Project_Sentinel_Analysis.docx**: Technical analysis & code review
- **Sentinel_Dashboard_Documentation.docx**: Dashboard user guide
- **Dashboard_Enhancements_Guide.docx**: Enhancement details
---
## 🤝 Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
---
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
- **UIDAI** for the hackathon opportunity and dataset
- **Anthropic** for AI assistance in development
- **Streamlit** for the amazing web framework
- **Plotly** for interactive visualizations
---
## 📧 Contact
For questions or support, please contact:
- **Email**: sentinel-support@example.com
- **Issues**: [GitHub Issues](https://github.com/lovnishverma/UIDAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/lovnishverma/UIDAI/discussions)
---
## 🌟 Star History
If you find this project useful, please consider giving it a ⭐!
---
<div align="center">
<strong>Built with ❤️ for a safer Aadhaar ecosystem</strong>
<br>
<sub>© 2026 Project Sentinel. All rights reserved.</sub>
</div>