---
title: UIDAI Project Sentinel
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Data-Driven Innovation for Aadhaar
---

# 🛡️ Project Sentinel: AI-Powered Fraud Detection for UIDAI

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/lovnishverma/UIDAI)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**  
> Team ID: UIDAI_4571 | Theme: Data-Driven Innovation for Aadhaar

---

## 🎯 Quick Links

- **📊 Live Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
- **🚀 Dashboard Demo**: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
- **📖 Documentation**: See `/docs` folder
- **💻 Source Code**: Available in this repository

---

## 🎯 Overview

Project Sentinel is a fraud detection system built for UIDAI Aadhaar enrolment centers. Unlike systems that apply a single global threshold, Sentinel uses **context-aware machine learning** with district-level normalization to flag suspicious patterns while accounting for India's demographic diversity.

### The Problem We Solve

India's demographic diversity creates a unique challenge:
- 📊 Activity that is normal in Mumbai may be suspicious in a tribal village (and vice versa)
- ⚖️ Global thresholds either miss fraud or generate false positives
- 🎯 What is needed: regional baselines that adapt to local patterns

### Our Innovation

**District Normalization**: Each enrolment center is compared to its local district baseline, not a national average.

**Example**: In a tribal district where the average adult enrolment ratio is 40%, a center reporting a 90% adult ratio is flagged for its deviation, even if its absolute numbers are lower than those of urban centers.
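
A minimal sketch of the district-normalization idea, assuming pandas and a simple signed difference from the district mean. The column names (`adult_enrolments`, `total_enrolments`) and the toy data are hypothetical, and the actual notebook may define the deviation differently (for example as a z-score).

```python
import pandas as pd

# Hypothetical toy data: one row per enrolment center.
df = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B"],
    "adult_enrolments": [40, 35, 90, 80, 85],
    "total_enrolments": [100, 100, 100, 100, 100],
})

# Adult share of enrolments at each center.
df["adult_ratio"] = df["adult_enrolments"] / df["total_enrolments"]

# Local baseline: mean adult ratio over all centers in the same district.
df["district_avg_ratio"] = (
    df.groupby("district")["adult_ratio"].transform("mean")
)

# Signed deviation from the district baseline (assumed definition).
df["ratio_deviation"] = df["adult_ratio"] - df["district_avg_ratio"]

print(df[["district", "adult_ratio", "district_avg_ratio", "ratio_deviation"]])
```

In this toy data the 90% center in district A deviates far more from its local baseline than the 85% center in district B, although their raw ratios are close. Note that each center is included in its own district mean here; excluding it is an equally plausible design.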

---

## ✨ Key Features

### 🤖 Machine Learning Engine
- **Algorithm**: Isolation Forest (Unsupervised Learning)
- **Core Innovation**: Context-aware features with district baselines
- **Detection**: Ghost IDs, weekend fraud, data manipulation, coordinated operations

### 📊 Interactive Dashboard
- **Real-time KPIs**: 6 comprehensive metrics with trend indicators
- **Geographic Heatmap**: Risk visualization across India
- **Pattern Analysis**: Scatter plots, histograms, time series
- **Advanced Analytics**: Feature importance, correlation matrix, performance gauges

### 🔍 Smart Filtering
- Date range selection for temporal analysis
- Multi-select risk categories (Low/Medium/High/Critical)
- Dynamic state → district cascading
- Weekend-only anomaly toggle

### 📥 Multiple Export Formats
- **CSV**: Field team verification lists
- **JSON**: API integration
- **TXT**: Investigation reports for management
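
The three export formats can be sketched with pandas as follows. This is an illustration only: the `flagged` DataFrame, its columns, and the report line layout are hypothetical and not taken from the dashboard code.

```python
import pandas as pd

# Hypothetical flagged cases; pincodes are placeholders.
flagged = pd.DataFrame({
    "district": ["A", "B"],
    "pincode": ["000001", "000002"],
    "risk_score": [91.5, 72.0],
})

# CSV for field team verification lists.
csv_text = flagged.to_csv(index=False)

# JSON (one object per case) for API integration.
json_text = flagged.to_json(orient="records")

# Plain-text summary for an investigation report.
txt_report = "\n".join(
    f"{row.district} / {row.pincode}: risk {row.risk_score:.1f}"
    for row in flagged.itertuples(index=False)
)
```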

---

## 🚀 Quick Start

### **Option 1: Google Colab (Fastest)**
Run the complete analysis in your browser without any setup:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)

Click the badge above to open the notebook and run all cells to generate the analyzed data.

### **Option 2: Local Setup**

### Prerequisites
- Python 3.8+
- pip (Python package manager)

### Installation

1. **Clone the repository**
```bash
git clone https://huggingface.co/spaces/lovnishverma/UIDAI
cd UIDAI
```

2. **Install dependencies**
```bash
pip install -r requirements.txt
```

3. **Run the Jupyter Notebook** (Data Processing)
```bash
jupyter notebook project_sentinel_notebook.ipynb
```
This generates `analyzed_aadhaar_data.csv`

4. **Launch the Dashboard**
```bash
streamlit run app.py
```

5. **Access the application**
```
http://localhost:8501
```

---

## 📁 Project Structure

```
UIDAI/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── Dockerfile                         # Docker configuration
├── project_sentinel_notebook.ipynb    # ML model & data processing
├── app.py                             # Streamlit dashboard
├── analyzed_aadhaar_data.csv          # Processed data (generated by the notebook)
├── docs/
│   ├── Project_Sentinel_Analysis.docx
│   ├── Sentinel_Dashboard_Documentation.docx
│   └── Dashboard_Enhancements_Guide.docx
└── assets/
    └── screenshots/                   # Dashboard screenshots
```

---

## 🧠 Technical Architecture

### Data Pipeline
```
Raw Data (Biometric + Demographic + Enrolment)
    ↓
SmartLoader (Chunked CSV ingestion)
    ↓
Master Merge (Outer joins on date/state/district/pincode)
    ↓
ContextEngine (District normalization)
    ↓
Feature Engineering (4 context-aware features)
    ↓
Isolation Forest (Anomaly detection)
    ↓
Risk Scoring (0-100 scale)
    ↓
Dashboard Visualization
```
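
The first two pipeline stages (chunked ingestion and the outer-join master merge) can be sketched as below. The helper name `load_in_chunks`, the column names, and the in-memory CSV data are assumptions for illustration; the real `SmartLoader` may behave differently.

```python
import io
import pandas as pd

# Join keys named in the pipeline diagram.
KEYS = ["date", "state", "district", "pincode"]

def load_in_chunks(source, chunksize=2):
    """Read a CSV in fixed-size chunks and concatenate them."""
    chunks = pd.read_csv(source, chunksize=chunksize, dtype={"pincode": str})
    return pd.concat(chunks, ignore_index=True)

# Hypothetical inputs standing in for the raw biometric and demographic files.
bio_csv = io.StringIO(
    "date,state,district,pincode,bio_updates\n"
    "2025-01-06,Goa,North Goa,403001,12\n"
    "2025-01-07,Goa,North Goa,403001,9\n"
    "2025-01-07,Goa,South Goa,403601,4\n"
)
demo_csv = io.StringIO(
    "date,state,district,pincode,demo_updates\n"
    "2025-01-06,Goa,North Goa,403001,15\n"
    "2025-01-08,Goa,South Goa,403601,7\n"
)

bio = load_in_chunks(bio_csv)
demo = load_in_chunks(demo_csv)

# Outer join keeps rows that appear in only one source.
master = bio.merge(demo, on=KEYS, how="outer")

# Treat a missing count as zero activity for that source (an assumption).
count_cols = ["bio_updates", "demo_updates"]
master[count_cols] = master[count_cols].fillna(0)
```

An outer join is the natural choice here because a pincode that reports demographic updates but no biometric updates on a given day is itself a signal worth keeping.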

### Core Features (ML Model)

| Feature | Description | Importance |
|---------|-------------|------------|
| **ratio_deviation** | Deviation from district avg adult ratio | 45% |
| **weekend_spike_score** | Activity spike on weekends/holidays | 25% |
| **mismatch_score** | Discrepancy between bio/demo updates | 20% |
| **total_activity** | Overall transaction volume | 10% |
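
As a rough illustration of the two middle features, the sketch below computes a weekend spike score as weekend activity relative to the weekday mean, and a mismatch score as the normalized gap between demographic and biometric updates. Both formulas and all column names are assumptions; the source only describes the features in words.

```python
import pandas as pd

# Hypothetical daily records for a single center.
records = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-06", "2025-01-07", "2025-01-12"]),  # Mon, Tue, Sun
    "total_activity": [20, 20, 100],
    "demo_updates": [10, 10, 100],
    "bio_updates": [10, 10, 20],
})

# Saturday = 5, Sunday = 6 in pandas' dayofweek convention.
is_weekend = records["date"].dt.dayofweek >= 5

# Weekend activity as a multiple of the weekday mean; 0 on weekdays.
weekday_mean = records.loc[~is_weekend, "total_activity"].mean()
records["weekend_spike_score"] = (
    records["total_activity"] / weekday_mean
).where(is_weekend, 0.0)

# Normalized gap between demographic and biometric updates, in [0, 1].
gap = (records["demo_updates"] - records["bio_updates"]).abs()
total = (records["demo_updates"] + records["bio_updates"]).clip(lower=1)
records["mismatch_score"] = gap / total
```

With this data the Sunday row gets a spike score of 5 (five times the weekday mean), and 100 demographic versus 20 biometric updates gives a mismatch score of about 0.67. A production version would compute the weekday baseline per center and would also need a holiday calendar.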

### Technology Stack

- **Backend**: Python 3.8+, Pandas, NumPy, Scikit-learn
- **ML**: Isolation Forest (Unsupervised Anomaly Detection)
- **Frontend**: Streamlit (Web Framework)
- **Visualization**: Plotly Express, Plotly Graph Objects
- **Deployment**: Docker, Hugging Face Spaces

---

## 📊 Dashboard Overview

### Tab 1: Geographic Analysis
- **Interactive Map**: Risk heatmap with circle size = volume, color = risk
- **Top 5 Hotspots**: Color-coded cards showing riskiest locations
- **Risk Distribution**: Donut chart breakdown by category

### Tab 2: Pattern Analysis
- **Ghost ID Indicator**: Scatter plot with deviation thresholds
- **Risk Histogram**: Distribution concentration analysis
- **Time Series**: Dual-axis chart showing trends over time
- **Statistics**: Mean, median, std dev, 95th percentile

### Tab 3: Priority Cases
- **Adjustable Threshold**: Slider to filter by minimum risk score
- **Action Status**: Workflow tracking (Pending/Investigation/Resolved)
- **Enhanced Table**: Progress bars, formatted columns
- **Export Options**: CSV, JSON, TXT formats

### Tab 4: Advanced Analytics
- **Feature Importance**: Bar chart showing ML contributions
- **Performance Gauge**: Speedometer-style model accuracy
- **Correlation Heatmap**: Feature relationship matrix
- **Key Insights**: Contextual intelligence cards

---

## 🎨 Visual Design

### Professional Styling
- **Gradients**: Purple/blue for government portal aesthetic
- **Animations**: Pulsing alerts for critical cases
- **Typography**: Google Fonts (Inter) for modern look
- **Color Coding**: Risk levels with emoji indicators (🔴🟠🟡🟢)

### Responsive Layout
- **Wide Mode**: Maximum data density
- **Tabbed Interface**: Organized content reduces cognitive load
- **Adaptive Visualizations**: Charts adjust to filter context

---

## 🔧 Configuration

### Model Parameters
```python
Config.ML_FEATURES = [
    'ratio_deviation',      # Primary fraud indicator
    'weekend_spike_score',  # Unauthorized operations
    'mismatch_score',       # Data manipulation
    'total_activity'        # Volume context
]
Config.CONTAMINATION = 0.05  # 5% expected anomaly rate
Config.RANDOM_STATE = 42     # Reproducibility
```
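
A minimal sketch of how these parameters might be passed to scikit-learn's `IsolationForest`, using synthetic data. The min-max rescaling to a 0-100 risk score is an assumption; the source states the scale but not how scores are mapped onto it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

ML_FEATURES = [
    "ratio_deviation",
    "weekend_spike_score",
    "mismatch_score",
    "total_activity",
]
CONTAMINATION = 0.05
RANDOM_STATE = 42

# Synthetic stand-in for the engineered features, plus one obvious outlier.
rng = np.random.default_rng(RANDOM_STATE)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=ML_FEATURES)
X.iloc[0] = [8.0, 8.0, 8.0, 8.0]

model = IsolationForest(contamination=CONTAMINATION, random_state=RANDOM_STATE)
model.fit(X)

# score_samples is lower for more abnormal points; negate so higher = riskier.
raw = -model.score_samples(X)

# Assumed mapping: min-max rescale to a 0-100 risk score.
risk_score = 100 * (raw - raw.min()) / (raw.max() - raw.min())

# predict returns -1 for points the model labels as anomalies.
is_anomaly = model.predict(X) == -1
```

`contamination` sets the score threshold so that roughly 5% of the training rows are labelled anomalous; it does not change how the trees are built.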

### Risk Thresholds
```python
RISK_CATEGORIES = {
    'Low': [0, 50],
    'Medium': [50, 70],
    'High': [70, 85],
    'Critical': [85, 100]
}
```
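
Adjacent categories share a boundary value (50, 70, 85), so a lookup has to pick a side. The sketch below assumes each range is lower-inclusive and upper-exclusive, with 100 counted as Critical; the function name `risk_category` is hypothetical.

```python
RISK_CATEGORIES = {
    "Low": [0, 50],
    "Medium": [50, 70],
    "High": [70, 85],
    "Critical": [85, 100],
}

def risk_category(score):
    """Map a 0-100 risk score to a category name.

    Assumes low <= score < high for each range, with a score of
    exactly 100 falling into the top category.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    for name, (low, high) in RISK_CATEGORIES.items():
        if low <= score < high:
            return name
    return "Critical"  # only reached when score == 100

print(risk_category(50))   # Medium under the assumed convention
print(risk_category(100))  # Critical
```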

---

## 📈 Use Cases

### 1. Ghost Identity Creation
**Pattern**: Abnormally high adult enrolment ratio  
**Detection**: High positive ratio_deviation  
**Example**: District avg 40%, center reports 90% → FLAGGED

### 2. Weekend/Holiday Fraud
**Pattern**: Activity spikes when centers should be closed  
**Detection**: High weekend_spike_score  
**Example**: 5x normal activity on Sunday → FLAGGED

### 3. Data Manipulation
**Pattern**: Discrepancies between biometric and demographic updates  
**Detection**: High mismatch_score  
**Example**: 100 demo updates, 20 bio updates → FLAGGED

---

## 🚢 Deployment

### Docker Deployment
```bash
# Build image
docker build -t sentinel-dashboard .

# Run container
docker run -p 8501:8501 sentinel-dashboard
```

### Hugging Face Spaces
The app is automatically deployed when you push to the main branch.

### Environment Variables
```bash
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
```

---

## 📊 Performance Metrics

### Model Performance (Simulated)
- **Precision**: 89%
- **Recall**: 85%
- **F1-Score**: 87%
- **Accuracy**: 88%
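
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two simulated figures above. Accuracy cannot be derived this way because it also depends on the class balance.

```python
precision = 0.89
recall = 0.85

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.2%}")  # 86.95%, which rounds to the reported 87%
```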

### System Performance
- **Data Points Processed**: 500K+ records
- **Processing Time**: <1 second (cached)
- **Dashboard Load Time**: ~2 seconds
- **Visualization Rendering**: <500ms per chart

---

## 🔒 Security Considerations

### Current Implementation
- ✅ Data caching for performance
- ✅ Input validation on filters
- ✅ Error handling for missing data
- ⚠️ Simulated coordinates (demo only)

### Production Requirements
- 🔐 SSO/OAuth authentication
- 🔐 Role-based access control (RBAC)
- 🔐 Audit logging for all actions
- 🔐 Data encryption (at rest & in transit)
- 🔐 Real geocoding with pincode master DB

---

## 🎯 Future Enhancements

### Short-term (1-3 months)
- [ ] Real geocoding integration
- [ ] SHAP values for explainability
- [ ] Feedback loop for model refinement
- [ ] PDF report generation
- [ ] Email/SMS alert system

### Long-term (3-6 months)
- [ ] Multi-level baselines (state, district, pincode)
- [ ] Network analysis for coordinated fraud
- [ ] Real-time streaming pipeline (Kafka)
- [ ] Ensemble methods (LOF + One-Class SVM)
- [ ] Mobile app for field officers

---

## 👥 Team

**Team ID**: UIDAI_4571  
**Theme**: Data-Driven Innovation for Aadhaar  
**Competition**: UIDAI Hackathon 2026

---

## 📄 Documentation

Comprehensive documentation available in `/docs`:
- **Project_Sentinel_Analysis.docx**: Technical analysis & code review
- **Sentinel_Dashboard_Documentation.docx**: Dashboard user guide
- **Dashboard_Enhancements_Guide.docx**: Enhancement details

---

## 🤝 Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- **UIDAI** for the hackathon opportunity and dataset
- **Anthropic** for AI assistance in development
- **Streamlit** for the amazing web framework
- **Plotly** for interactive visualizations

---

## 📧 Contact

For questions or support, please contact:
- **Email**: sentinel-support@example.com
- **Issues**: [GitHub Issues](https://github.com/lovnishverma/UIDAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/lovnishverma/UIDAI/discussions)

---

## 🌟 Star History

If you find this project useful, please consider giving it a ⭐!

---

<div align="center">
  <strong>Built with ❤️ for a safer Aadhaar ecosystem</strong>
  <br>
  <sub>© 2026 Project Sentinel. All rights reserved.</sub>
</div>