	---
	title: UIDAI Project Sentinel
	emoji: 🚀
	colorFrom: red
	colorTo: red
	sdk: docker
	app_port: 8501
	tags:
	- streamlit
	pinned: false
	short_description: Data-Driven Innovation for Aadhaar
	---

	# 🛡️ Project Sentinel: AI-Powered Fraud Detection for UIDAI

	[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/lovnishverma/UIDAI)
	[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

	> Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers
	> Team ID: UIDAI_4571 \| Theme: Data-Driven Innovation for Aadhaar

	---

	## 🎯 Quick Links

	- 📊 Live Notebook: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
	- 🚀 Dashboard Demo: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
	- 📖 Documentation: See `/docs` folder
	- 💻 Source Code: Available in this repository

	---

	## 🎯 Overview

	Project Sentinel is a fraud detection system built for UIDAI Aadhaar enrolment centers. Traditional systems apply global thresholds. Sentinel instead uses context-aware machine learning with district-level normalization, so it can flag fraudulent patterns while accounting for India's demographic diversity.

	### The Problem We Solve

	India's demographic diversity creates a unique challenge:
	- 📊 Activities normal in Mumbai may be suspicious in tribal villages (and vice versa)
	- ⚖️ A single global threshold either misses fraud in some regions or produces false positives in others
	- 🎯 Need: Regional baselines that adapt to local patterns

	### Our Innovation

	District Normalization: Each enrolment center is compared to its local district baseline, not a national average.

	Example: In a tribal district where adults make up 40% of enrolments on average, a center reporting a 90% adult ratio is flagged for its deviation from that baseline, even if its absolute numbers are lower than those of urban centers.
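	District normalization can be sketched in pandas as follows. This is a minimal illustration, not the notebook's code. The column names (`district`, `center_id`, `adult_enrolments`, `total_enrolments`) and the use of a plain per-district mean as the baseline are assumptions.

	```python
	import pandas as pd

	# Hypothetical per-center enrolment counts; column names are illustrative,
	# not the project's actual schema.
	df = pd.DataFrame({
	    "district": ["Tribal-A", "Tribal-A", "Tribal-A", "Urban-B", "Urban-B"],
	    "center_id": ["T1", "T2", "T3", "U1", "U2"],
	    "adult_enrolments": [40, 30, 90, 700, 650],
	    "total_enrolments": [100, 100, 100, 1000, 1000],
	})

	# Share of each center's enrolments that are adults.
	df["adult_ratio"] = df["adult_enrolments"] / df["total_enrolments"]

	# District baseline: mean adult ratio across the centers in the same district.
	df["district_avg_ratio"] = df.groupby("district")["adult_ratio"].transform("mean")

	# Positive values mean the center enrols more adults than its district norm.
	df["ratio_deviation"] = df["adult_ratio"] - df["district_avg_ratio"]

	print(df[["district", "center_id", "adult_ratio", "ratio_deviation"]])
	```

	Center T3 enrols fewer people in absolute terms than either urban center, yet it has the largest positive `ratio_deviation`. Because this baseline includes the center being scored, a single extreme center also pulls its own district mean upward. A median or leave-one-out baseline would be less sensitive to that.
	
	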

	---

	## ✨ Key Features

	### 🤖 Machine Learning Engine
	- Algorithm: Isolation Forest (Unsupervised Learning)
	- Core Innovation: Context-aware features with district baselines
	- Detection: Ghost IDs, weekend fraud, data manipulation, coordinated operations

	### 📊 Interactive Dashboard
	- Real-time KPIs: 6 comprehensive metrics with trend indicators
	- Geographic Heatmap: Risk visualization across India
	- Pattern Analysis: Scatter plots, histograms, time series
	- Advanced Analytics: Feature importance, correlation matrix, performance gauges

	### 🔍 Smart Filtering
	- Date range selection for temporal analysis
	- Multi-select risk categories (Low/Medium/High/Critical)
	- Cascading state → district selection
	- Weekend-only anomaly toggle
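	One way these filters could combine into a single boolean mask over the analyzed data is sketched below. It assumes hypothetical columns `date`, `state`, `district` and `risk_category`, plus an `is_weekend` flag derived from the date. The `filter_records` helper is illustrative, and the dashboard's actual implementation may differ.

	```python
	import pandas as pd

	# Hypothetical analyzed records; column names are assumptions for illustration.
	df = pd.DataFrame({
	    "date": pd.to_datetime(["2025-03-01", "2025-03-02", "2025-03-03", "2025-03-09"]),
	    "state": ["Bihar", "Bihar", "Kerala", "Bihar"],
	    "district": ["Patna", "Gaya", "Kochi", "Patna"],
	    "risk_category": ["High", "Critical", "Low", "Critical"],
	})
	# pandas numbers weekdays Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
	df["is_weekend"] = df["date"].dt.dayofweek >= 5

	def filter_records(data, start, end, categories, state=None, districts=None, weekend_only=False):
	    """Apply dashboard-style filters (a sketch, not the app's actual code)."""
	    mask = data["date"].between(pd.Timestamp(start), pd.Timestamp(end))
	    mask &= data["risk_category"].isin(categories)
	    if state is not None:
	        mask &= data["state"] == state
	        # Cascading: district choices only apply within the chosen state.
	        if districts:
	            mask &= data["district"].isin(districts)
	    if weekend_only:
	        mask &= data["is_weekend"]
	    return data[mask]

	result = filter_records(df, "2025-03-01", "2025-03-31", ["High", "Critical"],
	                        state="Bihar", weekend_only=True)
	print(result)
	```
	
	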

	### 📥 Multiple Export Formats
	- CSV: Field team verification lists
	- JSON: API integration
	- TXT: Investigation reports for management
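	The three export formats map naturally onto pandas and plain string formatting. The sketch below is minimal and assumes a hypothetical `flagged` DataFrame of high-risk cases. Its TXT layout is invented for illustration.

	```python
	import pandas as pd

	# Hypothetical flagged cases; column names are illustrative only.
	flagged = pd.DataFrame({
	    "center_id": ["T3", "U7"],
	    "district": ["Tribal-A", "Urban-B"],
	    "risk_score": [91.2, 86.5],
	    "risk_category": ["Critical", "Critical"],
	})

	# CSV: verification list for field teams.
	csv_text = flagged.to_csv(index=False)

	# JSON: one object per record, convenient for API integration.
	json_text = flagged.to_json(orient="records")

	# TXT: a plain-text summary for management (layout is an assumption).
	lines = [f"Flagged cases: {len(flagged)}"]
	for row in flagged.itertuples(index=False):
	    lines.append(f"- {row.center_id} ({row.district}): score {row.risk_score:.1f}, {row.risk_category}")
	txt_report = "\n".join(lines)

	print(txt_report)
	```

	In a Streamlit app, each of these strings could be passed as the `data` argument of `st.download_button`.
	
	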

	---

	## 🚀 Quick Start

	### Option 1: Google Colab (Fastest)
	Run the complete analysis in your browser without any setup:

	[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)

	Click the badge above to open the notebook and run all cells to generate the analyzed data.

	### Option 2: Local Setup

	#### Prerequisites
	- Python 3.8+
	- pip (Python package manager)

	#### Installation

	1. Clone the repository
	```bash
	git clone https://huggingface.co/spaces/lovnishverma/UIDAI
	cd UIDAI
	```

	2. Install dependencies
	```bash
	pip install -r requirements.txt
	```

	3. Run the Jupyter Notebook (Data Processing)
	```bash
	jupyter notebook project_sentinel_notebook.ipynb
	```
	Running all cells generates `analyzed_aadhaar_data.csv`.

	4. Launch the Dashboard
	```bash
	streamlit run sentinel_dashboard_enhanced.py
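	```

	Note: the project structure below lists the dashboard as `app.py`. If `sentinel_dashboard_enhanced.py` is not in your checkout, run `streamlit run app.py` instead.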

	5. Access the application
	```
	http://localhost:8501
	```

	---

	## 📁 Project Structure

	```
	UIDAI/
	├── README.md                         # This file
	├── requirements.txt                  # Python dependencies
	├── Dockerfile                        # Docker configuration
	├── project_sentinel_notebook.ipynb   # ML model & data processing
	├── app.py                            # Streamlit dashboard
	├── analyzed_aadhaar_data.csv         # Processed data (generated from Colab)
	├── docs/
	│   ├── Project_Sentinel_Analysis.docx
	│   ├── Sentinel_Dashboard_Documentation.docx
	│   └── Dashboard_Enhancements_Guide.docx
	└── assets/
	    └── screenshots/                  # Dashboard screenshots
	```

	---

	## 🧠 Technical Architecture

	### Data Pipeline
	```
	Raw Data (Biometric + Demographic + Enrolment)
	↓
	SmartLoader (Chunked CSV ingestion)
	↓
	Master Merge (Outer joins on date/state/district/pincode)
	↓
	ContextEngine (District normalization)
	↓
	Feature Engineering (4 context-aware features)
	↓
	Isolation Forest (Anomaly detection)
	↓
	Risk Scoring (0-100 scale)
	↓
	Dashboard Visualization
	```
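	The first two pipeline stages, chunked ingestion and the outer-join merge, could look roughly like this. `load_in_chunks` is a hypothetical stand-in for SmartLoader. The in-memory CSVs replace the real biometric, demographic and enrolment files, and the per-source count columns are assumed names.

	```python
	import io
	import pandas as pd

	KEYS = ["date", "state", "district", "pincode"]

	def load_in_chunks(source, chunksize=100_000):
	    """Read a CSV piece by piece and concatenate (a SmartLoader-style sketch)."""
	    # Keep pincodes as strings so leading zeros and join keys stay intact.
	    with pd.read_csv(source, chunksize=chunksize, dtype={"pincode": str}) as reader:
	        return pd.concat(list(reader), ignore_index=True)

	# Tiny in-memory stand-ins for the three raw sources.
	bio_csv = io.StringIO(
	    "date,state,district,pincode,bio_updates\n"
	    "2025-03-01,Bihar,Patna,800001,20\n")
	demo_csv = io.StringIO(
	    "date,state,district,pincode,demo_updates\n"
	    "2025-03-01,Bihar,Patna,800001,100\n"
	    "2025-03-02,Bihar,Gaya,823001,5\n")
	enrol_csv = io.StringIO(
	    "date,state,district,pincode,adult_enrolments,total_enrolments\n"
	    "2025-03-01,Bihar,Patna,800001,45,60\n")

	bio = load_in_chunks(bio_csv, chunksize=1)
	demo = load_in_chunks(demo_csv, chunksize=1)
	enrol = load_in_chunks(enrol_csv, chunksize=1)

	# Outer joins keep rows that appear in only one source (e.g. a demographic
	# update with no matching biometric record); missing counts become 0.
	master = (
	    bio.merge(demo, on=KEYS, how="outer")
	       .merge(enrol, on=KEYS, how="outer")
	       .fillna(0)
	)
	print(master)
	```
	
	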

	### Core Features (ML Model)

	| Feature | Description | Importance |
	|---------|-------------|------------|
	| `ratio_deviation` | Deviation from the district average adult ratio | 45% |
	| `weekend_spike_score` | Activity spike on weekends/holidays | 25% |
	| `mismatch_score` | Discrepancy between biometric and demographic updates | 20% |
	| `total_activity` | Overall transaction volume | 10% |
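	The README does not spell out how the other three features are computed. Below is one plausible sketch. Its formulas are assumptions, not the notebook's definitions:

	- `total_activity` is the sum of all transactions.
	- `mismatch_score` is the relative gap between demographic and biometric updates.
	- `weekend_spike_score` is weekend activity divided by the pincode's average weekday activity.

	```python
	import pandas as pd

	# Hypothetical daily activity for one pincode (Mon, Tue, Wed, then Sunday).
	# Column names and formulas are illustrative assumptions.
	df = pd.DataFrame({
	    "pincode": ["800001"] * 4,
	    "date": pd.to_datetime(["2025-03-03", "2025-03-04", "2025-03-05", "2025-03-09"]),
	    "bio_updates": [10, 12, 8, 20],
	    "demo_updates": [10, 8, 12, 100],
	    "enrolments": [20, 20, 20, 80],
	})

	df["total_activity"] = df[["bio_updates", "demo_updates", "enrolments"]].sum(axis=1)

	# Relative gap between demographic and biometric updates, in [0, 1];
	# 0/0 (no updates at all) yields NaN, which is treated as no mismatch.
	updates = df["bio_updates"] + df["demo_updates"]
	gap = (df["demo_updates"] - df["bio_updates"]).abs()
	df["mismatch_score"] = (gap / updates).fillna(0.0)

	# Weekend activity relative to the same pincode's mean weekday activity;
	# weekdays score 0. Holidays are ignored (they would need a holiday calendar).
	is_weekend = df["date"].dt.dayofweek >= 5
	weekday_mean = df[~is_weekend].groupby("pincode")["total_activity"].mean()
	baseline = df["pincode"].map(weekday_mean)
	df["weekend_spike_score"] = (df["total_activity"] / baseline).where(is_weekend, 0.0)

	print(df[["date", "total_activity", "mismatch_score", "weekend_spike_score"]])
	```

	The Sunday row mirrors the use cases described further down: five times the weekday activity, and 100 demographic versus 20 biometric updates.
	
	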

	### Technology Stack

	- Backend: Python 3.8+, Pandas, NumPy, Scikit-learn
	- ML: Isolation Forest (Unsupervised Anomaly Detection)
	- Frontend: Streamlit (Web Framework)
	- Visualization: Plotly Express, Plotly Graph Objects
	- Deployment: Docker, Hugging Face Spaces

	---

	## 📊 Dashboard Overview

	### Tab 1: Geographic Analysis
	- Interactive Map: Risk heatmap with circle size = volume, color = risk
	- Top 5 Hotspots: Color-coded cards showing riskiest locations
	- Risk Distribution: Donut chart breakdown by category

	### Tab 2: Pattern Analysis
	- Ghost ID Indicator: Scatter plot with deviation thresholds
	- Risk Histogram: Distribution of risk scores, showing where cases concentrate
	- Time Series: Dual-axis chart showing trends over time
	- Statistics: Mean, median, std dev, 95th percentile
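	The summary statistics in this tab correspond to standard pandas reductions. The small sketch below runs them on made-up scores. pandas' `std()` is the sample standard deviation and `quantile()` interpolates linearly by default. The dashboard may use different conventions.

	```python
	import pandas as pd

	# Hypothetical risk scores for the currently filtered records.
	scores = pd.Series([12.0, 35.5, 48.0, 52.0, 71.5, 88.0, 93.0])

	stats = {
	    "mean": scores.mean(),
	    "median": scores.median(),
	    "std": scores.std(),           # sample standard deviation (ddof=1)
	    "p95": scores.quantile(0.95),  # linear interpolation between order statistics
	}
	print(stats)
	```
	
	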

	### Tab 3: Priority Cases
	- Adjustable Threshold: Slider to filter by minimum risk score
	- Action Status: Workflow tracking (Pending/Investigation/Resolved)
	- Enhanced Table: Progress bars, formatted columns
	- Export Options: CSV, JSON, TXT formats

	### Tab 4: Advanced Analytics
	- Feature Importance: Bar chart showing ML contributions
	- Performance Gauge: Speedometer-style model accuracy
	- Correlation Heatmap: Feature relationship matrix
	- Key Insights: Contextual intelligence cards

	---

	## 🎨 Visual Design

	### Professional Styling
	- Gradients: Purple/blue for government portal aesthetic
	- Animations: Pulsing alerts for critical cases
	- Typography: Google Fonts (Inter) for modern look
	- Color Coding: Risk levels with emoji indicators (🔴🟠🟡🟢)

	### Responsive Layout
	- Wide Mode: Maximum data density
	- Tabbed Interface: Organized content reduces cognitive load
	- Adaptive Visualizations: Charts adjust to filter context

	---

	## 🔧 Configuration

	### Model Parameters
	```python
	class Config:
	    ML_FEATURES = [
	        'ratio_deviation',      # Primary fraud indicator
	        'weekend_spike_score',  # Unauthorized operations
	        'mismatch_score',       # Data manipulation
	        'total_activity',       # Volume context
	    ]
	    CONTAMINATION = 0.05  # 5% expected anomaly rate
	    RANDOM_STATE = 42     # Reproducibility
	```
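	The sketch below fits scikit-learn's `IsolationForest` with these parameters on synthetic data and turns its output into a 0-100 risk score. It is a minimal illustration. The README does not say how Sentinel maps model output onto its 0-100 scale, so the min-max rescaling of negated `score_samples` is an assumption.

	```python
	import numpy as np
	import pandas as pd
	from sklearn.ensemble import IsolationForest

	FEATURES = ["ratio_deviation", "weekend_spike_score", "mismatch_score", "total_activity"]

	# Synthetic features: 200 ordinary rows plus one obvious outlier (row 200).
	rng = np.random.default_rng(0)
	normal = rng.normal(0.0, 1.0, size=(200, 4))
	outlier = np.full((1, 4), 6.0)
	X = pd.DataFrame(np.vstack([normal, outlier]), columns=FEATURES)

	model = IsolationForest(contamination=0.05, random_state=42)
	model.fit(X)

	# score_samples is higher for normal points, so negate it and
	# min-max scale so the most anomalous row scores 100.
	raw = -model.score_samples(X)
	risk_score = 100 * (raw - raw.min()) / (raw.max() - raw.min())

	# predict() labels roughly the contamination share of rows as -1 (anomaly).
	is_anomaly = model.predict(X) == -1
	print(risk_score[-1], is_anomaly.sum())
	```

	Min-max scaling is relative to the batch being scored, so the same center can receive different scores in different batches. A fixed mapping from `decision_function` would be more stable across runs.
	
	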

	### Risk Thresholds
	```python
	RISK_CATEGORIES = {
	    'Low': [0, 50],
	    'Medium': [50, 70],
	    'High': [70, 85],
	    'Critical': [85, 100],
	}
	```
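	Adjacent ranges above share their endpoints, so a score of exactly 50 fits both Low and Medium and some boundary convention is needed. The sketch below assumes each lower bound is inclusive and maps scores to categories with `pd.cut`. The `categorize` helper is hypothetical.

	```python
	import pandas as pd

	# Edges from RISK_CATEGORIES, read as [0, 50), [50, 70), [70, 85), [85, 100].
	# The last edge is open-ended so a score of exactly 100 is still Critical.
	EDGES = [0, 50, 70, 85, float("inf")]
	LABELS = ["Low", "Medium", "High", "Critical"]

	def categorize(scores):
	    """Map 0-100 risk scores to category labels (lower bounds inclusive)."""
	    return pd.cut(pd.Series(scores), bins=EDGES, labels=LABELS, right=False)

	print(categorize([0, 50, 84.9, 100]).tolist())
	```
	
	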

	---

	## 📈 Use Cases

	### 1. Ghost Identity Creation
	- Pattern: Abnormally high adult enrolment ratio
	- Detection: High positive ratio_deviation
	- Example: District avg 40%, center reports 90% → FLAGGED

	### 2. Weekend/Holiday Fraud
	- Pattern: Activity spikes when centers should be closed
	- Detection: High weekend_spike_score
	- Example: 5x normal activity on Sunday → FLAGGED

	### 3. Data Manipulation
	- Pattern: Discrepancies between biometric and demographic updates
	- Detection: High mismatch_score
	- Example: 100 demo updates, 20 bio updates → FLAGGED

	---

	## 🚢 Deployment

	### Docker Deployment
	```bash
	# Build image
	docker build -t sentinel-dashboard .

	# Run container
	docker run -p 8501:8501 sentinel-dashboard
	```

	### Hugging Face Spaces
	The app is automatically deployed when you push to the main branch.

	### Environment Variables
	```bash
	STREAMLIT_SERVER_PORT=8501
	STREAMLIT_SERVER_ADDRESS=0.0.0.0
	STREAMLIT_SERVER_HEADLESS=true
	```

	---

	## 📊 Performance Metrics

	### Model Performance (Simulated)
	- Precision: 89%
	- Recall: 85%
	- F1-Score: 87%
	- Accuracy: 88%
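	The reported F1-score can be cross-checked against precision and recall, since F1 is their harmonic mean. Accuracy cannot be recovered this way, because it also depends on the number of true negatives.

	```python
	# F1 is the harmonic mean of precision and recall.
	precision, recall = 0.89, 0.85
	f1 = 2 * precision * recall / (precision + recall)
	print(f"F1 = {f1:.3f}")  # prints F1 = 0.870
	```
	
	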

	### System Performance
	- Data Points Processed: 500K+ records
	- Processing Time: <1 second (cached)
	- Dashboard Load Time: ~2 seconds
	- Visualization Rendering: <500ms per chart

	---

	## 🔒 Security Considerations

	### Current Implementation
	- ✅ Data caching for performance
	- ✅ Input validation on filters
	- ✅ Error handling for missing data
	- ⚠️ Simulated coordinates (demo only)

	### Production Requirements
	- 🔐 SSO/OAuth authentication
	- 🔐 Role-based access control (RBAC)
	- 🔐 Audit logging for all actions
	- 🔐 Data encryption (at rest & in transit)
	- 🔐 Real geocoding with pincode master DB

	---

	## 🎯 Future Enhancements

	### Short-term (1-3 months)
	- [ ] Real geocoding integration
	- [ ] SHAP values for explainability
	- [ ] Feedback loop for model refinement
	- [ ] PDF report generation
	- [ ] Email/SMS alert system

	### Long-term (3-6 months)
	- [ ] Multi-level baselines (state, district, pincode)
	- [ ] Network analysis for coordinated fraud
	- [ ] Real-time streaming pipeline (Kafka)
	- [ ] Ensemble methods (LOF + One-Class SVM)
	- [ ] Mobile app for field officers

	---

	## 👥 Team

	- Team ID: UIDAI_4571
	- Theme: Data-Driven Innovation for Aadhaar
	- Competition: UIDAI Hackathon 2026

	---

	## 📄 Documentation

	Comprehensive documentation available in `/docs`:
	- Project_Sentinel_Analysis.docx: Technical analysis & code review
	- Sentinel_Dashboard_Documentation.docx: Dashboard user guide
	- Dashboard_Enhancements_Guide.docx: Enhancement details

	---

	## 🤝 Contributing

	We welcome contributions! Please follow these steps:

	1. Fork the repository
	2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
	3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
	4. Push to the branch (`git push origin feature/AmazingFeature`)
	5. Open a Pull Request

	---

	## 📝 License

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	---

	## 🙏 Acknowledgments

	- UIDAI for the hackathon opportunity and dataset
	- Anthropic for AI assistance in development
	- Streamlit for the amazing web framework
	- Plotly for interactive visualizations

	---

	## 📧 Contact

	For questions or support, please contact:
	- Email: sentinel-support@example.com
	- Issues: [GitHub Issues](https://github.com/lovnishverma/UIDAI/issues)
	- Discussions: [GitHub Discussions](https://github.com/lovnishverma/UIDAI/discussions)

	---

	## 🌟 Star History

	If you find this project useful, please consider giving it a ⭐!

	---

	<div align="center">
	<strong>Built with ❤️ for a safer Aadhaar ecosystem</strong>
	<br>
	<sub>© 2026 Project Sentinel. All rights reserved.</sub>
	</div>