---
title: UIDAI Project Sentinel
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Data-Driven Innovation for Aadhaar
---

# 🛡️ Project Sentinel: AI-Powered Fraud Detection for UIDAI

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/lovnishverma/UIDAI)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**
> Team ID: UIDAI_4571 | Theme: Data-Driven Innovation for Aadhaar

---

## 🎯 Quick Links

- **📊 Live Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
- **🚀 Dashboard Demo**: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
- **📖 Documentation**: See `/docs` folder
- **💻 Source Code**: Available in this repository

---

## 🎯 Overview

Project Sentinel is an innovative fraud detection system designed specifically for UIDAI Aadhaar enrolment centers. Unlike traditional global threshold-based systems, Sentinel uses **context-aware machine learning** with district-level normalization to identify fraudulent patterns while accounting for India's demographic diversity.

### The Problem We Solve

India's demographic diversity creates a unique challenge:

- 📊 Activities normal in Mumbai may be suspicious in tribal villages (and vice versa)
- ⚖️ Global thresholds either miss frauds or create false positives
- 🎯 Need: Regional baselines that adapt to local patterns

### Our Innovation

**District Normalization**: Each enrolment center is compared to its local district baseline, not a national average.

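The district normalization described above can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: the record layout, the district and center names, and the `ratio_deviation` helper are assumptions made for the example, and the baseline here is simply the mean adult ratio of all centers in the same district (including the center being scored).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (district, center_id, adult_enrolments, total_enrolments)
CENTERS = [
    ("Tribal-A", "A1", 40, 100),
    ("Tribal-A", "A2", 35, 100),
    ("Tribal-A", "A3", 90, 100),
    ("Urban-B", "B1", 80, 100),
    ("Urban-B", "B2", 85, 100),
]


def ratio_deviation(centers):
    """Return {center_id: adult_ratio - mean adult_ratio of its district}."""
    # Skip centers with no enrolments to avoid dividing by zero.
    valid = [row for row in centers if row[3] > 0]

    # Adult share of enrolments per center.
    ratios = {cid: adults / total for _, cid, adults, total in valid}

    # District baseline: mean adult ratio over the district's centers.
    by_district = defaultdict(list)
    for district, cid, _, _ in valid:
        by_district[district].append(ratios[cid])
    baselines = {d: mean(values) for d, values in by_district.items()}

    # Deviation of each center from its own district's baseline.
    return {cid: ratios[cid] - baselines[district] for district, cid, _, _ in valid}


deviations = ratio_deviation(CENTERS)
most_deviant = max(deviations, key=deviations.get)
print(most_deviant, round(deviations[most_deviant], 2))  # A3 0.35
```

In this toy data, center B2 has nearly the same raw adult ratio as A3 (85% vs 90%), but only A3 stands out once each center is measured against its own district.
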
**Example**: In a tribal district with 40% adult enrolment average, a center with 90% adult ratio gets flagged for deviation, even if absolute numbers are lower than urban centers.

---

## ✨ Key Features

### 🤖 Machine Learning Engine

- **Algorithm**: Isolation Forest (Unsupervised Learning)
- **Core Innovation**: Context-aware features with district baselines
- **Detection**: Ghost IDs, weekend fraud, data manipulation, coordinated operations

### 📊 Interactive Dashboard

- **Real-time KPIs**: 6 comprehensive metrics with trend indicators
- **Geographic Heatmap**: Risk visualization across India
- **Pattern Analysis**: Scatter plots, histograms, time series
- **Advanced Analytics**: Feature importance, correlation matrix, performance gauges

### 🔍 Smart Filtering

- Date range selection for temporal analysis
- Multi-select risk categories (Low/Medium/High/Critical)
- Dynamic state → district cascading
- Weekend-only anomaly toggle

### 📥 Multiple Export Formats

- **CSV**: Field team verification lists
- **JSON**: API integration
- **TXT**: Investigation reports for management

---

## 🚀 Quick Start

### **Option 1: Google Colab (Fastest)**

Run the complete analysis in your browser without any setup:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)

Click the badge above to open the notebook and run all cells to generate the analyzed data.

### **Option 2: Local Setup**

### Prerequisites

```bash
Python 3.8+
pip (Python package manager)
```

### Installation

1. **Clone the repository**

   ```bash
   git clone https://huggingface.co/spaces/lovnishverma/UIDAI
   cd UIDAI
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Run the Jupyter Notebook** (Data Processing)

   ```bash
   jupyter notebook project_sentinel_notebook.ipynb
   ```

   This generates `analyzed_aadhaar_data.csv`

4. **Launch the Dashboard**

   ```bash
   streamlit run sentinel_dashboard_enhanced.py
   ```

5. **Access the application**

   ```
   http://localhost:8501
   ```

---

## 📁 Project Structure

```
UIDAI/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── Dockerfile                         # Docker configuration
├── project_sentinel_notebook.ipynb    # ML model & data processing
├── app.py                             # Streamlit dashboard
├── analyzed_aadhaar_data.csv          # Processed data (generated from colab)
├── docs/
│   ├── Project_Sentinel_Analysis.docx
│   ├── Sentinel_Dashboard_Documentation.docx
│   └── Dashboard_Enhancements_Guide.docx
└── assets/
    └── screenshots/                   # Dashboard screenshots
```

---

## 🧠 Technical Architecture

### Data Pipeline

```
Raw Data (Biometric + Demographic + Enrolment)
    ↓
SmartLoader (Chunked CSV ingestion)
    ↓
Master Merge (Outer joins on date/state/district/pincode)
    ↓
ContextEngine (District normalization)
    ↓
Feature Engineering (4 context-aware features)
    ↓
Isolation Forest (Anomaly detection)
    ↓
Risk Scoring (0-100 scale)
    ↓
Dashboard Visualization
```

### Core Features (ML Model)

| Feature | Description | Importance |
|---------|-------------|------------|
| **ratio_deviation** | Deviation from district avg adult ratio | 45% |
| **weekend_spike_score** | Activity spike on weekends/holidays | 25% |
| **mismatch_score** | Discrepancy between bio/demo updates | 20% |
| **total_activity** | Overall transaction volume | 10% |

### Technology Stack

- **Backend**: Python 3.8+, Pandas, NumPy, Scikit-learn
- **ML**: Isolation Forest (Unsupervised Anomaly Detection)
- **Frontend**: Streamlit (Web Framework)
- **Visualization**: Plotly Express, Plotly Graph Objects
- **Deployment**: Docker, Hugging Face Spaces

---

## 📊 Dashboard Overview

### Tab 1: Geographic Analysis

- **Interactive Map**: Risk heatmap with circle size = volume, color = risk
- **Top 5 Hotspots**: Color-coded cards showing riskiest locations
- **Risk Distribution**: Donut chart
breakdown by category

### Tab 2: Pattern Analysis

- **Ghost ID Indicator**: Scatter plot with deviation thresholds
- **Risk Histogram**: Distribution concentration analysis
- **Time Series**: Dual-axis chart showing trends over time
- **Statistics**: Mean, median, std dev, 95th percentile

### Tab 3: Priority Cases

- **Adjustable Threshold**: Slider to filter by minimum risk score
- **Action Status**: Workflow tracking (Pending/Investigation/Resolved)
- **Enhanced Table**: Progress bars, formatted columns
- **Export Options**: CSV, JSON, TXT formats

### Tab 4: Advanced Analytics

- **Feature Importance**: Bar chart showing ML contributions
- **Performance Gauge**: Speedometer-style model accuracy
- **Correlation Heatmap**: Feature relationship matrix
- **Key Insights**: Contextual intelligence cards

---

## 🎨 Visual Design

### Professional Styling

- **Gradients**: Purple/blue for government portal aesthetic
- **Animations**: Pulsing alerts for critical cases
- **Typography**: Google Fonts (Inter) for modern look
- **Color Coding**: Risk levels with emoji indicators (🔴🟠🟡🟢)

### Responsive Layout

- **Wide Mode**: Maximum data density
- **Tabbed Interface**: Organized content reduces cognitive load
- **Adaptive Visualizations**: Charts adjust to filter context

---

## 🔧 Configuration

### Model Parameters

```python
Config.ML_FEATURES = [
    'ratio_deviation',       # Primary fraud indicator
    'weekend_spike_score',   # Unauthorized operations
    'mismatch_score',        # Data manipulation
    'total_activity'         # Volume context
]
Config.CONTAMINATION = 0.05  # 5% expected anomaly rate
Config.RANDOM_STATE = 42     # Reproducibility
```

### Risk Thresholds

```python
RISK_CATEGORIES = {
    'Low': [0, 50],
    'Medium': [50, 70],
    'High': [70, 85],
    'Critical': [85, 100]
}
```

---

## 📈 Use Cases

### 1. Ghost Identity Creation

- **Pattern**: Abnormally high adult enrolment ratio
- **Detection**: High positive ratio_deviation
- **Example**: District avg 40%, center reports 90% → FLAGGED

### 2. Weekend/Holiday Fraud

- **Pattern**: Activity spikes when centers should be closed
- **Detection**: High weekend_spike_score
- **Example**: 5x normal activity on Sunday → FLAGGED

### 3. Data Manipulation

- **Pattern**: Discrepancies between biometric and demographic updates
- **Detection**: High mismatch_score
- **Example**: 100 demo updates, 20 bio updates → FLAGGED

---

## 🚢 Deployment

### Docker Deployment

```bash
# Build image
docker build -t sentinel-dashboard .

# Run container
docker run -p 8501:8501 sentinel-dashboard
```

### Hugging Face Spaces

The app is automatically deployed when you push to the main branch.

### Environment Variables

```bash
STREAMLIT_SERVER_PORT=8501
STREAMLIT_SERVER_ADDRESS=0.0.0.0
STREAMLIT_SERVER_HEADLESS=true
```

---

## 📊 Performance Metrics

### Model Performance (Simulated)

- **Precision**: 89%
- **Recall**: 85%
- **F1-Score**: 87%
- **Accuracy**: 88%

### System Performance

- **Data Points Processed**: 500K+ records
- **Processing Time**: <1 second (cached)
- **Dashboard Load Time**: ~2 seconds
- **Visualization Rendering**: <500ms per chart

---

## 🔒 Security Considerations

### Current Implementation

- ✅ Data caching for performance
- ✅ Input validation on filters
- ✅ Error handling for missing data
- ⚠️ Simulated coordinates (demo only)

### Production Requirements

- 🔐 SSO/OAuth authentication
- 🔐 Role-based access control (RBAC)
- 🔐 Audit logging for all actions
- 🔐 Data encryption (at rest & in transit)
- 🔐 Real geocoding with pincode master DB

---

## 🎯 Future Enhancements

### Short-term (1-3 months)

- [ ] Real geocoding integration
- [ ] SHAP values for explainability
- [ ] Feedback loop for model refinement
- [ ] PDF report generation
- [ ] Email/SMS alert system

### Long-term (3-6 months)

- [ ] Multi-level baselines (state, district, pincode)
- [ ] Network analysis for coordinated fraud
- [ ] Real-time streaming pipeline (Kafka)
- [ ] Ensemble methods (LOF + One-Class SVM)
- [ ] Mobile app for field officers

---

## 👥 Team

- **Team ID**: UIDAI_4571
- **Theme**: Data-Driven Innovation for Aadhaar
- **Competition**: UIDAI Hackathon 2026

---

## 📄 Documentation

Comprehensive documentation available in `/docs`:

- **Project_Sentinel_Analysis.docx**: Technical analysis & code review
- **Sentinel_Dashboard_Documentation.docx**: Dashboard user guide
- **Dashboard_Enhancements_Guide.docx**: Enhancement details

---

## 🤝 Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- **UIDAI** for the hackathon opportunity and dataset
- **Anthropic** for AI assistance in development
- **Streamlit** for the amazing web framework
- **Plotly** for interactive visualizations

---

## 📧 Contact

For questions or support, please contact:

- **Email**: sentinel-support@example.com
- **Issues**: [GitHub Issues](https://github.com/lovnishverma/UIDAI/issues)
- **Discussions**: [GitHub Discussions](https://github.com/lovnishverma/UIDAI/discussions)

---

## 🌟 Star History

If you find this project useful, please consider giving it a ⭐!

---

Built with ❤️ for a safer Aadhaar ecosystem

© 2026 Project Sentinel. All rights reserved.