Update README.md

README.md (CHANGED)

@@ -17,395 +17,177 @@ short_description: Data-Driven Innovation for Aadhaar
[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)

-
> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**
-
> Team ID: UIDAI_4571 | Theme: Data-Driven Innovation for Aadhaar

---

## 🎯 Quick Links

-
- **📓 Live Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
-
- **Dashboard
-
- **
- **💻 Source Code**: Available in this repository

---

## 🎯 Overview

-
Project S.A.T.A.R.K

-
### The Problem

-
-
-
-
-
- 🎯 Need: Regional baselines that adapt to local patterns
-
-
### Our Innovation
-
-
**District Normalization**: Each enrolment center is compared to its local district baseline, not a national average.
-
-
**Example**: In a tribal district with a 40% adult enrolment average, a center with a 90% adult ratio gets flagged for deviation, even if its absolute numbers are lower than urban centers'.

---

## ✨ Key Features

-
###
- **Algorithm**: Isolation Forest (Unsupervised Learning)
-
- **
-
- **
-
-
###
-
- **
-
- **
-
- **
-
-
-
-
-
- Multi-select risk categories (Low/Medium/High/Critical)
-
- Dynamic state → district cascading
-
- Weekend-only anomaly toggle
-
-
### 📥 Multiple Export Formats
-
- **CSV**: Field team verification lists
-
- **JSON**: API integration
-
- **TXT**: Investigation reports for management

---

## 🚀 Quick Start

-
### **Option 1: Google Colab
-

[](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)

-

-
### **Option 2: Local

-
-
```bash
-
Python 3.8+
-
pip (Python package manager)
-
```
-
-
### Installation

1. **Clone the repository**
-
```bash
-
git clone https://huggingface.co/spaces/lovnishverma/UIDAI
-
cd UIDAI
```

2. **Install dependencies**
```bash
pip install -r requirements.txt
-
```

-
3. **Run the Jupyter Notebook** (Data Processing)
-
```bash
-
jupyter notebook project_notebook.ipynb
```
-
This generates `analyzed_aadhaar_data.csv`

-
```bash
streamlit run app.py
-
```

-
5. **Access the application**
-
```
-
http://localhost:8501
```

-
---

-
-
-
```
-
UIDAI/
-
├── README.md                  # This file
-
├── requirements.txt           # Python dependencies
-
├── Dockerfile                 # Docker configuration
-
├── project_notebook.ipynb     # ML model & data processing
-
├── app.py                     # Streamlit dashboard
-
├── analyzed_aadhaar_data.csv  # Processed data (generated from Colab)
-
├── docs/
-
│   ├── Project_S.A.T.A.R.K_Analysis.docx
-
│   ├── S.A.T.A.R.K_Dashboard_Documentation.docx
-
│   └── Dashboard_Enhancements_Guide.docx
-
└── assets/
-
    └── screenshots/           # Dashboard screenshots
-
```
-

-
---
-

-
## 🔧 Technical Architecture
-

-
### Data Pipeline
-
```
-
Raw Data (Biometric + Demographic + Enrolment)
-
↓
-
SmartLoader (Chunked CSV ingestion)
-
↓
-
Master Merge (Outer joins on date/state/district/pincode)
-
↓
-
ContextEngine (District normalization)
-
↓
-
Feature Engineering (4 context-aware features)
-
↓
-
Isolation Forest (Anomaly detection)
-
↓
-
Risk Scoring (0-100 scale)
-
↓
-
Dashboard Visualization
-
```
-

-
### Core Features (ML Model)
-

-
| Feature | Description | Importance |
-
|---------|-------------|------------|
-
| **ratio_deviation** | Deviation from district avg adult ratio | 45% |
-
| **weekend_spike_score** | Activity spike on weekends/holidays | 25% |
-
| **mismatch_score** | Discrepancy between bio/demo updates | 20% |
-
| **total_activity** | Overall transaction volume | 10% |
-

-
### Technology Stack
-

-
- **Backend**: Python 3.8+, Pandas, NumPy, Scikit-learn
-
- **ML**: Isolation Forest (Unsupervised Anomaly Detection)
-
- **Frontend**: Streamlit (Web Framework)
-
- **Visualization**: Plotly Express, Plotly Graph Objects
-
- **Deployment**: Docker, Hugging Face Spaces

---

-
##
-

-
### Tab 1: Geographic Analysis
-
- **Interactive Map**: Risk heatmap with circle size = volume, color = risk
-
- **Top 5 Hotspots**: Color-coded cards showing riskiest locations
-
- **Risk Distribution**: Donut chart breakdown by category
-

-
### Tab 2: Pattern Analysis
-
- **Ghost ID Indicator**: Scatter plot with deviation thresholds
-
- **Risk Histogram**: Distribution concentration analysis
-
- **Time Series**: Dual-axis chart showing trends over time
-
- **Statistics**: Mean, median, std dev, 95th percentile
-

-
### Tab 3: Priority Cases
-
- **Adjustable Threshold**: Slider to filter by minimum risk score
-
- **Action Status**: Workflow tracking (Pending/Investigation/Resolved)
-
- **Enhanced Table**: Progress bars, formatted columns
-
- **Export Options**: CSV, JSON, TXT formats
-

-
### Tab 4: Advanced Analytics
-
- **Feature Importance**: Bar chart showing ML contributions
-
- **Performance Gauge**: Speedometer-style model accuracy
-
- **Correlation Heatmap**: Feature relationship matrix
-
- **Key Insights**: Contextual intelligence cards
-

-
---
-

-
## 🎨 Visual Design
-

-
### Professional Styling
-
- **Gradients**: Purple/blue for a government portal aesthetic
-
- **Animations**: Pulsing alerts for critical cases
-
- **Typography**: Google Fonts (Inter) for a modern look
-
- **Color Coding**: Risk levels with emoji indicators (🔴🟠🟡🟢)
-

-
### Responsive Layout
-
- **Wide Mode**: Maximum data density
-
- **Tabbed Interface**: Organized content reduces cognitive load
-
- **Adaptive Visualizations**: Charts adjust to filter context
-

-
---

-
## 🔧 Configuration
-

-
### Model Parameters
-
```python
-
Config.ML_FEATURES = [
-
    'ratio_deviation',      # Primary fraud indicator
-
    'weekend_spike_score',  # Unauthorized operations
-
    'mismatch_score',       # Data manipulation
-
    'total_activity'        # Volume context
-
]
-
Config.CONTAMINATION = 0.05  # 5% expected anomaly rate
-
Config.RANDOM_STATE = 42     # Reproducibility
```

-
### Risk Thresholds
-
```python
-
RISK_CATEGORIES = {
-
    'Low': [0, 50],
-
    'Medium': [50, 70],
-
    'High': [70, 85],
-
    'Critical': [85, 100]
-
}
```

---

-
##
-

-
### 1. Ghost Identity Creation
-
**Pattern**: Abnormally high adult enrolment ratio
-
**Detection**: High positive ratio_deviation
-
**Example**: District avg 40%, center reports 90% → FLAGGED
-

-
### 2. Weekend/Holiday Fraud
-
**Pattern**: Activity spikes when centers should be closed
-
**Detection**: High weekend_spike_score
-
**Example**: 5x normal activity on Sunday → FLAGGED
-

-
### 3. Data Manipulation
-
**Pattern**: Discrepancies between biometric and demographic updates
-
**Detection**: High mismatch_score
-
**Example**: 100 demo updates, 20 bio updates → FLAGGED
-

-
---
-

-
## 🚢 Deployment
-

-
### Docker Deployment
-
```bash
-
# Build image
-
docker build -t app .
-

-
# Run container
-
docker run -p 8501:8501 app
-
```
-

-
### Hugging Face Spaces
-
The app is automatically deployed when you push to the main branch.

-
###
-
```bash
-
STREAMLIT_SERVER_PORT=8501
-
STREAMLIT_SERVER_ADDRESS=0.0.0.0
-
STREAMLIT_SERVER_HEADLESS=true
-
```

-

-

-
-
-
-
-
-
-
### System Performance
-
- **Data Points Processed**: 500K+ records
-
- **Processing Time**: <1 second (cached)
-
- **Dashboard Load Time**: ~2 seconds
-
- **Visualization Rendering**: <500ms per chart

---

-
##
-

-
### Current Implementation
-
- ✅ Data caching for performance
-
- ✅ Input validation on filters
-
- ✅ Error handling for missing data
-
- ⚠️ Simulated coordinates (demo only)
-

-
### Production Requirements
-
- SSO/OAuth authentication
-
- Role-based access control (RBAC)
-
- Audit logging for all actions
-
- Data encryption (at rest & in transit)
-
- Real geocoding with pincode master DB
-

-
---

-

-
-
-
- [ ] SHAP values for explainability
-
- [ ] Feedback loop for model refinement
-
- [ ] PDF report generation
-
- [ ] Email/SMS alert system

-
###
-
- [ ] Multi-level baselines (state, district, pincode)
-
- [ ] Network analysis for coordinated fraud
-
- [ ] Real-time streaming pipeline (Kafka)
-
- [ ] Ensemble methods (LOF + One-Class SVM)
-
- [ ] Mobile app for field officers

-

-

-
-
**Theme**: Data-Driven Innovation for Aadhaar
-
**Competition**: UIDAI Hackathon 2026

---

-
##

-
-
- **Project_S.A.T.A.R.K_Analysis.docx**: Technical analysis & code review
-
- **Sentinel_Dashboard_Documentation.docx**: Dashboard user guide
-
- **Dashboard_Enhancements_Guide.docx**: Enhancement details

-

-

-

-
-
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
-
3. Commit your changes (`git commit -m 'Add AmazingFeature'`)
-
4. Push to the branch (`git push origin feature/AmazingFeature`)
-
5. Open a Pull Request

---

## 📄 License

-
This project is

---

-

-
- **UIDAI** for the hackathon opportunity and dataset
-
- **Anthropic** for AI assistance in development
-
- **Streamlit** for the web framework
-
- **Plotly** for interactive visualizations

-
---

-
## 📧 Contact

-
-
- **Email**: princelv84@gmail.com
-
- **Issues**: [GitHub Issues](https://github.com/lovnishverma/UIDAI/issues)
-
- **Discussions**: [GitHub Discussions](https://github.com/lovnishverma/UIDAI/discussions)

-
---
-
-
## 🌟 Star History

-
If you find this project useful, please consider giving it a ⭐!

-
---

-
-
-
<br>
-
<sub>© 2026 Project S.A.T.A.R.K. All rights reserved.</sub>
-
</div>

[](https://www.python.org/downloads/)
[](https://opensource.org/licenses/MIT)

+
> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**
> **Team ID:** UIDAI_4571 | **Theme:** Data-Driven Innovation for Aadhaar

---

## 🎯 Quick Links

+
- **📓 Live Analysis Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
+
- **📊 Live Dashboard**: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
+
- **📄 Project Report**: [View PDF](Final-Project-Report.pdf)
- **💻 Source Code**: Available in this repository

---

## 🎯 Overview

+
**Project S.A.T.A.R.K** (Statistical Anomaly Tracking & Aadhaar Risk Kit) is a context-aware fraud detection system designed to resolve the critical "Accuracy vs. Fairness" trade-off in Aadhaar vigilance.

+
### The Problem
+
India's demographic diversity makes one-size-fits-all rules ineffective:
+
- ❌ **Strict rules** flag legitimate activity in tribal belts (false positives).
+
- ❌ **Lenient rules** miss sophisticated fraud in metropolitan areas (false negatives).

+
### Our Innovation: District Normalization
+
Instead of using a national average, S.A.T.A.R.K compares each enrolment center against its **local district baseline**.
+
- **Example:** In a tribal district where late enrolment is common (average: 40%), a center at 90% is flagged; in a city where 90% is normal, the same figure is marked safe.
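
The district-baseline idea can be sketched in a few lines of pandas. The column values and district names below are illustrative, not the project's actual schema:

```python
import pandas as pd

# Hypothetical per-center adult-enrolment ratios in two districts.
df = pd.DataFrame({
    "district": ["Tribal-A", "Tribal-A", "Metro-B", "Metro-B"],
    "center": ["C1", "C2", "C3", "C4"],
    "adult_ratio": [0.40, 0.90, 0.88, 0.92],
})

# Baseline = mean adult ratio of each center's own district.
df["district_avg"] = df.groupby("district")["adult_ratio"].transform("mean")

# Deviation from the local baseline, not from a national average:
# C2 (0.90 in a 0.65-average district) stands out, while C3
# (0.88 in a 0.90-average district) does not.
df["ratio_deviation"] = df["adult_ratio"] - df["district_avg"]
```

A center is then scored on `ratio_deviation` rather than on its raw ratio, which is what lets the same 90% figure be anomalous in one district and normal in another.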

---

## ✨ Key Features

+
### 🧠 The "Context-Aware" AI Engine
- **Algorithm**: Isolation Forest (Unsupervised Learning)
+
- **Smart Logic**: Detects anomalies relative to local geography.
+
- **Capabilities**: Identifies "Ghost IDs", "Sunday Surges" (illegal camps), and "Mass Update Operations".
+

+
### 📊 The Vigilance Dashboard
+
- **Geospatial Intelligence**: Interactive heatmap of high-risk centers.
+
- **Actionable Insights**: "Priority Action List" exportable for field agents.
+
- **Evidence-Based**: Charts showing *why* a center was flagged (e.g., weekend vs. weekday activity).
+

+
### 📥 Smart Data Ingestion
+
- **Automated**: Recursively fetches and merges fragmented CSV chunks.
+
- **Robust**: Handles massive datasets without data loss using outer joins.

---

## 🚀 Quick Start

+
### **Option 1: Run Analysis (Google Colab)**
+
To see the feature engineering and model training in action:

[](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)

+
1. Open the notebook.
+
2. Run all cells to process the raw data.
+
3. Download the generated `analyzed_aadhaar_data.csv`.

+
### **Option 2: Run Dashboard (Local)**

+
**Prerequisites:** Python 3.8+, pip

1. **Clone the repository**
+
```bash
+
git clone https://huggingface.co/spaces/lovnishverma/UIDAI
+
cd UIDAI
+
```

2. **Install dependencies**
```bash
pip install -r requirements.txt
```

+
3. **Launch the App**
```bash
streamlit run app.py
```

+
4. **Access the Dashboard**
+
Open `http://localhost:8501` in your browser.

---

+
## 📁 Project Structure

```
+
UIDAI/
+
├── README.md                                  # This documentation
+
├── requirements.txt                           # Python dependencies
+
├── Dockerfile                                 # Container configuration
+
├── app.py                                     # Streamlit dashboard code
+
├── UIDAI_4571_(PROJECT_S_A_T_A_R_K_AI).ipynb  # Main analysis notebook
+
├── analyzed_aadhaar_data.csv                  # Processed data for the dashboard
+
├── Final-Project-Report.pdf                   # Complete project documentation
+
└── assets/                                    # Images and logos

```

---

+
## 🔧 Technical Architecture

+
### The Pipeline

+
1. **Ingestion**: `SmartLoader` class merges fragmented CSVs.
+
2. **Context Engine**: Calculates `ratio_deviation` (center vs. district).
+
3. **AI Model**: `IsolationForest` detects statistical outliers.
+
4. **Visualization**: Streamlit app renders the `RISK_SCORE` on maps.

+
### Core Risk Signals

+
| Feature | Logic | Detects |
+
| --- | --- | --- |
+
| **Ratio Deviation** | `(Center_Ratio - District_Avg)` | Ghost IDs |
+
| **Weekend Spike** | `Activity on Sunday / Normal Day` | Illegal Camps |
+
| **Mismatch Score** | `\|Bio - Demo\|` | Data Manipulation |
+
| **Volume Anomaly** | `Total_Activity > 99th Percentile` | Mass Operations |
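
For instance, the Weekend Spike signal from the table reduces to a simple ratio. The exact formula is not spelled out in this README, so treat this as an assumed sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical daily enrolment log for one center.
logs = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "enrolments": [100, 110, 95, 105, 100, 120, 500],  # Sunday surge
})

weekday_avg = logs.loc[~logs["day"].isin(["Sat", "Sun"]), "enrolments"].mean()
sunday = logs.loc[logs["day"] == "Sun", "enrolments"].iloc[0]

# Roughly 4.9x normal activity on a day the center should be closed.
weekend_spike_score = sunday / weekday_avg
```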

---

+
## 📊 Dashboard Preview

+
### 1. Geographic Heatmap

+
Instantly spot high-risk clusters across India.
+
*(See `assets/` for screenshots)*

+
### 2. Priority Action List

+
Downloadable CSV for vigilance officers containing only the top 1% of critical cases.
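
The "top 1%" filter can be sketched with a percentile cutoff (synthetic scores and a hypothetical output filename):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
centers = pd.DataFrame({
    "center_id": np.arange(1000),
    "risk_score": rng.uniform(0, 100, size=1000),
})

# Keep only the riskiest 1% of centers for the field-team export.
cutoff = centers["risk_score"].quantile(0.99)
priority = centers[centers["risk_score"] >= cutoff]
priority.to_csv("priority_action_list.csv", index=False)
```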

+
### 3. AI Insights Panel

+
"Why is this flagged?" The AI explains its decision (e.g., *"Flagged due to a 500% spike in weekend activity"*).

---

+
## 👥 Team UIDAI_4571

+
**Team Leader:** Aman Choudhary (NIELIT Ropar)

+
**Team Member:** Prateek Dhar Dwivedi (NIELIT Ropar)

+
**Mentor:** Lovnish Verma (Project Engineer, NIELIT Ropar)

+
**Competition:** UIDAI Hackathon 2026

+
**Submission Date:** January 2026

---

## 📄 License

+
This project is open-source under the [MIT License](LICENSE).

---

+
<div align="center">
+
<strong>Project S.A.T.A.R.K.</strong>

+
<em>Statistical Anomaly Tracking & Aadhaar Risk Kit</em>

+
Built with ❤️ for a safer, inclusive Digital India.
+
</div>
|