---
title: UIDAI Project S.A.T.A.R.K
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Data-Driven Innovation for Aadhaar
---

# 🛡️ Project S.A.T.A.R.K: AI-Powered Fraud Detection for UIDAI

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://huggingface.co/spaces/lovnishverma/UIDAI)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

> **Context-Aware Anomaly Detection System for Aadhaar Enrolment Centers**
>
> **Team ID:** UIDAI_4571 | **Theme:** Data-Driven Innovation for Aadhaar

---

## 🎯 Quick Links

- **πŸ“Š Live Analysis Notebook**: [Open in Google Colab](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)
- **πŸš€ Live Dashboard**: [Hugging Face Spaces](https://huggingface.co/spaces/lovnishverma/UIDAI)
- **πŸ“– Project Report**: [View PDF](Final-Project-Report.pdf)
- **πŸ’» Source Code**: Available in this repository

---

## 🎯 Overview

**Project S.A.T.A.R.K** (Statistical Anomaly Tracking & Aadhaar Risk Kit) is a revolutionary fraud detection system designed to solve the critical "Accuracy vs. Fairness" trade-off in Aadhaar vigilance.

### The Problem
India's demographic diversity makes global rules ineffective:
- ❌ **Strict Rules:** Flag legitimate activities in tribal belts (False Positives).
- ❌ **Lenient Rules:** Miss sophisticated fraud in metropolitan areas (False Negatives).

### Our Innovation: District Normalization
Instead of using a national average, S.A.T.A.R.K compares each enrolment center against its **local district baseline**.
- **Example:** In a tribal district where late enrolment is common (district average: 40%), a center reporting 90% is flagged. In a city where 90% is the norm, the same figure is marked safe.
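
The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: the center IDs, district names, ratios, and the `THRESHOLD` cut-off are all hypothetical, and the district baseline is assumed to be a simple mean that includes the center itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (center_id, district, late_enrolment_ratio)
centers = [
    ("C1", "TribalDistrict", 0.40),
    ("C2", "TribalDistrict", 0.35),
    ("C3", "TribalDistrict", 0.45),
    ("C4", "TribalDistrict", 0.90),
    ("C5", "MetroDistrict", 0.90),
    ("C6", "MetroDistrict", 0.88),
    ("C7", "MetroDistrict", 0.92),
]


def ratio_deviation(records):
    """Return {center_id: center ratio minus the mean ratio of its district}."""
    by_district = defaultdict(list)
    for _, district, ratio in records:
        by_district[district].append(ratio)
    district_avg = {d: mean(ratios) for d, ratios in by_district.items()}
    return {cid: ratio - district_avg[d] for cid, d, ratio in records}


deviations = ratio_deviation(centers)

# Hypothetical cut-off: flag centers far above their own district's baseline.
THRESHOLD = 0.25
flagged = sorted(cid for cid, dev in deviations.items() if dev > THRESHOLD)
print(flagged)
```

Centers `C4` and `C5` report the same 90% ratio, but only `C4` stands out against its district, which is the behavior the example describes.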

---

## ✨ Key Features

### 🧠 The "Context-Aware" AI Engine
- **Algorithm**: Isolation Forest (Unsupervised Learning)
- **Smart Logic**: Detects anomalies relative to local geography.
- **Capabilities**: Identifies "Ghost IDs", "Sunday Surges" (Illegal Camps), and "Mass Update Operations".

### 📊 The Vigilance Dashboard
- **Geospatial Intelligence**: Interactive Heatmap of High-Risk Centers.
- **Actionable Insights**: "Priority Action List" exportable for field agents.
- **Evidence-Based**: Charts proving *why* a center was flagged (e.g., Weekend Activity vs. Weekday).

### 📥 Smart Data Ingestion
- **Automated**: Recursively fetches and merges fragmented CSV chunks.
- **Robust**: Merges large datasets with outer joins, so rows that appear in only one source are kept rather than dropped.
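
To illustrate why an outer join avoids data loss, here is a small standard-library sketch. It is not the project's loader: the `outer_join` helper, the column names, and the inline CSV text are hypothetical, and the join key is assumed to be unique within each chunk.

```python
import csv
import io


def outer_join(left_rows, right_rows, key):
    """Full outer join of two lists of dicts on `key`.

    Fields missing on one side are filled with None.
    Assumes `key` is unique within each input list.
    """
    columns = []
    for row in left_rows + right_rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    merged = {}
    for row in left_rows + right_rows:
        target = merged.setdefault(row[key], dict.fromkeys(columns))
        target.update(row)
    return list(merged.values())


# Hypothetical chunk contents; real chunks would be read from files.
biometric_csv = "center_id,bio_updates\nC1,120\nC2,80\n"
demographic_csv = "center_id,demo_updates\nC2,75\nC3,40\n"

bio = list(csv.DictReader(io.StringIO(biometric_csv)))
demo = list(csv.DictReader(io.StringIO(demographic_csv)))

merged = outer_join(bio, demo, key="center_id")
for row in merged:
    print(row)
```

An inner join would keep only `C2` here; the outer join keeps `C1` and `C3` as well, with `None` marking the missing side.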

---

## 🚀 Quick Start

### **Option 1: Run Analysis (Google Colab)**
To see the Feature Engineering and Model Training in action:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YAQ4nfxltvG_cts3fmGc_zi2JQc4oPOT?usp=sharing)

1. Open the Notebook.
2. Run all cells to process the raw data.
3. Download the generated `analyzed_aadhaar_data.csv`.

### **Option 2: Run Dashboard (Local)**

**Prerequisites:** Python 3.8+, pip

1. **Clone the repository**
   ```bash
   git clone https://huggingface.co/spaces/lovnishverma/UIDAI
   cd UIDAI
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Launch the App**
   ```bash
   streamlit run app.py
   ```

4. **Access the Dashboard**
   Open `http://localhost:8501` in your browser.

---

## 📁 Project Structure

```
UIDAI/
├── README.md                                 # This documentation
├── requirements.txt                          # Python dependencies
├── Dockerfile                                # Container configuration
├── app.py                                    # Streamlit Dashboard Code
├── UIDAI_4571_(PROJECT_S_A_T_A_R_K_AI).ipynb # Main Analysis Notebook
├── analyzed_aadhaar_data.csv                 # Processed Data for Dashboard
├── Final-Project-Report.pdf                  # Complete Project Documentation
└── assets/                                   # Images and logos
```

---

## 🧠 Technical Architecture

### The Pipeline

1. **Ingestion**: `SmartLoader` class merges fragmented CSVs.
2. **Context Engine**: Calculates `ratio_deviation` (Center vs. District).
3. **AI Model**: `IsolationForest` detects statistical outliers.
4. **Visualization**: Streamlit app renders the `RISK_SCORE` on maps.

### Core Risk Signals

| Feature | Logic | Detects |
| --- | --- | --- |
| **Ratio Deviation** | `(Center_Ratio - District_Avg)` | Ghost IDs |
| **Weekend Spike** | `Activity on Sunday / Normal Day` | Illegal Camps |
| **Mismatch Score** | `\|Bio - Demo\|` | |
| **Volume Anomaly** | `Total_Activity > 99th Percentile` | Mass Operations |
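
The signals in the table can be written as small functions. This is a sketch under assumptions: the function names, argument names, and sample numbers are hypothetical, the Mismatch Score is read as the absolute difference between biometric and demographic update counts, and the 99th percentile is taken with `statistics.quantiles`, which may differ from the percentile method used in the notebook.

```python
from statistics import quantiles


def ratio_deviation(center_ratio, district_avg):
    """Ratio Deviation: how far a center sits above its district average."""
    return center_ratio - district_avg


def weekend_spike(sunday_activity, normal_day_activity):
    """Weekend Spike: Sunday activity relative to a normal day."""
    if normal_day_activity == 0:
        # Avoid division by zero: any Sunday activity is an unbounded spike.
        return float("inf") if sunday_activity > 0 else 0.0
    return sunday_activity / normal_day_activity


def mismatch_score(bio_updates, demo_updates):
    """Mismatch Score, read here as the absolute gap |Bio - Demo| (assumption)."""
    return abs(bio_updates - demo_updates)


def volume_anomalies(total_activity):
    """Volume Anomaly: centers whose total activity exceeds the 99th percentile."""
    cutoff = quantiles(total_activity.values(), n=100)[98]
    return {center for center, total in total_activity.items() if total > cutoff}


# Hypothetical data: 200 ordinary centers plus one bulk operator.
totals = {f"C{i}": 100 + i for i in range(200)}
totals["BULK"] = 5000

print(ratio_deviation(0.90, 0.40))
print(weekend_spike(600, 100))
print(mismatch_score(120, 75))
print("BULK" in volume_anomalies(totals))
```

In the pipeline described above, values like these would be the per-center features passed to the `IsolationForest` model rather than hard thresholds on their own.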

---

## 📊 Dashboard Preview

### 1. Geographic Heatmap

Instantly spot high-risk clusters across India.
*(See `assets/` for screenshots)*

### 2. Priority Action List

Downloadable CSV for vigilance officers containing only the top 1% critical cases.
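
One way such a list could be produced is to rank centers by `RISK_SCORE` and keep the top 1%. The sketch below uses only the standard library; the helper names, the `center_id` column, and the synthetic scores are hypothetical, and the dashboard's actual export code may differ.

```python
import csv
import io
import math


def priority_action_list(rows, score_key="RISK_SCORE", top_percent=1):
    """Return the highest-scoring rows, keeping at least one row."""
    ranked = sorted(rows, key=lambda r: r[score_key], reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[:keep]


def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scored centers.
rows = [{"center_id": f"C{i}", "RISK_SCORE": i / 300} for i in range(300)]

top = priority_action_list(rows)
print(to_csv(top))
```

With 300 scored centers, the top 1% is three rows, which keeps the exported file short enough for a field officer to act on.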

### 3. AI Insights Panel

"Why is this flagged?" - The AI explains its decision (e.g., *"Flagged due to 500% spike in weekend activity"*).

---

## 👥 Team UIDAI_4571

**Team Leader:** Aman Choudhary (NIELIT Ropar)

**Team Member:** Prateek Dhar Dwivedi (NIELIT Ropar)

**Mentor:** Lovnish Verma (Project Engineer, NIELIT Ropar)

**Competition:** UIDAI Hackathon 2026

**Submission Date:** January 2026

---

## 📝 License

This project is open-source under the [MIT License](LICENSE).

---

<div align="center">
<strong>Project S.A.T.A.R.K.</strong>




<em>Statistical Anomaly Tracking & Aadhaar Risk Kit</em>




Built with ❤️ for a safer, inclusive Digital India.
</div>