---
title: Privacy Preserving Machine Learning
emoji: 🏆
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
short_description: Project demonstrates privacy-preserving techniques for ML
---


# Privacy-Preserving Machine Learning Demo

## 🔒 Project Overview

This project demonstrates privacy-preserving techniques for machine learning on sensitive healthcare data. It implements:

- **SHA-256 Hashing** for direct identifiers (SSN)
- **Pseudonymization** for names
- **K-Anonymity Generalization** for DOB and income
- **Laplace Noise** (Differential Privacy) for numerical values
- **Differentially Private ML Training** using IBM's diffprivlib
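
The first three techniques in the list can be sketched with the standard library alone. This is a minimal illustration, not the code in `privacy_ml_solution.py`: the function names, the salt, the `PERSON_0001` token format, the decade buckets for DOB, and the 25,000-wide income bands are all assumptions made for the example.

```python
import hashlib
from collections import Counter
from datetime import date


def hash_identifier(value: str, salt: str) -> str:
    """SHA-256 digest of a direct identifier such as an SSN.

    The salt is an assumption of this sketch; an unsalted hash of a
    low-entropy value like an SSN can be reversed by enumeration.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def pseudonymize(names):
    """Replace each distinct name with a stable, meaningless token."""
    mapping = {}
    tokens = []
    for name in names:
        if name not in mapping:
            mapping[name] = f"PERSON_{len(mapping) + 1:04d}"
        tokens.append(mapping[name])
    return tokens


def generalize_dob(dob: date) -> str:
    """Coarsen a date of birth to its decade, e.g. 1987-05-17 -> '1980s'."""
    return f"{dob.year // 10 * 10}s"


def generalize_income(income: float, width: int = 25_000) -> str:
    """Coarsen an income to a fixed-width band, e.g. 61000 -> '50000-74999'."""
    low = int(income // width) * width
    return f"{low}-{low + width - 1}"


def smallest_group_size(quasi_identifier_rows):
    """The k in k-anonymity: size of the smallest group of identical rows."""
    return min(Counter(quasi_identifier_rows).values())


rows = [
    (generalize_dob(date(1987, 5, 17)), generalize_income(61_000)),
    (generalize_dob(date(1983, 1, 2)), generalize_income(52_500)),
    (generalize_dob(date(1994, 9, 30)), generalize_income(18_000)),
]
k = smallest_group_size(rows)
```

In this toy example the third row forms a group of its own, so `k` is 1. Wider bands or suppressing outlier rows would be needed to reach a larger k.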

## 📁 Files Included

| File | Description |
|------|-------------|
| `app.py` | Gradio web interface (main entry point for HF Spaces) |
| `privacy_ml_solution.py` | Core ML pipeline with all privacy techniques |
| `requirements.txt` | Python dependencies |
| `Assignment2Dataset-1_encrypted.csv` | The encrypted/anonymized dataset |
| `model_comparison_results.csv` | Performance metrics comparing models |
| `Privacy_Preserving_ML_Report.docx` | Comprehensive academic report |
| `Technical_Documentation.docx` | Code and library documentation |

---

## 🚀 Deploying to Hugging Face Spaces (Step-by-Step for Beginners)

### Step 1: Create a Hugging Face Account

1. Go to [huggingface.co](https://huggingface.co)
2. Click "Sign Up" in the top right
3. Fill in your details and verify your email

### Step 2: Create a New Space

1. Once logged in, click your profile picture → "New Space"
2. Fill in the form:
   - **Owner**: Select your username
   - **Space name**: e.g., `privacy-ml-demo`
   - **License**: MIT
   - **SDK**: Select **Gradio**
   - **Hardware**: Keep as "CPU Basic" (free)
3. Click "Create Space"

### Step 3: Upload Your Files

**Option A: Using the Web Interface (Easiest)**

1. In your new Space, click the "Files" tab
2. Click "Add file" → "Upload files"
3. Upload these files:
   - `app.py` (REQUIRED - this is the entry point)
   - `requirements.txt` (REQUIRED)
   - `Assignment2Dataset-1_encrypted.csv` (optional sample data)
4. Wait for the build to complete (~2-3 minutes)

**Option B: Using Git (For More Control)**

```bash
# Clone your space
git clone https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo
cd privacy-ml-demo

# Copy your files into the directory
cp /path/to/app.py .
cp /path/to/requirements.txt .

# Commit and push
git add .
git commit -m "Initial upload"
git push
```

### Step 4: Wait for Build

1. Click the "App" tab to see your space building
2. Watch the logs for any errors
3. Once complete, your app will be live!

### Step 5: Test Your App

1. Your app is now live at: `https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo`
2. Upload a CSV file and adjust the epsilon slider
3. Click "Run Privacy Analysis" to see results

---

## ⚙️ Local Testing (Before Deployment)

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py

# Open http://localhost:7860 in your browser
```

---

## 🔧 Troubleshooting

### "Build failed" Error
- Check that `requirements.txt` has correct package names
- View build logs for specific error messages

### "App crashed" Error
- Ensure `app.py` has `demo.launch()` at the bottom
- Check for syntax errors in your code

### Slow Loading
- Free tier spaces "sleep" after inactivity
- First load takes ~30 seconds to wake up

### Memory Issues
- Reduce `n_estimators` in RandomForest
- Use smaller test datasets

---

## 📊 Understanding the Results

| Metric | What it Means |
|--------|---------------|
| **Accuracy** | % of correct predictions |
| **F1 Score** | Balance of precision and recall |
| **Epsilon (ε)** | Privacy budget - lower = more privacy |
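
To make the first two metrics concrete, here is a small worked example in plain Python. It assumes binary labels with `1` as the positive class; the app itself may compute these differently (for example with scikit-learn and a different averaging mode), so treat this as an illustration of the definitions only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)  # 4 of 6 predictions are correct
f1 = f1_score(y_true, y_pred)   # tp=2, fp=1, fn=1
```

Here precision and recall are both 2/3, so the F1 score is also 2/3, the same as the accuracy. On imbalanced data the two metrics usually diverge.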

### Privacy Level Guide
- ε = 0.1-0.5: Very high privacy, some accuracy loss
- ε = 1.0: Balanced (recommended)
- ε = 5.0+: Lower privacy, minimal accuracy impact
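
The guide above follows from how the Laplace mechanism scales its noise: the noise scale is `sensitivity / epsilon`, so a smaller epsilon means larger perturbations. The sketch below uses only the standard library; the function name, the sensitivity of 1.0, and the sample count are assumptions for illustration and are not taken from the project's code or from diffprivlib.

```python
import random


def add_laplace_noise(value, sensitivity, epsilon, rng):
    """Return value plus Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return value + noise


def mean_abs_noise(epsilon, samples=2000, seed=0):
    """Average absolute perturbation of a zero value at a given epsilon."""
    rng = random.Random(seed)
    total = sum(abs(add_laplace_noise(0.0, 1.0, epsilon, rng)) for _ in range(samples))
    return total / samples


for eps in (0.1, 1.0, 5.0):
    print(f"epsilon={eps}: mean |noise| is about {mean_abs_noise(eps):.2f}")
```

The expected absolute noise equals the scale, so with a sensitivity of 1.0 the printed averages should come out near 10, 1, and 0.2 respectively.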

---

## 📚 Learn More

- [Differential Privacy Explained](https://desfontain.es/privacy/differential-privacy-awesomeness.html)
- [IBM diffprivlib Documentation](https://diffprivlib.readthedocs.io/)
- [Gradio Documentation](https://gradio.app/docs/)
- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)

---

## 📝 License

MIT License - Feel free to use and modify for your projects.


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference