---
title: Privacy Preserving Machine Learning
emoji: 🏆
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
short_description: Project demonstrates privacy-preserving techniques for ML
---


# Privacy-Preserving Machine Learning Demo

## 🔒 Project Overview

This project demonstrates privacy-preserving techniques for machine learning on sensitive healthcare data. It implements:

- **SHA-256 Hashing** for direct identifiers (SSN)
- **Pseudonymization** for names
- **K-Anonymity Generalization** for DOB and income
- **Laplace Noise** (Differential Privacy) for numerical values
- **Differentially Private ML Training** using IBM's diffprivlib
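
The first three techniques in this list can be sketched with the Python standard library alone. This is a minimal illustration, not the code in `privacy_ml_solution.py`: the salt value, the `Person_0001` pseudonym format, the ISO date input, the decade and 25,000-wide income buckets, and every function name below are assumptions made for the example.

```python
import hashlib
from collections import Counter
from datetime import date

# Hypothetical salt for illustration; a real deployment would load a secret value.
SALT = b"example-salt"


def hash_ssn(ssn):
    """Return the salted SHA-256 hex digest of an SSN, ignoring dashes and spaces."""
    normalized = ssn.replace("-", "").replace(" ", "")
    return hashlib.sha256(SALT + normalized.encode("utf-8")).hexdigest()


class Pseudonymizer:
    """Map each distinct name to a stable pseudonym such as 'Person_0001'."""

    def __init__(self, prefix="Person"):
        self._prefix = prefix
        self._mapping = {}

    def pseudonym(self, name):
        key = name.strip().lower()
        if key not in self._mapping:
            # Assign the next sequential label the first time a name is seen.
            self._mapping[key] = f"{self._prefix}_{len(self._mapping) + 1:04d}"
        return self._mapping[key]


def generalize_dob(dob_iso):
    """Generalize an ISO date string (YYYY-MM-DD) to its decade of birth."""
    year = date.fromisoformat(dob_iso).year
    return f"{year // 10 * 10}s"


def generalize_income(income, width=25_000):
    """Generalize an income to a fixed-width bracket (assumed width: 25,000)."""
    low = int(income) // width * width
    return f"{low}-{low + width - 1}"


def smallest_group(records, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())
```

Generalization only coarsens values; it does not guarantee k-anonymity by itself. A helper like `smallest_group` reports the k actually achieved, so buckets can be widened if the smallest group is too small.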

## 📁 Files Included

| File | Description |
|------|-------------|
| `app.py` | Gradio web interface (main entry point for HF Spaces) |
| `privacy_ml_solution.py` | Core ML pipeline with all privacy techniques |
| `requirements.txt` | Python dependencies |
| `Assignment2Dataset-1_encrypted.csv` | The encrypted/anonymized dataset |
| `model_comparison_results.csv` | Performance metrics comparing models |
| `Privacy_Preserving_ML_Report.docx` | Comprehensive academic report |
| `Technical_Documentation.docx` | Code and library documentation |

---

## 🚀 Deploying to Hugging Face Spaces (Step-by-Step for Beginners)

### Step 1: Create a Hugging Face Account

1. Go to [huggingface.co](https://huggingface.co)
2. Click "Sign Up" in the top right
3. Fill in your details and verify your email

### Step 2: Create a New Space

1. Once logged in, click your profile picture → "New Space"
2. Fill in the form:
   - **Owner**: Select your username
   - **Space name**: e.g., `privacy-ml-demo`
   - **License**: MIT
   - **SDK**: Select **Gradio**
   - **Hardware**: Keep as "CPU Basic" (free)
3. Click "Create Space"

### Step 3: Upload Your Files

**Option A: Using the Web Interface (Easiest)**

1. In your new Space, click the "Files" tab
2. Click "Add file" → "Upload files"
3. Upload these files:
   - `app.py` (REQUIRED - this is the entry point)
   - `requirements.txt` (REQUIRED)
   - `Assignment2Dataset-1_encrypted.csv` (optional sample data)
4. Wait for the build to complete (~2-3 minutes)

**Option B: Using Git (For More Control)**

```bash
# Clone your space
git clone https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo
cd privacy-ml-demo

# Copy your files into the directory
cp /path/to/app.py .
cp /path/to/requirements.txt .

# Commit and push
git add .
git commit -m "Initial upload"
git push
```

### Step 4: Wait for Build

1. Click the "App" tab to see your space building
2. Watch the logs for any errors
3. Once complete, your app will be live!

### Step 5: Test Your App

1. Your app is now live at: `https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo`
2. Upload a CSV file and adjust the epsilon slider
3. Click "Run Privacy Analysis" to see results

---

## ⚙️ Local Testing (Before Deployment)

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py

# Open http://localhost:7860 in your browser
```

---

## 🔧 Troubleshooting

### "Build failed" Error
- Check that `requirements.txt` has correct package names
- View build logs for specific error messages

### "App crashed" Error
- Ensure `app.py` has `demo.launch()` at the bottom
- Check for syntax errors in your code
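
For orientation, the overall shape of a Gradio entry point might look like the sketch below. It is not the project's actual `app.py`: `run_privacy_analysis` and `build_demo` are hypothetical names, the analysis is a placeholder that only counts rows, and the sketch assumes the `gr.File` component passes the uploaded file to the function as a path string.

```python
import csv


def run_privacy_analysis(csv_path, epsilon):
    """Placeholder analysis: count data rows and echo the chosen epsilon."""
    if csv_path is None:
        return "Please upload a CSV file."
    with open(csv_path, newline="", encoding="utf-8") as handle:
        row_count = sum(1 for _ in csv.DictReader(handle))
    return f"Loaded {row_count} rows; privacy budget epsilon = {epsilon:.2f}"


def build_demo():
    """Build the interface; Gradio is imported here so the helper above works without it."""
    import gradio as gr

    return gr.Interface(
        fn=run_privacy_analysis,
        inputs=[
            gr.File(label="CSV file", file_types=[".csv"]),
            gr.Slider(minimum=0.1, maximum=10.0, value=1.0, step=0.1, label="Epsilon"),
        ],
        outputs=gr.Textbox(label="Result"),
        title="Privacy-Preserving ML Demo",
    )


# In app.py, the file would end with these two lines so the Space starts the server:
# demo = build_demo()
# demo.launch()
```

If the final `launch()` call is missing, the script exits immediately after building the interface, which a Space reports as a crash.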

### Slow Loading
- Free-tier Spaces go to sleep after a period of inactivity
- The first load after that takes ~30 seconds while the Space wakes up

### Memory Issues
- Reduce `n_estimators` in RandomForest (fewer trees use less memory)
- Use smaller test datasets

---

## 📊 Understanding the Results

| Metric | What It Means |
|--------|---------------|
| **Accuracy** | Percentage of predictions that are correct |
| **F1 Score** | Harmonic mean of precision and recall |
| **Epsilon (ε)** | Privacy budget: lower values mean stronger privacy but more noise |
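
To make the first two metrics concrete, here is a small worked example in plain Python. It assumes a binary classification task with labels 0 and 1, which the project may or may not use; the function names and the sample labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # No true positives: precision and recall are both zero.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"f1       = {f1_score(y_true, y_pred):.2f}")
```

In this sample 6 of 8 predictions are correct, and precision and recall are both 3/4, so accuracy and F1 both come out to 0.75. The two metrics diverge when classes are imbalanced, which is why the results report both.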

### Privacy Level Guide
- ε = 0.1 to 0.5: Very high privacy, with some accuracy loss
- ε = 1.0: Balanced trade-off (recommended)
- ε ≥ 5.0: Weaker privacy, minimal accuracy impact
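
The reason lower ε costs accuracy is that the Laplace mechanism adds noise with scale `sensitivity / epsilon`, so a smaller budget means larger noise. The sketch below illustrates this with the standard library; it is not the project's implementation or diffprivlib's API, the sensitivity of 1.0 is an assumed value, and the function names are hypothetical.

```python
import random


def noise_scale(sensitivity, epsilon):
    """Scale b of the Laplace noise for a given sensitivity and privacy budget."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(value, sensitivity, epsilon, rng):
    """Return the value with Laplace noise calibrated to sensitivity / epsilon."""
    return value + laplace_noise(noise_scale(sensitivity, epsilon), rng)


rng = random.Random(0)  # Seeded only so the illustration is repeatable.
for eps in (0.1, 0.5, 1.0, 5.0):
    scale = noise_scale(1.0, eps)  # Assumed sensitivity of 1.0.
    noisy = privatize(100.0, 1.0, eps, rng)
    print(f"epsilon={eps}: noise scale={scale:.1f}, one noisy release of 100.0 -> {noisy:.2f}")
```

With a sensitivity of 1, ε = 0.1 gives a noise scale of 10 while ε = 5.0 gives 0.2. The expected absolute error of a Laplace draw equals its scale, so the typical distortion shrinks by the same factor as ε grows. This sketch is for intuition only; a vetted library such as diffprivlib is the safer choice for real releases.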

---

## 📚 Learn More

- [Differential Privacy Explained](https://desfontain.es/privacy/differential-privacy-awesomeness.html)
- [IBM diffprivlib Documentation](https://diffprivlib.readthedocs.io/)
- [Gradio Documentation](https://gradio.app/docs/)
- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)

---

## 📝 License

MIT License - Feel free to use and modify for your projects.


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference