YashChowdhary / Privacy_Preserving_Machine_Learning

Running

App Files Files Community

Privacy_Preserving_Machine_Learning / README.md

YashChowdhary

Update README.md

74e84f3 verified 10 days ago

preview code

raw

history blame contribute delete

4.6 kB

	---
	title: Privacy Preserving Machine Learning
	emoji: 🏆
	colorFrom: gray
	colorTo: blue
	sdk: gradio
	sdk_version: 6.9.0
	app_file: app.py
	pinned: false
	license: mit
	short_description: Project demonstrates privacy-preserving techniques for ML
	---


	# Privacy-Preserving Machine Learning Demo

	## 🔒 Project Overview

	This project demonstrates privacy-preserving techniques for machine learning on sensitive healthcare data. It implements:

	- SHA-256 hashing for direct identifiers (SSN)
	- Pseudonymization for names
	- k-anonymity generalization for date of birth (DOB) and income
	- Laplace noise (differential privacy) for numerical values
	- Differentially private ML training using IBM's diffprivlib
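The first three data-level techniques can be illustrated with the Python standard library alone. This is a minimal, hypothetical sketch, not the code in `privacy_ml_solution.py`: the record keys (`ssn`, `name`, `dob`, `income`), the decade and 25,000-wide income bands, the `PATIENT_0001` pseudonym format, and the secret key are all assumptions. One technique is deliberately swapped: a plain SHA-256 digest of a 9-digit SSN can be reversed by hashing every possible SSN, so the sketch uses a keyed hash (HMAC-SHA-256) instead.

```python
import hashlib
import hmac
from collections import Counter
from datetime import date

# Hypothetical secret key: in practice load a long random value from a secret store
# and never commit it. Without a key, SHA-256 of a 9-digit SSN is reversible by brute force.
SECRET_KEY = b"replace-with-a-long-random-secret"


def hash_identifier(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed SHA-256 (HMAC-SHA-256) hex digest of a direct identifier such as an SSN."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(names: list[str]) -> dict[str, str]:
    """Map each distinct name to a stable pseudonym such as 'PATIENT_0001'."""
    mapping: dict[str, str] = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"PATIENT_{len(mapping) + 1:04d}"
    return mapping


def generalize_dob(dob: date) -> str:
    """Replace an exact date of birth with its birth decade, e.g. '1980s'."""
    return f"{dob.year // 10 * 10}s"


def generalize_income(income: float, width: int = 25_000) -> str:
    """Replace an exact income with a fixed-width band, e.g. '50000-74999'."""
    low = int(income // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(quasi_ids: list[tuple[str, ...]], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs at least k times."""
    return all(count >= k for count in Counter(quasi_ids).values())


def anonymize(records: list[dict]) -> list[dict]:
    """Apply the transforms above; the input keys are hypothetical."""
    pseudonyms = pseudonymize([r["name"] for r in records])
    return [
        {
            "ssn_hash": hash_identifier(r["ssn"]),
            "patient_id": pseudonyms[r["name"]],
            "dob_decade": generalize_dob(r["dob"]),
            "income_band": generalize_income(r["income"]),
        }
        for r in records
    ]
```

For example, `generalize_dob(date(1984, 3, 2))` returns `'1980s'` and `generalize_income(52_000)` returns `'50000-74999'`; running `is_k_anonymous` on the `(dob_decade, income_band)` pairs then confirms that no combination occurs fewer than k times. Keying pseudonyms on the name alone merges patients who share a name; keying on a record ID avoids that.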

	## 📁 Files Included

	| File | Description |
	|------|-------------|
	| `app.py` | Gradio web interface (main entry point for HF Spaces) |
	| `privacy_ml_solution.py` | Core ML pipeline with all privacy techniques |
	| `requirements.txt` | Python dependencies |
	| `Assignment2Dataset-1_encrypted.csv` | The encrypted/anonymized dataset |
	| `model_comparison_results.csv` | Performance metrics comparing models |
	| `Privacy_Preserving_ML_Report.docx` | Comprehensive academic report |
	| `Technical_Documentation.docx` | Code and library documentation |

	---

	## 🚀 Deploying to Hugging Face Spaces (Step-by-Step for Beginners)

	### Step 1: Create a Hugging Face Account

	1. Go to [huggingface.co](https://huggingface.co)
	2. Click "Sign Up" in the top right
	3. Fill in your details and verify your email

	### Step 2: Create a New Space

	1. Once logged in, click your profile picture → "New Space"
	2. Fill in the form:
	   - Owner: select your username
	   - Space name: e.g., `privacy-ml-demo`
	   - License: MIT
	   - SDK: select Gradio
	   - Hardware: keep as "CPU Basic" (free)
	3. Click "Create Space"

	### Step 3: Upload Your Files

	Option A: Using the Web Interface (Easiest)

	1. In your new Space, click the "Files" tab
	2. Click "Add file" → "Upload files"
	3. Upload these files:
	   - `app.py` (required; this is the entry point)
	   - `requirements.txt` (required)
	   - `Assignment2Dataset-1_encrypted.csv` (optional sample data)
	4. Wait for the build to complete (~2-3 minutes)

	Option B: Using Git (For More Control)

	```bash
	# Clone your space
	git clone https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo
	cd privacy-ml-demo

	# Copy your files into the directory
	cp /path/to/app.py .
	cp /path/to/requirements.txt .

	# Commit and push (when prompted for a password, use a Hugging Face access token with write permission)
	git add .
	git commit -m "Initial upload"
	git push
	```

	### Step 4: Wait for Build

	1. Click the "App" tab to see your space building
	2. Watch the logs for any errors
	3. Once complete, your app will be live!

	### Step 5: Test Your App

	1. Your app is now live at: `https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo`
	2. Upload a CSV file and adjust the epsilon slider
	3. Click "Run Privacy Analysis" to see results

	---

	## ⚙️ Local Testing (Before Deployment)

	```bash
	# Create virtual environment
	python -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt

	# Run the app
	python app.py

	# Open http://localhost:7860 in your browser
	```

	---

	## 🔧 Troubleshooting

	### "Build failed" Error
	- Check that `requirements.txt` has correct package names
	- View build logs for specific error messages

	### "App crashed" Error
	- Ensure `app.py` has `demo.launch()` at the bottom
	- Check the Space's logs for the Python traceback (e.g. a syntax error or a missing import)

	### Slow Loading
	- Spaces on free hardware go to sleep after a period of inactivity
	- The first visit after sleeping takes ~30 seconds while the Space wakes up

	### Memory Issues
	- Reduce `n_estimators` in RandomForest
	- Use smaller test datasets
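A smaller test file can be cut from the full dataset before uploading it. The sketch below assumes a CSV with a single header row and keeps the header plus a reproducible random sample of data rows, using only the standard library; `sample_csv`, the output path, and the default size of 500 rows are illustrative choices. It reads the whole source file into memory, so run it locally rather than inside the Space.

```python
import csv
import random


def sample_csv(src_path: str, dst_path: str, n_rows: int = 500, seed: int = 42) -> int:
    """Write the header plus a random sample of up to n_rows data rows to dst_path.

    Returns the number of data rows written. The fixed seed makes the sample reproducible.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first row is the header
        rows = list(reader)

    rng = random.Random(seed)
    sample = rng.sample(rows, min(n_rows, len(rows)))

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)
```

For example, `sample_csv("Assignment2Dataset-1_encrypted.csv", "sample_small.csv", n_rows=200)` writes a 200-row file (or the whole file, if it has fewer rows) that can be uploaded through the app instead of the full dataset.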

	---

	## 📊 Understanding the Results

	| Metric | What it means |
	|--------|---------------|
	| Accuracy | Percentage of predictions that are correct |
	| F1 score | Harmonic mean of precision and recall |
	| Epsilon (ε) | Privacy budget; lower ε means more noise and stronger privacy |
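Both performance metrics can be computed directly from true and predicted labels. The sketch below covers the binary case with the positive class labelled `1` (an assumption); scikit-learn's `accuracy_score` and `f1_score` compute the same quantities, and multi-class F1 additionally requires choosing an averaging scheme.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined; report F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With `y_true = [1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]`, accuracy is 4/6 ≈ 0.667 and F1 is 2/3 (two true positives, one false positive, one false negative, so precision and recall are both 2/3).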

	### Privacy Level Guide
	- ε = 0.1-0.5: Very high privacy, some accuracy loss
	- ε = 1.0: Balanced (recommended)
	- ε = 5.0+: Lower privacy, minimal accuracy impact
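The guide follows from how the Laplace mechanism adds noise: for a query whose result can change by at most Δ (its sensitivity) when one person's record changes, the released value gets noise drawn from a Laplace distribution with scale b = Δ/ε, and the average absolute noise equals b. Halving ε therefore doubles the typical noise. The standard-library sketch below is illustrative only: the function names are hypothetical and it samples Laplace noise as the difference of two independent exponential draws with mean b. For real use, diffprivlib ships its own mechanisms (e.g. `diffprivlib.mechanisms.Laplace`).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """One sample from Laplace(0, scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize(value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: release value + Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_noise(sensitivity: float, epsilon: float, trials: int = 20_000, seed: int = 0) -> float:
    """Empirical average |noise|; it should be close to the scale sensitivity / epsilon."""
    rng = random.Random(seed)
    return sum(abs(privatize(0.0, sensitivity, epsilon, rng)) for _ in range(trials)) / trials


# Typical noise magnitude for a count query (sensitivity 1) at the guide's epsilon values.
for eps in (0.1, 0.5, 1.0, 5.0):
    print(f"epsilon={eps}: average |noise| ~ {mean_abs_noise(1.0, eps):.2f}")
```

For a count query (Δ = 1) the average absolute noise is about 1/ε: roughly 10 at ε = 0.1, 1 at ε = 1, and 0.2 at ε = 5, which is why small ε values cost accuracy while large ones barely perturb the data.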

	---

	## 📚 Learn More

	- [Differential Privacy Explained](https://desfontain.es/privacy/differential-privacy-awesomeness.html)
	- [IBM diffprivlib Documentation](https://diffprivlib.readthedocs.io/)
	- [Gradio Documentation](https://gradio.app/docs/)
	- [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces)

	---

	## 📝 License

	MIT License - Feel free to use and modify for your projects.


	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference