| --- |
| title: Privacy Preserving Machine Learning |
| emoji: π |
| colorFrom: gray |
| colorTo: blue |
| sdk: gradio |
| sdk_version: 6.9.0 |
| app_file: app.py |
| pinned: false |
| license: mit |
| short_description: Project demonstrates privacy-preserving techniques for ML |
| --- |
| |
|
|
| # Privacy-Preserving Machine Learning Demo |
|
|
| ## Project Overview |
|
|
| This project demonstrates privacy-preserving techniques for machine learning on sensitive healthcare data. It implements: |
|
|
| - **SHA-256 Hashing** for direct identifiers (SSN) |
| - **Pseudonymization** for names |
| - **K-Anonymity Generalization** for DOB and income |
| - **Laplace Noise** (Differential Privacy) for numerical values |
| - **Differentially Private ML Training** using IBM's diffprivlib |
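
The first four techniques in the list above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in `privacy_ml_solution.py`: the salt, the HMAC-based pseudonym, the decade and bracket widths, the record fields, and the sensitivity value are all assumptions chosen for the example.

```python
import hashlib
import hmac
import random


def hash_identifier(value: str, salt: str) -> str:
    """SHA-256 hex digest of a salted direct identifier (e.g. an SSN)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def pseudonymize(name: str, key: bytes) -> str:
    """Keyed pseudonym (assumed scheme): the same name always maps to the same token."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "PERSON_" + digest[:10]


def generalize_dob(dob: str) -> str:
    """Generalize an ISO date (YYYY-MM-DD) to its decade."""
    year = int(dob[:4])
    return f"{year // 10 * 10}s"


def generalize_income(income: float, width: int = 25_000) -> str:
    """Generalize an income to a fixed-width bracket (width is an assumption)."""
    low = int(income // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def add_laplace(value: float, sensitivity: float, epsilon: float,
                rng: random.Random) -> float:
    """Laplace mechanism: noise scale is sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical record; field names and values are made up for illustration.
record = {"ssn": "123-45-6789", "name": "Jane Doe",
          "dob": "1984-07-12", "income": 61250.0, "glucose": 98.0}

rng = random.Random(0)
protected = {
    "ssn": hash_identifier(record["ssn"], salt="demo-salt"),
    "name": pseudonymize(record["name"], key=b"demo-key"),
    "dob": generalize_dob(record["dob"]),
    "income": generalize_income(record["income"]),
    "glucose": add_laplace(record["glucose"], sensitivity=1.0, epsilon=1.0, rng=rng),
}
print(protected)
```

A salt is used here because the space of valid SSNs is small enough to enumerate, so an unsalted hash can be reversed by brute force. Generalization on its own does not guarantee k-anonymity; that still has to be checked by counting how many records share each generalized combination.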
|
|
| ## Files Included |
|
|
| | File | Description | |
| |------|-------------| |
| | `app.py` | Gradio web interface (main entry point for HF Spaces) | |
| | `privacy_ml_solution.py` | Core ML pipeline with all privacy techniques | |
| | `requirements.txt` | Python dependencies | |
| | `Assignment2Dataset-1_encrypted.csv` | The encrypted/anonymized dataset | |
| | `model_comparison_results.csv` | Performance metrics comparing models | |
| | `Privacy_Preserving_ML_Report.docx` | Comprehensive academic report | |
| | `Technical_Documentation.docx` | Code and library documentation | |
|
|
| --- |
|
|
| ## Deploying to Hugging Face Spaces (Step-by-Step for Beginners) |
|
|
| ### Step 1: Create a Hugging Face Account |
|
|
| 1. Go to [huggingface.co](https://huggingface.co) |
| 2. Click "Sign Up" in the top right |
| 3. Fill in your details and verify your email |
|
|
| ### Step 2: Create a New Space |
|
|
| 1. Once logged in, click your profile picture → "New Space" |
| 2. Fill in the form: |
| - **Owner**: Select your username |
| - **Space name**: e.g., `privacy-ml-demo` |
| - **License**: MIT |
| - **SDK**: Select **Gradio** |
| - **Hardware**: Keep as "CPU Basic" (free) |
| 3. Click "Create Space" |
|
|
| ### Step 3: Upload Your Files |
|
|
| **Option A: Using the Web Interface (Easiest)** |
|
|
| 1. In your new Space, click the "Files" tab |
| 2. Click "Add file" → "Upload files" |
| 3. Upload these files: |
| - `app.py` (REQUIRED - this is the entry point) |
| - `requirements.txt` (REQUIRED) |
| - `privacy_ml_solution.py` (required only if `app.py` imports it) |
| - `Assignment2Dataset-1_encrypted.csv` (optional sample data) |
| 4. Wait for the build to complete (~2-3 minutes) |
|
|
| **Option B: Using Git (For More Control)** |
|
|
| ```bash |
| # Clone your space |
| git clone https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo |
| cd privacy-ml-demo |
| |
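| # Copy your files into the directory |
| cp /path/to/app.py . |
| cp /path/to/requirements.txt . |
| # Also copy any other modules app.py imports (for example privacy_ml_solution.py, if it is imported) |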
| |
| # Commit and push |
| git add . |
| git commit -m "Initial upload" |
| git push |
| ``` |
|
|
| ### Step 4: Wait for Build |
|
|
| 1. Click the "App" tab to see your space building |
| 2. Watch the logs for any errors |
| 3. Once complete, your app will be live! |
|
|
| ### Step 5: Test Your App |
|
|
| 1. Your app is now live at: `https://huggingface.co/spaces/YOUR_USERNAME/privacy-ml-demo` |
| 2. Upload a CSV file and adjust the epsilon slider |
| 3. Click "Run Privacy Analysis" to see results |
|
|
| --- |
|
|
| ## Local Testing (Before Deployment) |
|
|
| ```bash |
| # Create virtual environment |
| python -m venv venv |
| source venv/bin/activate # On Windows: venv\Scripts\activate |
| |
| # Install dependencies |
| pip install -r requirements.txt |
| |
| # Run the app |
| python app.py |
| |
| # Open http://localhost:7860 in your browser |
| ``` |
|
|
| --- |
|
|
| ## Troubleshooting |
|
|
| ### "Build failed" Error |
| - Check that `requirements.txt` has correct package names |
| - View build logs for specific error messages |
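
One way to catch a wrong package name before pushing is to install from `requirements.txt` locally and then confirm that each listed distribution is actually present. The sketch below uses only the standard library; the helper names `requirement_names` and `missing_packages` and the sample lines are hypothetical, and the parser handles only simple `name`, `name==version`, and `name>=version` lines.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract distribution names from simple requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comment-only lines, and pip options (-r, -e)
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the listed distributions that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Sample lines (assumed, not the project's actual requirements.txt).
sample = [
    "gradio",
    "scikit-learn>=1.3  # the PyPI name is scikit-learn, not sklearn",
    "",
    "# a comment line",
    "no-such-package-for-this-demo==0.0.1",
]
print(requirement_names(sample))
print(missing_packages(["no-such-package-for-this-demo==0.0.1"]))
```

To check a real file, pass `open("requirements.txt").read().splitlines()` to `missing_packages` after running `pip install -r requirements.txt`. This only inspects the local environment; the Space's build log remains the authoritative source for build errors.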
|
|
| ### "App crashed" Error |
| - Ensure `app.py` has `demo.launch()` at the bottom |
| - Check for syntax errors in your code |
|
|
| ### Slow Loading |
| - Free-tier Spaces go to sleep after a period of inactivity |
| - The first load after sleeping takes about 30 seconds while the Space wakes up |
|
|
| ### Memory Issues |
| - Reduce `n_estimators` in RandomForest |
| - Use smaller test datasets |
|
|
| --- |
|
|
| ## Understanding the Results |
|
|
| | Metric | What it Means | |
| |--------|---------------| |
| | **Accuracy** | % of correct predictions | |
| | **F1 Score** | Balance of precision and recall | |
| | **Epsilon (ε)** | Privacy budget; lower values mean stronger privacy and more noise | |
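
As a worked example of the first two metrics, the sketch below computes accuracy and a binary F1 score by hand. The labels are made up, and treating class `1` as the positive class is an assumption; the app may report a differently averaged F1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: one positive case is missed, nothing is falsely flagged.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]

print(accuracy(y_true, y_pred))   # 7 of 8 correct
print(f1_score(y_true, y_pred))   # precision 1.0, recall 0.75
```

Here accuracy is 7/8 = 0.875, while F1 is 2 × 1.0 × 0.75 / 1.75 ≈ 0.857, because the single missed positive lowers recall.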
|
|
| ### Privacy Level Guide |
| - ε = 0.1 to 0.5: Very high privacy, some accuracy loss |
| - ε = 1.0: Balanced (recommended) |
| - ε = 5.0+: Lower privacy, minimal accuracy impact |
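
The direction of this trade-off follows from the Laplace mechanism, where the noise scale is the query sensitivity divided by ε. The sketch below measures the average size of the added noise at several ε values; the sensitivity of 1.0, the trial count, and the helper names are assumptions for illustration. It covers noise added to a single numeric value, not the accuracy of a differentially private model.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def mean_abs_noise(epsilon, sensitivity=1.0, trials=20_000, seed=0):
    """Average absolute noise added by the Laplace mechanism at a given epsilon."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return sum(abs(laplace_noise(scale, rng)) for _ in range(trials)) / trials


for eps in (0.1, 0.5, 1.0, 5.0):
    print(f"epsilon={eps}: mean |noise| is about {mean_abs_noise(eps):.2f}")
```

The expected absolute noise equals the scale, so moving from ε = 1.0 to ε = 0.1 makes the typical perturbation ten times larger.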
|
|
| --- |
|
|
| ## Learn More |
|
|
| - [Differential Privacy Explained](https://desfontain.es/privacy/differential-privacy-awesomeness.html) |
| - [IBM diffprivlib Documentation](https://diffprivlib.readthedocs.io/) |
| - [Gradio Documentation](https://gradio.app/docs/) |
| - [Hugging Face Spaces Guide](https://huggingface.co/docs/hub/spaces) |
|
|
| --- |
|
|
| ## License |
|
|
| MIT License - Feel free to use and modify for your projects. |
|
|
|
|
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |
|
|