---
title: SANS Workshop Lab 13
emoji: 🧠
colorFrom: purple
colorTo: red
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
---
# SEC545 Lab 13 – Model Inversion via Agent API

**ML Security – Training Data Extraction & Membership Inference**
Hands-on lab demonstrating how attackers extract sensitive data that was
carelessly included in a fine-tuning corpus – using nothing but the model's
public completion API. Covers two distinct statistical attacks (membership
inference and training data extraction), canary-based forensic detection,
and five layered defenses including differential privacy.
## What Students Will Do

| Step | Topic |
|------|-------|
| 0 | Examine the training dataset – HR records, customer PII, medical data, and canary records that should never have been included |
| 1 | Safe baseline: model answers support questions without leaking training data |
| 2 | **Attack A** – Membership inference: use log-probability differentials to statistically confirm which records were in the training corpus |
| 3 | **Attack B** – Training data extraction: prefix completion oracle recovers SSNs, credit card numbers, salaries, and diagnoses verbatim |
| 4 | **Attack C** – Canary record extraction: synthetic sentinel records prove memorisation conclusively |
| 5 | Apply five defenses: PII scrubbing before training, differential privacy (DP-SGD), output filtering, rate limiting, canary monitoring |
| 6 | Run all three attack types against the fully hardened model |
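The membership-inference signal in Attack A can be sketched in a few lines. Per the Architecture notes below, the lab simulates it with perplexity differentials plus Gaussian noise; the means, noise scale, and threshold here are illustrative assumptions, not the lab's real values.

```python
import random

random.seed(42)

# Hypothetical perplexity model: records seen during fine-tuning score
# noticeably lower perplexity than unseen records. These constants are
# illustrative assumptions for the sketch.
MEMBER_MEAN, NON_MEMBER_MEAN, NOISE_STD = 4.0, 9.0, 1.5

def simulated_perplexity(is_member: bool) -> float:
    """Draw a noisy perplexity score for a candidate record."""
    mean = MEMBER_MEAN if is_member else NON_MEMBER_MEAN
    return random.gauss(mean, NOISE_STD)

def infer_membership(perplexity: float, threshold: float = 6.5) -> bool:
    """Classify a record as a training-set member if its perplexity is low."""
    return perplexity < threshold

# The attacker probes records whose membership they want to confirm.
guesses = [infer_membership(simulated_perplexity(is_member=True)) for _ in range(200)]
accuracy = sum(guesses) / len(guesses)
print(f"member records correctly flagged: {accuracy:.0%}")
```

The point of the sketch: the classifier never needs the model to *repeat* the record – a consistent perplexity gap alone leaks membership.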
## Secrets Required

| Secret Name | Where to Get It |
|-------------|-----------------|
| `OPENAI_API_KEY` | https://platform.openai.com/api-keys |
| Only one secret needed. | |
| ## Architecture | |
- Training dataset of 10 records shown in full: 2 knowledge base (safe), 3 HR, 2 customer PII, 1 medical, 2 canaries
- Membership inference simulated with realistic perplexity differentials and Gaussian noise
- Extraction attacks use real `gpt-4o-mini` with a system prompt simulating a fine-tuned model
- Differential privacy demo uses a configurable ε slider to show the noise/utility tradeoff
- Output filter applies regex PII patterns (SSN, CC, salary, PHI) before delivery
- Rate limiter tracks per-session query count; blocks and flags the session at 15 queries
- Canary monitoring checks all completions for synthetic sentinel IDs
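The rate-limiter component above can be sketched as a per-session counter. This is a minimal illustration, not the app's actual implementation; the class name is hypothetical, and only the 15-query cap comes from the lab description.

```python
from collections import defaultdict

QUERY_LIMIT = 15  # per-session cap described above


class SessionRateLimiter:
    """Track per-session query counts; block and flag once the cap is hit."""

    def __init__(self, limit: int = QUERY_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)
        self.flagged = set()  # sessions surfaced for analyst review

    def allow(self, session_id: str) -> bool:
        self.counts[session_id] += 1
        if self.counts[session_id] >= self.limit:
            self.flagged.add(session_id)
            return False
        return True


limiter = SessionRateLimiter()
results = [limiter.allow("sess-1") for _ in range(16)]
print(results.count(True), "queries allowed;",
      "session flagged" if "sess-1" in limiter.flagged else "session ok")
```

Queries 1–14 pass; the 15th query blocks and flags the session, so repeated prefix-variation probing gets cut off before extraction can sweep the corpus.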
## Five Defenses Demonstrated
1. **PII scrubbing before training** – automated removal of sensitive records from the corpus before fine-tuning; 8 of 10 records dropped
2. **Differential privacy (DP-SGD)** – ε slider shows the membership-inference gap collapsing as noise increases; Opacus code sample
3. **Output filtering** – regex + pattern matching redacts SSN/CC/salary/PHI in completions; side-by-side raw vs. filtered
4. **Rate limiting** – 15-query session limit with prefix-variation pattern detection
5. **Canary monitoring** – synthetic sentinel IDs watched across all outputs; any hit triggers P0 incident response
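Defense 2's ε slider can be illustrated numerically. This toy sketch adds noise of scale ~1/ε to simulated perplexity scores and measures how often an attacker still ranks a member below a non-member; it is a conceptual stand-in, not DP-SGD itself (which Opacus applies to gradients during training), and every constant here is an illustrative assumption.

```python
import random

random.seed(0)

def attack_accuracy(epsilon: float, trials: int = 2000) -> float:
    """Fraction of member/non-member pairs the attacker ranks correctly."""
    noise_std = 1.0 + 1.0 / epsilon  # base noise plus privacy noise ~ 1/epsilon
    correct = 0
    for _ in range(trials):
        member = 4.0 + random.gauss(0, noise_std)      # lower perplexity
        non_member = 9.0 + random.gauss(0, noise_std)  # higher perplexity
        correct += member < non_member  # attacker picks the lower score
    return correct / trials

for eps in (10.0, 1.0, 0.1):
    print(f"epsilon={eps:<4} simulated attack accuracy={attack_accuracy(eps):.2f}")
```

As ε shrinks, the added noise swamps the perplexity gap and the attack decays toward a 50/50 coin flip – the utility cost is that the same noise degrades the model's useful outputs.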
## Key Concepts
**Membership inference** – even if the model refuses to complete a sensitive
prefix, the perplexity differential between training members and non-members
is a statistical signal that leaks membership. Confirming that a specific
person's medical record was in the training data is itself a HIPAA breach.
**The canary principle** – a unique synthetic record with no existence outside
the training corpus. If it appears in any completion, there is no alternative
explanation. Forensic attribution is exact.
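The canary check itself is a simple scan over every completion. The sentinel IDs and pattern below are hypothetical placeholders – the lab's real canary values will differ.

```python
import re

# Hypothetical sentinel IDs planted in the training corpus. They exist
# nowhere else, so any appearance in output is conclusive evidence of
# memorisation and triggers the P0 response described above.
CANARY_IDS = {"CANARY-7f3a9b2e", "CANARY-d41c8e05"}
CANARY_PATTERN = re.compile(r"CANARY-[0-9a-f]{8}")

def scan_completion(text: str) -> set:
    """Return any known canary IDs found in a model completion."""
    return set(CANARY_PATTERN.findall(text)) & CANARY_IDS

clean = scan_completion("Your order ships Tuesday.")
hit = scan_completion("Employee record CANARY-7f3a9b2e, salary $98,000")
print("clean:", clean, "| hit:", hit)
```

Because the IDs are synthetic and unique, a single match is enough – no statistical argument or threshold tuning is needed, which is what makes attribution exact.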
## Based On
- Extracting Training Data from ChatGPT (Nasr et al., 2023)
- Membership Inference Attacks Against Machine Learning Models (Shokri et al., 2017)
- OWASP GenAI Security Project – Top 10 for Agentic Applications 2026