---
title: SANS Workshop Lab 13
emoji: π§
colorFrom: purple
colorTo: red
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
---
# SEC545 Lab 13 – Model Inversion via Agent API

**ML Security – Training Data Extraction & Membership Inference**
Hands-on lab demonstrating how attackers extract sensitive data that was carelessly included in a fine-tuning corpus, using nothing but the model's public completion API. Covers two distinct statistical attacks (membership inference and training data extraction), canary-based forensic detection, and five layered defenses including differential privacy.
## What Students Will Do
| Step | Topic |
|---|---|
| 0 | Examine the training dataset: HR records, customer PII, medical data, and canary records that should never have been included |
| 1 | Safe baseline: model answers support questions without leaking training data |
| 2 | Attack A – membership inference: use log-probability differentials to statistically confirm which records were in the training corpus |
| 3 | Attack B – training data extraction: a prefix-completion oracle recovers SSNs, credit card numbers, salaries, and diagnoses verbatim |
| 4 | Attack C – canary record extraction: synthetic sentinel records prove memorisation conclusively |
| 5 | Apply five defenses: PII scrubbing before training, differential privacy (DP-SGD), output filtering, rate limiting, canary monitoring |
| 6 | Run all three attack types against the fully hardened model |
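The prefix-completion oracle from step 3 can be sketched as a simple loop: feed record-shaped prefixes to the completion API and flag any output that looks like PII. This is a minimal illustration, not the lab's actual code — the `complete()` function and the memorised record below are stubs standing in for the real `gpt-4o-mini` call.

```python
import re

# Stub standing in for a fine-tuned model that has memorised a training record.
# In the lab this would be a real completion API call.
MEMORISED = {
    "Employee record for Jane Doe, SSN": " 123-45-6789, salary $142,000",
}

def complete(prefix: str) -> str:
    """Hypothetical completion oracle: returns the memorised continuation."""
    return MEMORISED.get(prefix, " [no memorised continuation]")

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def extract(prefixes):
    """Feed record-shaped prefixes and collect completions that leak an SSN."""
    leaks = []
    for p in prefixes:
        out = complete(p)
        if SSN_RE.search(out):
            leaks.append((p, out))
    return leaks

leaks = extract(["Employee record for Jane Doe, SSN", "The weather in Paris is"])
print(leaks)  # only the Jane Doe prefix yields an SSN-shaped completion
```

The attack needs no special access: every query is an ordinary completion request, which is why it works against a public API.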
## Secrets Required
| Secret Name | Where to Get It |
|---|---|
| `OPENAI_API_KEY` | https://platform.openai.com/api-keys |
Only one secret needed.
## Architecture
- Training dataset of 10 records shown in full: 2 knowledge base (safe), 3 HR, 2 customer PII, 1 medical, 2 canaries
- Membership inference simulated with realistic perplexity differentials and Gaussian noise
- Extraction attacks use real `gpt-4o-mini` with a system prompt simulating a fine-tuned model
- Differential privacy demo uses a configurable ε slider to show the noise/utility tradeoff
- Output filter applies regex PII patterns (SSN, CC, salary, PHI) before delivery
- Rate limiter tracks per-session query count; at 15 queries the session is blocked and flagged
- Canary monitoring checks all completions for synthetic sentinel IDs
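The simulated membership inference above can be sketched in a few lines: members of the training set draw their perplexity from a lower distribution than non-members, Gaussian noise makes the signal statistical rather than exact, and a threshold classifier recovers membership. The means, noise scale, and threshold below are illustrative assumptions, not the app's actual parameters.

```python
import random

random.seed(0)

# Illustrative assumption: training members score markedly lower perplexity
# than non-members; Gaussian noise keeps the signal realistic, not exact.
MEMBER_PPL, NONMEMBER_PPL, NOISE = 4.0, 12.0, 1.5

def simulated_perplexity(is_member: bool) -> float:
    base = MEMBER_PPL if is_member else NONMEMBER_PPL
    return max(1.0, random.gauss(base, NOISE))

def infer_membership(ppl: float, threshold: float = 8.0) -> bool:
    """Classify a record as 'was in the training set' if perplexity is low."""
    return ppl < threshold

members = [simulated_perplexity(True) for _ in range(100)]
outsiders = [simulated_perplexity(False) for _ in range(100)]
tpr = sum(infer_membership(p) for p in members) / 100
fpr = sum(infer_membership(p) for p in outsiders) / 100
print(f"true positive rate {tpr:.2f}, false positive rate {fpr:.2f}")
```

The gap between the two distributions is exactly what DP-SGD attacks: as the ε slider adds noise, the member and non-member perplexities overlap and the classifier degrades toward chance.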
## Five Defenses Demonstrated
- PII scrubbing before training – automated removal of sensitive records from the corpus before fine-tuning; 8 of 10 records dropped
- Differential privacy (DP-SGD) – ε slider shows the membership inference gap collapsing as noise increases; Opacus code sample
- Output filtering – regex + pattern matching redacts SSN/CC/salary/PHI in completions; side-by-side raw vs filtered
- Rate limiting – 15-query session limit with prefix-variation pattern detection
- Canary monitoring – synthetic sentinel IDs watched across all outputs; any hit triggers P0 incident response
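The output-filtering defense can be sketched with a handful of regex patterns applied to every completion before delivery. The patterns below (SSN, 16-digit card numbers, dollar salaries) are illustrative, not the app's exact rule set.

```python
import re

# Illustrative PII patterns; a production filter would cover far more shapes.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN REDACTED]"),
    (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "[CC REDACTED]"),
    (re.compile(r"\$\d{1,3}(?:,\d{3})+"), "[SALARY REDACTED]"),
]

def filter_output(text: str) -> str:
    """Redact PII-shaped substrings before a completion is delivered."""
    for pattern, replacement in PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

raw = "Jane Doe, SSN 123-45-6789, card 4111 1111 1111 1111, earns $142,000."
print(filter_output(raw))
```

Regex filtering is a last-line defense, not a cure: it stops verbatim leakage but does nothing against membership inference, which never needs the raw PII to appear in output.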
## Key Concepts
Membership inference – even if the model refuses to complete a sensitive prefix, the perplexity differential between training members and non-members is a statistical signal that leaks membership. Confirming that a specific person's medical record was in training data is itself a HIPAA breach.
The canary principle – a unique synthetic record with no existence outside the training corpus. If it appears in any completion, there is no alternative explanation. Forensic attribution is exact.
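The canary check itself is deliberately trivial, which is the point: a substring scan over every completion. The sentinel IDs below are made up for this sketch; in the lab they would be the synthetic IDs planted in the training corpus.

```python
# Hypothetical sentinel IDs planted in the corpus; any appearance in model
# output is conclusive evidence of memorisation, since they exist nowhere else.
CANARY_IDS = {"CANARY-7f3a9b21", "CANARY-0d44c6e8"}

def check_for_canaries(completion: str) -> list[str]:
    """Return any planted canary IDs found in a completion.

    A non-empty result should trigger the P0 incident response path.
    """
    return sorted(c for c in CANARY_IDS if c in completion)

safe = check_for_canaries("Our refund policy allows returns within 30 days.")
leaked = check_for_canaries("Record CANARY-7f3a9b21 shows employee ID 4471.")
print(safe, leaked)
```

Because a canary has exactly one possible source, a single hit gives certainty that statistical attacks like membership inference can only approximate.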
## Based On
- Extracting Training Data from ChatGPT (Nasr et al., 2023)
- Membership Inference Attacks Against Machine Learning Models (Shokri et al., 2017)
- OWASP GenAI Security Project – Top 10 for Agentic Applications 2026