Add per-category BoW shift validation and canary MIA benchmark link
README.md
CHANGED
|
@@ -153,6 +153,57 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).
|
|
| 153 |
|
| 154 |
**Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases — confirming that the formal privacy guarantees hold in practice.
|
| 155 |
| 156 |
## Repository Structure
|
| 157 |
|
| 158 |
```
|
|
@@ -191,4 +242,5 @@ Each variant directory contains:
|
|
| 191 |
## Related Resources
|
| 192 |
|
| 193 |
- **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
|
| 194 |
-
- **MIA benchmark:** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
|
| 153 |
|
| 154 |
**Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases — confirming that the formal privacy guarantees hold in practice.
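
To make the two audit metrics concrete, here is a minimal sketch of how an attack AUC and a point-estimate lower bound on ε could be computed from per-sample attack scores. It is not the evaluation code behind the numbers above: the function names (`attack_auc`, `epsilon_lower_bound`), the thresholds, and the toy scores are hypothetical. The bound relies on the standard (ε, δ)-DP inequality TPR ≤ e^ε · FPR + δ and omits the confidence intervals a real audit would add.

```python
import math

def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    wins = sum(
        1.0 if m > n else 0.5 if m == n else 0.0
        for m in member_scores
        for n in nonmember_scores
    )
    return wins / (len(member_scores) * len(nonmember_scores))

def epsilon_lower_bound(member_scores, nonmember_scores, threshold, delta=0.0):
    """Point estimate of ln((TPR - delta) / FPR), floored at 0. No confidence interval."""
    tpr = sum(s >= threshold for s in member_scores) / len(member_scores)
    fpr = sum(s >= threshold for s in nonmember_scores) / len(nonmember_scores)
    if tpr - delta <= 0:
        return 0.0       # the attack flags no members beyond the delta slack
    if fpr == 0:
        return math.inf  # unbounded point estimate: no false positives observed
    return max(0.0, math.log((tpr - delta) / fpr))

# Hypothetical attack scores (higher = "more likely a member").
leaky_members = [0.9, 0.8, 0.7, 0.6]
leaky_nonmembers = [0.65, 0.3, 0.2, 0.1]
private_members = [0.5, 0.4, 0.3, 0.2]
private_nonmembers = [0.5, 0.4, 0.3, 0.2]

print(attack_auc(leaky_members, leaky_nonmembers))                        # 0.9375
print(epsilon_lower_bound(leaky_members, leaky_nonmembers, 0.6))          # ln(4), about 1.386
print(attack_auc(private_members, private_nonmembers))                    # 0.5
print(epsilon_lower_bound(private_members, private_nonmembers, 0.3))      # 0.0
```

When member and non-member scores are indistinguishable, the AUC sits at 0.5 and the bound collapses to 0, which is the pattern the key finding describes for DP-trained models.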
|
| 155 |
|
| 156 |
+
### MIA Benchmark Validation — BoW Distribution Shift
|
| 157 |
+
|
| 158 |
+
The canary MIA benchmark ([melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)) uses a targeted design in which member and non-member samples share the same code prefix and differ only in the PII secret. To check for surface-level distribution shift between the two classes, a bag-of-words (BoW) Random Forest classifier was evaluated with 5-fold cross-validation, overall and per PII type:
|
| 159 |
+
|
| 160 |
+
| PII Type | BoW AUC (mean over folds) | ± std | n |
|---|---|---|---|
| Overall | 0.099 | 0.018 | 400 |
| api_key | 0.033 | 0.047 | 80 |
| db_url | 0.311 | 0.105 | 80 |
| email | 0.078 | 0.099 | 80 |
| internal_ip | 0.028 | 0.021 | 80 |
| password | 0.055 | 0.048 | 80 |
|
| 168 |
+
|
| 169 |
+
All BoW AUC values are well below the 0.5 chance level, indicating that surface-level text features alone do not pick out members; any MIA signal must come from the model's knowledge of the secret.
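
One plausible explanation for AUC values below chance, rather than merely at chance, is the paired design itself: each sample's closest bag-of-words neighbor is its own twin, which carries the opposite label. The toy sketch below illustrates that mechanism on synthetic pairs. It deliberately swaps the Random Forest and 5-fold CV for a 1-nearest-neighbor classifier with leave-one-out evaluation so that the outcome is deterministic, and every name and token in it is hypothetical rather than taken from the benchmark.

```python
import random

def make_pairs(n_pairs, seed=0):
    """Synthetic (token_set, label) samples: twins share a prefix, differ in one secret."""
    rng = random.Random(seed)
    samples = []
    for i in range(n_pairs):
        prefix = {f"prefix{i}tok{j}" for j in range(5)}
        for label in (1, 0):  # 1 = member, 0 = non-member
            secret = f"secret{rng.getrandbits(64):016x}"
            samples.append((prefix | {secret}, label))
    return samples

def jaccard(a, b):
    """Token-set overlap, used here as a stand-in for BoW similarity."""
    return len(a & b) / len(a | b)

def loo_1nn_predictions(samples):
    """Predict each sample's label from its most similar *other* sample."""
    preds = []
    for i, (tokens, _) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        nearest = max(others, key=lambda s: jaccard(tokens, s[0]))
        preds.append(nearest[1])
    return preds

def pairwise_auc(member_scores, nonmember_scores):
    """Fraction of (member, non-member) pairs ranked correctly; ties count 0.5."""
    total = sum((m > n) + 0.5 * (m == n)
                for m in member_scores for n in nonmember_scores)
    return total / (len(member_scores) * len(nonmember_scores))

samples = make_pairs(20)
preds = loo_1nn_predictions(samples)
member_preds = [p for p, (_, y) in zip(preds, samples) if y == 1]
nonmember_preds = [p for p, (_, y) in zip(preds, samples) if y == 0]
print(pairwise_auc(member_preds, nonmember_preds))  # 0.0: every prediction is inverted
```

In this toy setting the classifier is perfectly wrong because the only informative feature, the shared prefix, always points at the twin. A Random Forest under k-fold CV would see the twin only in some folds, so its AUC would be expected to land between 0 and 0.5 rather than exactly at 0.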
|
| 170 |
+
|
| 171 |
+
<details>
|
| 172 |
+
<summary>BoW shift test code</summary>
|
| 173 |
+
|
| 174 |
+
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import numpy as np
from datasets import load_dataset

ds = load_dataset("melihcatal/codedp-bench-canary-mia", split="train")
records = list(ds)

def bow_shift(texts, labels, n_folds=5):
    """Return (mean, std) of the BoW Random Forest AUC across stratified CV folds."""
    # Bag-of-words counts over the 5000 most frequent tokens.
    X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
    y = np.array(labels)
    aucs = []
    for tr, te in StratifiedKFold(n_folds, shuffle=True, random_state=42).split(X, y):
        clf = RandomForestClassifier(100, random_state=42, n_jobs=-1)
        clf.fit(X[tr], y[tr])
        # Score each held-out fold by the predicted probability of the member class.
        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return np.mean(aucs), np.std(aucs)

# Overall
texts = [r["input"] for r in records]
labels = [r["label"] for r in records]
print("Overall:", bow_shift(texts, labels))

# Per PII category
for pii_type in sorted(set(r["pii_type"] for r in records)):
    cat = [r for r in records if r["pii_type"] == pii_type]
    print(f"{pii_type}:", bow_shift([r["input"] for r in cat], [r["label"] for r in cat]))
```
|
| 205 |
+
</details>
|
| 206 |
+
|
| 207 |
## Repository Structure
|
| 208 |
|
| 209 |
```
|
|
|
|
| 242 |
## Related Resources
|
| 243 |
|
| 244 |
- **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
|
| 245 |
+
- **MIA benchmark (general):** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
|
| 246 |
+
- **MIA benchmark (canary):** [melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)
|