melihcatal/codedp-cpt-models

@@ -153,6 +153,57 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).
 **Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases — confirming that the formal privacy guarantees hold in practice.
 ## Repository Structure
 ```
@@ -191,4 +242,5 @@ Each variant directory contains:
 ## Related Resources
 - **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
-- **MIA benchmark:** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)

 **Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases — confirming that the formal privacy guarantees hold in practice.
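For readers unfamiliar with how an audit result maps to an empirical ε, the following is a minimal sketch. It is not the estimator used for the numbers above, since this README does not state which one was used. It applies the standard (ε, δ)-DP hypothesis-testing constraints, TPR ≤ e^ε · FPR + δ and TNR ≤ e^ε · FNR + δ, to a single attack operating point. The function names and the example rates are illustrative assumptions, and a real audit would typically replace the point estimates with confidence bounds.

```python
import math


def _log_ratio(numerator, denominator):
    """log(numerator / denominator), with the degenerate cases made explicit."""
    if numerator <= 0:
        return 0.0        # constraint holds for every epsilon >= 0: no information
    if denominator == 0:
        return math.inf   # no finite epsilon satisfies the constraint
    return math.log(numerator / denominator)


def empirical_epsilon(tpr, fpr, delta=0.0):
    """Point-estimate lower bound on epsilon from one (TPR, FPR) operating point.

    Hypothetical helper for illustration; it uses raw rates, not confidence bounds.
    """
    fnr, tnr = 1.0 - tpr, 1.0 - fpr
    return max(0.0, _log_ratio(tpr - delta, fpr), _log_ratio(tnr - delta, fnr))


print(empirical_epsilon(tpr=0.5, fpr=0.5))  # chance-level attack: bound is 0.0
print(empirical_epsilon(tpr=0.9, fpr=0.1))  # strong attack: bound is about ln(9)
```

An attack at chance level (TPR equal to FPR) yields a bound of 0, which is the situation a near-random audit AUC corresponds to.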
+### MIA Benchmark Validation — BoW Distribution Shift
+The canary MIA benchmark ([melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)) uses a targeted design in which member and non-member samples share the same code prefix and differ only in the PII secret. To check for distribution shift between the two classes, a bag-of-words (BoW) Random Forest classifier is trained to tell them apart from text alone and scored by ROC AUC under 5-fold cross-validation:
+| PII Type | BoW AUC (mean over folds) | Std. dev. over folds | n (samples) |
+|---|---|---|---|
+| Overall | 0.099 | 0.018 | 400 |
+| api_key | 0.033 | 0.047 | 80 |
+| db_url | 0.311 | 0.105 | 80 |
+| email | 0.078 | 0.099 | 80 |
+| internal_ip | 0.028 | 0.021 | 80 |
+| password | 0.055 | 0.048 | 80 |
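+All BoW AUC values are below 0.5, so the classifier finds no surface-level text feature that positively identifies members; any MIA signal must come from the model's knowledge of the secret. A classifier with no usable signal would normally score near 0.5. Values this far below 0.5 are likely an artifact of the paired design: shuffled folds can place one sample of a pair in the training fold and its twin, which has the opposite label, in the test fold, so the classifier tends to predict the twin's label.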
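To make the AUC scale concrete, here is a toy, deterministic sketch of how a paired design interacts with cross-validation. It is an illustration under stated assumptions, not the benchmark's code: it swaps the Random Forest for a leave-one-out 1-nearest-neighbour classifier over token sets, and the synthetic data and the helper names (`rank_auc`, `make_pairs`, `loo_1nn_scores`) are hypothetical. When a sample's twin (same prefix, opposite label) is available as a training neighbour, the ranking inverts and the AUC falls to 0. When the twin is held out, no usable feature remains and the AUC sits at 0.5.

```python
from itertools import product


def rank_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def jaccard(a, b):
    """Token-set overlap, standing in for bag-of-words similarity."""
    return len(a & b) / len(a | b)


def make_pairs(n_pairs):
    """Synthetic paired benchmark: twins share a prefix and differ in one secret token."""
    samples = []
    for i in range(n_pairs):
        prefix = {f"prefix{i}_a", f"prefix{i}_b", f"prefix{i}_c"}
        samples.append((prefix | {f"secret{i}_x"}, 1))  # member
        samples.append((prefix | {f"secret{i}_y"}, 0))  # non-member
    return samples


def loo_1nn_scores(samples, hold_out_twin):
    """Score each sample with the label of its nearest other sample."""
    scores = []
    for i, (tokens, _) in enumerate(samples):
        twin = i ^ 1  # twins sit at adjacent indices: (0, 1), (2, 3), ...
        candidates = [j for j in range(len(samples))
                      if j != i and not (hold_out_twin and j == twin)]
        # max() returns the first candidate on ties, so with the twin held out
        # (all similarities are 0) every sample receives the same score.
        nearest = max(candidates, key=lambda j: jaccard(tokens, samples[j][0]))
        scores.append(samples[nearest][1])
    return scores


samples = make_pairs(20)
labels = [y for _, y in samples]
auc_twin_seen = rank_auc(labels, loo_1nn_scores(samples, hold_out_twin=False))
auc_twin_held_out = rank_auc(labels, loo_1nn_scores(samples, hold_out_twin=True))
print(auc_twin_seen, auc_twin_held_out)  # 0.0 0.5
```

In this toy setting the sub-0.5 score comes entirely from the twin leaking across the split, not from any feature that distinguishes members from non-members.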
+<details>
+<summary>BoW shift test code</summary>
+```python
+import numpy as np
+from datasets import load_dataset
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.feature_extraction.text import CountVectorizer
+from sklearn.metrics import roc_auc_score
+from sklearn.model_selection import StratifiedKFold
+
+ds = load_dataset("melihcatal/codedp-bench-canary-mia", split="train")
+records = list(ds)
+
+
+def bow_shift(texts, labels, n_folds=5):
+    """Return (mean, std) of the per-fold ROC AUC of a BoW Random Forest."""
+    # The vocabulary is built on all texts; only the classifier is cross-validated.
+    X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
+    y = np.array(labels)
+    aucs = []
+    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
+    for tr, te in cv.split(X, y):
+        clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
+        clf.fit(X[tr], y[tr])
+        # Score the held-out fold with the predicted probability of the member class.
+        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
+    return np.mean(aucs), np.std(aucs)
+
+
+# Overall
+texts = [r["input"] for r in records]
+labels = [r["label"] for r in records]
+print("Overall:", bow_shift(texts, labels))
+
+# Per PII category
+for pii_type in sorted(set(r["pii_type"] for r in records)):
+    cat = [r for r in records if r["pii_type"] == pii_type]
+    print(f"{pii_type}:", bow_shift([r["input"] for r in cat], [r["label"] for r in cat]))
+```
+</details>
 ## Repository Structure
 ```
 ## Related Resources
 - **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
+- **MIA benchmark (general):** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
+- **MIA benchmark (canary):** [melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)