Commit 81c827f (verified), committed by melihcatal · 1 parent: 3379db3

Add per-category BoW shift validation and canary MIA benchmark link

  1. README.md +53 -1
README.md CHANGED
@@ -153,6 +153,57 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).

  **Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases, confirming that the formal privacy guarantees hold in practice.

  ## Repository Structure

  ```
@@ -191,4 +242,5 @@ Each variant directory contains:
  ## Related Resources

  - **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
- - **MIA benchmark:** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
 
 

  **Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases, confirming that the formal privacy guarantees hold in practice.

+ ### MIA Benchmark Validation: BoW Distribution Shift
+
+ The canary MIA benchmark ([melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)) uses a targeted design in which member and non-member samples share the same code prefix and differ only in the PII secret. A bag-of-words (BoW) Random Forest classifier, evaluated with 5-fold cross-validation, finds no distribution shift that separates the two classes:
+
+ | PII Type | BoW AUC (mean) | Std (across folds) | n |
+ |---|---|---|---|
+ | Overall | 0.099 | 0.018 | 400 |
+ | api_key | 0.033 | 0.047 | 80 |
+ | db_url | 0.311 | 0.105 | 80 |
+ | email | 0.078 | 0.099 | 80 |
+ | internal_ip | 0.028 | 0.021 | 80 |
+ | password | 0.055 | 0.048 | 80 |
+
+ Every BoW AUC is below 0.5 (the highest is 0.311, for db_url), so the classifier never beats chance at separating members from non-members on surface-level text features; any MIA signal must therefore come from the model's knowledge of the secret. Values this far below 0.5, rather than close to it, are likely a by-product of the paired design: when a sample's near-identical counterpart with the opposite label lands in the training fold, the classifier tends to predict that opposite label.
+
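To make the AUC column easier to read, here is a minimal sketch (not part of the benchmark code) that computes ROC AUC as the fraction of member/non-member pairs ranked correctly. The function name `rank_auc` and the toy scores are made up for illustration. It shows that an AUC of a turns into 1 - a once the scores are negated, which is why values far below 0.5 point to systematically inverted rankings rather than to random guessing.

```python
def rank_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): members (label 1) mostly receive lower scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.1, 0.6, 0.7, 0.9, 0.4]

auc = rank_auc(labels, scores)
flipped = rank_auc(labels, [-s for s in scores])
print(f"AUC = {auc:.3f}, AUC after negating scores = {flipped:.3f}")
# prints: AUC = 0.111, AUC after negating scores = 0.889
```

In this toy case only one of the nine member/non-member pairs is ranked correctly, so the AUC is 1/9; negating the scores ranks the other eight correctly.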
+ <details>
+ <summary>BoW shift test code</summary>
+
+ ```python
+ from sklearn.ensemble import RandomForestClassifier
+ from sklearn.feature_extraction.text import CountVectorizer
+ from sklearn.model_selection import StratifiedKFold
+ from sklearn.metrics import roc_auc_score
+ import numpy as np
+ from datasets import load_dataset
+
+ ds = load_dataset("melihcatal/codedp-bench-canary-mia", split="train")
+ records = list(ds)
+
+ def bow_shift(texts, labels, n_folds=5):
+     X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
+     y = np.array(labels)
+     aucs = []  # one ROC AUC per held-out fold
+     for tr, te in StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42).split(X, y):
+         clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
+         clf.fit(X[tr], y[tr])
+         aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
+     return np.mean(aucs), np.std(aucs)  # mean and std across folds
+
+ # Overall
+ texts = [r["input"] for r in records]
+ labels = [r["label"] for r in records]
+ print("Overall:", bow_shift(texts, labels))
+
+ # Per PII category
+ for pii_type in sorted(set(r["pii_type"] for r in records)):
+     cat = [r for r in records if r["pii_type"] == pii_type]
+     print(f"{pii_type}:", bow_shift([r["input"] for r in cat], [r["label"] for r in cat]))
+ ```
+ </details>
+
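The test above splits individual samples into folds, so a member and its non-member counterpart can end up on opposite sides of the train/test boundary. One possible follow-up check, sketched here under assumptions, keeps each pair inside a single fold. The `pair_id` key and the `pair_aware_folds` helper are hypothetical; the dataset may not expose a pair identifier, in which case one would have to be derived from the shared code prefix. scikit-learn's `GroupKFold` provides comparable grouping behavior.

```python
import random
from collections import defaultdict


def pair_aware_folds(pair_ids, n_folds=5, seed=42):
    """Return one fold index per record; records sharing a pair id share a fold."""
    unique_ids = sorted(set(pair_ids))
    random.Random(seed).shuffle(unique_ids)
    # Deal the shuffled pairs out round-robin so fold sizes stay balanced.
    fold_of_pair = {pid: i % n_folds for i, pid in enumerate(unique_ids)}
    return [fold_of_pair[pid] for pid in pair_ids]


# Toy records: 10 pairs, each with one member (label 1) and one non-member (label 0).
# "pair_id" is a hypothetical key standing in for the shared code prefix.
records = [{"pair_id": i // 2, "label": i % 2} for i in range(20)]
folds = pair_aware_folds([r["pair_id"] for r in records])

labels_per_fold = defaultdict(list)
for record, fold in zip(records, folds):
    labels_per_fold[fold].append(record["label"])
for fold in sorted(labels_per_fold):
    print(fold, labels_per_fold[fold])
```

With folds built this way, a held-out sample never has its counterpart in the training data, so the resulting AUC would show whether the below-chance values depend on split pairs.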
  ## Repository Structure

  ```
 
  ## Related Resources

  - **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
+ - **MIA benchmark (general):** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
+ - **MIA benchmark (canary):** [melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)