---
language:
- code
license: apache-2.0
tags:
- differential-privacy
- code-generation
- continued-pretraining
- lora
- dp-sgd
- opacus
- privacy
datasets:
- melihcatal/codedp-cpt
base_model:
- ibm-granite/granite-4.0-h-tiny
- bigcode/starcoder2-7b
- Qwen/Qwen3-4B-Instruct-2507
library_name: peft
pipeline_tag: text-generation
---
# CodeDP-CPT: Differentially Private Continued Pre-Training for Code Models
This repository contains LoRA adapters for code language models trained with **Continued Pre-Training (CPT)** under **Differential Privacy (DP-SGD)**. The models demonstrate that formal privacy guarantees can be applied to code generation models while preserving utility.
## Models
Nine adapter checkpoints are provided, covering three base models × three privacy configurations:
| Base Model | Variant | DP | Target ε | Achieved ε | Adapter Path |
|---|---|---|---|---|---|
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | base | No | – | – | `granite-4.0-h-tiny/base/adapter/` |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp3 | Yes | 3.0 | 2.99 | `granite-4.0-h-tiny/dp3/adapter/` |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp8 | Yes | 8.0 | 8.00 | `granite-4.0-h-tiny/dp8/adapter/` |
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | base | No | – | – | `starcoder2-7b/base/adapter/` |
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp3 | Yes | 3.0 | 3.00 | `starcoder2-7b/dp3/adapter/` |
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp8 | Yes | 8.0 | 8.00 | `starcoder2-7b/dp8/adapter/` |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | base | No | – | – | `qwen3-4b-instruct/base/adapter/` |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp3 | Yes | 3.0 | 2.99 | `qwen3-4b-instruct/dp3/adapter/` |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp8 | Yes | 8.0 | 8.00 | `qwen3-4b-instruct/dp8/adapter/` |
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "ibm-granite/granite-4.0-h-tiny"
adapter_path = "melihcatal/codedp-cpt-models"
subfolder = "granite-4.0-h-tiny/dp8/adapter"  # see the table above for all nine variants

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)

inputs = tokenizer("def binary_search(arr, target):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
### Dataset
- **Dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt) – code mined from GitHub repositories with quality filtering and decontamination (file-level, Type-1, and Type-2 clone detection against evaluation benchmarks)
- **Mode:** Causal language modeling (continued pre-training)
- **Validation split:** 5% held out
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Modules to save | lm_head |
### Training Hyperparameters
| Parameter | No-DP (base) | DP variants |
|---|---|---|
| Epochs | 2 | 2 |
| Micro-batch size (per GPU) | 8 | 8 |
| Learning rate | 1e-4 | 2e-4 |
| Optimizer | AdamW | AdamW |
| LR scheduler | Cosine | Cosine |
| Warmup ratio | 5% | 5% |
| Max gradient norm | 1.0 | 1.0 |
| Sequence length | 1024 | 1024 |
| Precision | bfloat16 | bfloat16 |
| Seed | 42 | 42 |
**Effective batch sizes** (micro-batch × gradient accumulation steps × GPUs):
| Model | GPUs | No-DP | DP ε=3 / ε=8 |
|---|---|---|---|
| Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
| StarCoder2-7B | 4 | 256 (8×8×4) | 512 (8×16×4) |
| Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
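The products in the table can be sanity-checked with a tiny helper (the function name is illustrative):

```python
def effective_batch_size(micro_batch: int, grad_accum: int, gpus: int) -> int:
    # effective batch = micro-batch per GPU x accumulation steps x number of GPUs
    return micro_batch * grad_accum * gpus

assert effective_batch_size(8, 8, 4) == 256    # Granite / StarCoder2, no-DP
assert effective_batch_size(8, 16, 4) == 512   # Granite / StarCoder2, DP
assert effective_batch_size(8, 4, 8) == 256    # Qwen, no-DP
assert effective_batch_size(8, 8, 8) == 512    # Qwen, DP
```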
### Differential Privacy
| Parameter | Value |
|---|---|
| Engine | Opacus PrivacyEngine |
| Mechanism | Gaussian (DP-SGD) |
| Per-sample gradients | Hook-based |
| Clipping | Flat (global) |
| Target δ | 1e-5 |
| Target ε | 3.0 or 8.0 |
| Privacy accounting | RDP (Rényi Differential Privacy) |
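Opacus performs this inside its `PrivacyEngine`, but the per-step mechanism named above (flat per-sample clipping followed by Gaussian noise) can be sketched in plain NumPy. The `noise_multiplier` value here is illustrative; in practice Opacus calibrates it from the target (ε, δ) budget:

```python
import numpy as np

def dp_sgd_step(per_sample_grads, max_grad_norm=1.0, noise_multiplier=0.5, rng=None):
    """One DP-SGD aggregation step: flat (global) clipping + Gaussian noise.

    per_sample_grads: array of shape (batch, num_params), one gradient per example.
    """
    rng = rng or np.random.default_rng(42)
    # Flat clipping: rescale each example's whole gradient so its L2 norm <= max_grad_norm
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    # Sum, add Gaussian noise calibrated to the clipping bound, then average over the batch
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_sample_grads)

grads = np.random.default_rng(0).normal(size=(8, 4)) * 3.0  # toy per-sample gradients
noisy_mean_grad = dp_sgd_step(grads)
```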
### Infrastructure
- **GPUs:** NVIDIA H200 (140 GB VRAM each); 4 GPUs for Granite and StarCoder2, 8 GPUs for Qwen
- **CUDA:** 13.0
- **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
## Evaluation Results
### Functional Correctness – CodeDP-FC (Granite-4.0-H-Tiny)
103 code generation tasks, 10 samples per task, temperature 0.8.
| Variant | pass@1 | pass@5 | pass@10 |
|---|---|---|---|
| No fine-tuning | 13.5% | 18.4% | 20.4% |
| CPT (no DP) | 10.1% | 16.6% | 18.4% |
| CPT + DP (ε=3) | 13.7% | 19.1% | 21.4% |
| CPT + DP (ε=8) | **14.5%** | **21.1%** | **23.3%** |
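Figures like these are typically computed with the standard unbiased pass@k estimator from the Codex evaluation (Chen et al., 2021); a sketch assuming n=10 samples per task, matching the setup above (the repository's exact scoring script is not shown here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n total (c correct) passes.
    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer incorrect samples than k: some draw must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per task, 2 of them correct
print(round(pass_at_k(10, 2, 1), 3))  # 0.2
```

Per-task estimates are then averaged over the 103 tasks to give the table values.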
### Validation Loss
| Model | No-DP | DP ε=3 | DP ε=8 |
|---|---|---|---|
| Granite-4.0-H-Tiny | 0.946 | 1.044 | 1.038 |
| StarCoder2-7B | 0.745 | 0.843 | 0.841 |
| Qwen3-4B-Instruct | 0.808 | 0.941 | 0.925 |
### Privacy Audit
New-token canary audit (500 members, 500 non-members, 49-token random prefixes). Higher AUC = more memorization; lower = better privacy.
| Model | Variant | Loss AUC | Embedding AUC | Empirical ε (p=0.01) |
|---|---|---|---|---|
| Granite-4.0-H-Tiny | base | 1.000 | 1.000 | 3.02 |
| Granite-4.0-H-Tiny | dp3 | 0.543 | 0.513 | 0.00 |
| Granite-4.0-H-Tiny | dp8 | 0.564 | 0.508 | 0.16 |
| StarCoder2-7B | base | 1.000 | 0.916 | 3.02 |
| StarCoder2-7B | dp3 | 0.526 | 0.521 | 0.00 |
| StarCoder2-7B | dp8 | 0.520 | 0.523 | 0.00 |
| Qwen3-4B-Instruct | base | 0.969 | 0.884 | 3.02 |
| Qwen3-4B-Instruct | dp3 | 0.505 | 0.515 | 0.00 |
| Qwen3-4B-Instruct | dp8 | 0.515 | 0.516 | 0.00 |
**Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases, confirming that the formal privacy guarantees hold in practice.
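An empirical ε of this kind is usually a lower bound derived from attack TPR/FPR at a fixed false-positive rate. A minimal sketch of one common recipe, assuming the p=0.01 column corresponds to FPR = 0.01 (the repository's exact audit procedure may differ):

```python
import numpy as np

def empirical_epsilon(member_scores, nonmember_scores, target_fpr=0.01):
    """Lower-bound epsilon from a membership attack: threshold at the
    target FPR on non-members, then take
    eps_hat = max(ln(TPR/FPR), ln((1-FPR)/(1-TPR))), floored at 0.
    Higher scores are assumed to indicate membership.
    """
    thresh = np.quantile(nonmember_scores, 1.0 - target_fpr)
    tpr = float(np.mean(np.asarray(member_scores) > thresh))
    fpr = max(float(np.mean(np.asarray(nonmember_scores) > thresh)), 1e-6)
    tpr = min(max(tpr, 1e-6), 1.0 - 1e-6)  # clamp away from 0/1 for the logs
    eps = max(np.log(tpr / fpr), np.log((1.0 - fpr) / (1.0 - tpr)))
    return max(float(eps), 0.0)

rng = np.random.default_rng(0)
# A well-separated attack (strong memorization) vs an uninformative one
eps_strong = empirical_epsilon(rng.normal(3, 1, 500), rng.normal(0, 1, 500))
eps_random = empirical_epsilon(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
```

With near-random AUC, the attack's TPR at FPR 0.01 is itself about 0.01, so the estimate collapses to 0, matching the dp3/dp8 rows above.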
### MIA Benchmark Validation β€” BoW Distribution Shift
The canary MIA benchmark ([melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)) uses a targeted design where member and non-member samples share the same code prefix and differ only in the PII secret. A bag-of-words Random Forest classifier (5-fold CV) confirms no distribution shift:
| PII Type | BoW AUC | Β± std | n |
|---|---|---|---|
| Overall | 0.099 | 0.018 | 400 |
| api_key | 0.033 | 0.047 | 80 |
| db_url | 0.311 | 0.105 | 80 |
| email | 0.078 | 0.099 | 80 |
| internal_ip | 0.028 | 0.021 | 80 |
| password | 0.055 | 0.048 | 80 |
All BoW AUC values are well below 0.5, confirming that MIA signal must come from the model's knowledge of the secret, not surface-level text features.
<details>
<summary>BoW shift test code</summary>
```python
import numpy as np
from datasets import load_dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

ds = load_dataset("melihcatal/codedp-bench-canary-mia", split="train")
records = list(ds)

def bow_shift(texts, labels, n_folds=5):
    # Cross-validated AUC of a bag-of-words classifier separating members from non-members
    X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
    y = np.array(labels)
    aucs = []
    for tr, te in StratifiedKFold(n_folds, shuffle=True, random_state=42).split(X, y):
        clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
        clf.fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return np.mean(aucs), np.std(aucs)

# Overall
texts = [r["input"] for r in records]
labels = [r["label"] for r in records]
print("Overall:", bow_shift(texts, labels))

# Per PII category
for pii_type in sorted(set(r["pii_type"] for r in records)):
    cat = [r for r in records if r["pii_type"] == pii_type]
    print(f"{pii_type}:", bow_shift([r["input"] for r in cat], [r["label"] for r in cat]))
```
</details>
## Repository Structure
```
├── granite-4.0-h-tiny/
│   ├── base/    # No-DP baseline
│   ├── dp3/     # DP ε=3
│   └── dp8/     # DP ε=8
├── starcoder2-7b/
│   ├── base/
│   ├── dp3/
│   └── dp8/
└── qwen3-4b-instruct/
    ├── base/
    ├── dp3/
    └── dp8/
```
Each variant directory contains:
- `adapter/` – LoRA adapter weights (PEFT-compatible)
- `tokenizer/` – tokenizer with any added audit tokens
- `resolved_config.yaml` – full training configuration
- `summary.json` – training and audit metrics
- `audit_results.json`, `audit_scores.npz` – privacy audit artifacts
- `metrics.jsonl`, `scalars.csv` – training logs
- `tensorboard/` – TensorBoard events
- `codecarbon.csv` – carbon emissions tracking
- `epochs/` – per-epoch checkpoints and audit results
## Limitations
- These are **LoRA adapters**, not standalone models. They require the corresponding base model for inference.
- The adapters include additional tokenizer tokens added during the privacy audit process (canary tokens). These do not affect normal generation.
- Evaluation results are on the CodeDP-FC benchmark; performance may vary on other code generation tasks.
- DP training with tight privacy budgets (ε=3) incurs a utility cost, particularly visible in validation loss.
## Related Resources
- **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
- **MIA benchmark (general):** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
- **MIA benchmark (canary):** [melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)