---
language:
- code
license: apache-2.0
tags:
- differential-privacy
- code-generation
- continued-pretraining
- lora
- dp-sgd
- opacus
- privacy
datasets:
- melihcatal/codedp-cpt
base_model:
- ibm-granite/granite-4.0-h-tiny
- bigcode/starcoder2-7b
- Qwen/Qwen3-4B-Instruct-2507
library_name: peft
pipeline_tag: text-generation
---
# CodeDP-CPT: Differentially Private Continued Pre-Training for Code Models
This repository contains LoRA adapters for code language models trained with **Continued Pre-Training (CPT)** under **Differential Privacy (DP-SGD)**. The models demonstrate that formal privacy guarantees can be applied to code generation models while preserving utility.
## Models
Nine adapter checkpoints are provided, covering three base models × three privacy configurations:
| Base Model | Variant | DP | Target ε | Achieved ε | Adapter Path |
|---|---|---|---|---|---|
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | base | No | – | – | `granite-4.0-h-tiny/base/adapter/` |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp3 | Yes | 3.0 | 2.99 | `granite-4.0-h-tiny/dp3/adapter/` |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp8 | Yes | 8.0 | 8.00 | `granite-4.0-h-tiny/dp8/adapter/` |
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | base | No | – | – | `starcoder2-7b/base/adapter/` |
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp3 | Yes | 3.0 | 3.00 | `starcoder2-7b/dp3/adapter/` |
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp8 | Yes | 8.0 | 8.00 | `starcoder2-7b/dp8/adapter/` |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | base | No | – | – | `qwen3-4b-instruct/base/adapter/` |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp3 | Yes | 3.0 | 2.99 | `qwen3-4b-instruct/dp3/adapter/` |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp8 | Yes | 8.0 | 8.00 | `qwen3-4b-instruct/dp8/adapter/` |
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "ibm-granite/granite-4.0-h-tiny"
adapter_path = "melihcatal/codedp-cpt-models"
subfolder = "granite-4.0-h-tiny/dp8/adapter"  # see the table above for all nine variants

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)

inputs = tokenizer("def binary_search(arr, target):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
### Dataset
- **Dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt) – code mined from GitHub repositories with quality filtering and decontamination (file-level, Type-1, and Type-2 clone detection against evaluation benchmarks)
- **Mode:** Causal language modeling (continued pre-training)
- **Validation split:** 5% held out
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Modules to save | lm_head |
### Training Hyperparameters
| Parameter | No-DP (base) | DP variants |
|---|---|---|
| Epochs | 2 | 2 |
| Micro-batch size (per GPU) | 8 | 8 |
| Learning rate | 1e-4 | 2e-4 |
| Optimizer | AdamW | AdamW |
| LR scheduler | Cosine | Cosine |
| Warmup ratio | 5% | 5% |
| Max gradient norm | 1.0 | 1.0 |
| Sequence length | 1024 | 1024 |
| Precision | bfloat16 | bfloat16 |
| Seed | 42 | 42 |
**Effective batch sizes** (micro-batch × gradient accumulation steps × GPUs):
| Model | GPUs | No-DP | DP ε=3 / ε=8 |
|---|---|---|---|
| Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
| StarCoder2-7B | 4 | 256 (8×8×4) | 512 (8×16×4) |
| Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
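The products in the table can be sanity-checked with a tiny helper (the function name is illustrative):

```python
def effective_batch_size(micro_batch: int, grad_accum: int, gpus: int) -> int:
    # effective batch = micro-batch per GPU x accumulation steps x number of GPUs
    return micro_batch * grad_accum * gpus

assert effective_batch_size(8, 8, 4) == 256    # Granite / StarCoder2, no-DP
assert effective_batch_size(8, 16, 4) == 512   # Granite / StarCoder2, DP
assert effective_batch_size(8, 4, 8) == 256    # Qwen, no-DP
assert effective_batch_size(8, 8, 8) == 512    # Qwen, DP
```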
### Differential Privacy
| Parameter | Value |
|---|---|
| Engine | Opacus PrivacyEngine |
| Mechanism | Gaussian (DP-SGD) |
| Per-sample gradients | Hook-based |
| Clipping | Flat (global) |
| Target δ | 1e-5 |
| Target ε | 3.0 or 8.0 |
| Privacy accounting | RDP (Rényi Differential Privacy) |
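Opacus performs this inside its `PrivacyEngine`, but the per-step mechanism named above (flat per-sample clipping followed by Gaussian noise) can be sketched in plain NumPy. The `noise_multiplier` value here is illustrative; in practice Opacus calibrates it from the target (ε, δ) budget:

```python
import numpy as np

def dp_sgd_step(per_sample_grads, max_grad_norm=1.0, noise_multiplier=0.5, rng=None):
    """One DP-SGD aggregation step: flat (global) clipping + Gaussian noise.

    per_sample_grads: array of shape (batch, num_params), one gradient per example.
    """
    rng = rng or np.random.default_rng(42)
    # Flat clipping: rescale each example's whole gradient so its L2 norm <= max_grad_norm
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    # Sum, add Gaussian noise calibrated to the clipping bound, then average over the batch
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_sample_grads)

grads = np.random.default_rng(0).normal(size=(8, 4)) * 3.0  # toy per-sample gradients
noisy_mean_grad = dp_sgd_step(grads)
```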
### Infrastructure
- **GPUs:** NVIDIA H200 (140 GB VRAM each); 4 GPUs for Granite and StarCoder2, 8 GPUs for Qwen
- **CUDA:** 13.0
- **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
## Evaluation Results
### Functional Correctness – CodeDP-FC (Granite-4.0-H-Tiny)
103 code generation tasks, 10 samples per task, temperature 0.8.
| Variant | pass@1 | pass@5 | pass@10 |
|---|---|---|---|
| No fine-tuning | 13.5% | 18.4% | 20.4% |
| CPT (no DP) | 10.1% | 16.6% | 18.4% |
| CPT + DP (ε=3) | 13.7% | 19.1% | 21.4% |
| CPT + DP (ε=8) | **14.5%** | **21.1%** | **23.3%** |
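Figures like these are typically computed with the standard unbiased pass@k estimator from the Codex evaluation (Chen et al., 2021); a sketch assuming n=10 samples per task, matching the setup above (the repository's exact scoring script is not shown here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n total (c correct) passes.
    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer incorrect samples than k: some draw must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per task, 2 of them correct
print(round(pass_at_k(10, 2, 1), 3))  # 0.2
```

Per-task estimates are then averaged over the 103 tasks to give the table values.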
### Validation Loss
| Model | No-DP | DP ε=3 | DP ε=8 |
|---|---|---|---|
| Granite-4.0-H-Tiny | 0.946 | 1.044 | 1.038 |
| StarCoder2-7B | 0.745 | 0.843 | 0.841 |
| Qwen3-4B-Instruct | 0.808 | 0.941 | 0.925 |
### Privacy Audit
New-token canary audit (500 members, 500 non-members, 49-token random prefixes). Higher AUC = more memorization; lower = better privacy.
| Model | Variant | Loss AUC | Embedding AUC | Empirical ε (p=0.01) |
|---|---|---|---|---|
| Granite-4.0-H-Tiny | base | 1.000 | 1.000 | 3.02 |
| Granite-4.0-H-Tiny | dp3 | 0.543 | 0.513 | 0.00 |
| Granite-4.0-H-Tiny | dp8 | 0.564 | 0.508 | 0.16 |
| StarCoder2-7B | base | 1.000 | 0.916 | 3.02 |
| StarCoder2-7B | dp3 | 0.526 | 0.521 | 0.00 |
| StarCoder2-7B | dp8 | 0.520 | 0.523 | 0.00 |
| Qwen3-4B-Instruct | base | 0.969 | 0.884 | 3.02 |
| Qwen3-4B-Instruct | dp3 | 0.505 | 0.515 | 0.00 |
| Qwen3-4B-Instruct | dp8 | 0.515 | 0.516 | 0.00 |
**Key finding:** DP training reduces canary audit AUC to near-random (0.5), with empirical ε dropping to 0 in most cases, confirming that the formal privacy guarantees hold in practice.
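An empirical ε of this kind is usually a lower bound derived from attack TPR/FPR at a fixed false-positive rate. A minimal sketch of one common recipe, assuming the p=0.01 column corresponds to FPR = 0.01 (the repository's exact audit procedure may differ):

```python
import numpy as np

def empirical_epsilon(member_scores, nonmember_scores, target_fpr=0.01):
    """Lower-bound epsilon from a membership attack: threshold at the
    target FPR on non-members, then take
    eps_hat = max(ln(TPR/FPR), ln((1-FPR)/(1-TPR))), floored at 0.
    Higher scores are assumed to indicate membership.
    """
    thresh = np.quantile(nonmember_scores, 1.0 - target_fpr)
    tpr = float(np.mean(np.asarray(member_scores) > thresh))
    fpr = max(float(np.mean(np.asarray(nonmember_scores) > thresh)), 1e-6)
    tpr = min(max(tpr, 1e-6), 1.0 - 1e-6)  # clamp away from 0/1 for the logs
    eps = max(np.log(tpr / fpr), np.log((1.0 - fpr) / (1.0 - tpr)))
    return max(float(eps), 0.0)

rng = np.random.default_rng(0)
# A well-separated attack (strong memorization) vs an uninformative one
eps_strong = empirical_epsilon(rng.normal(3, 1, 500), rng.normal(0, 1, 500))
eps_random = empirical_epsilon(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
```

With near-random AUC, the attack's TPR at FPR 0.01 is itself about 0.01, so the estimate collapses to 0, matching the dp3/dp8 rows above.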
### MIA Benchmark Validation β€” BoW Distribution Shift
The canary MIA benchmark ([melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)) uses a targeted design where member and non-member samples share the same code prefix and differ only in the PII secret. A bag-of-words Random Forest classifier (5-fold CV) confirms no distribution shift:
| PII Type | BoW AUC | Β± std | n |
|---|---|---|---|
| Overall | 0.099 | 0.018 | 400 |
| api_key | 0.033 | 0.047 | 80 |
| db_url | 0.311 | 0.105 | 80 |
| email | 0.078 | 0.099 | 80 |
| internal_ip | 0.028 | 0.021 | 80 |
| password | 0.055 | 0.048 | 80 |
All BoW AUC values are well below 0.5, confirming that MIA signal must come from the model's knowledge of the secret, not surface-level text features.
<details>
<summary>BoW shift test code</summary>
```python
import numpy as np
from datasets import load_dataset
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

ds = load_dataset("melihcatal/codedp-bench-canary-mia", split="train")
records = list(ds)

def bow_shift(texts, labels, n_folds=5):
    # Cross-validated AUC of a bag-of-words classifier separating members from non-members
    X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
    y = np.array(labels)
    aucs = []
    for tr, te in StratifiedKFold(n_folds, shuffle=True, random_state=42).split(X, y):
        clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
        clf.fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    return np.mean(aucs), np.std(aucs)

# Overall
texts = [r["input"] for r in records]
labels = [r["label"] for r in records]
print("Overall:", bow_shift(texts, labels))

# Per PII category
for pii_type in sorted(set(r["pii_type"] for r in records)):
    cat = [r for r in records if r["pii_type"] == pii_type]
    print(f"{pii_type}:", bow_shift([r["input"] for r in cat], [r["label"] for r in cat]))
```
</details>
## Repository Structure
```
├── granite-4.0-h-tiny/
│   ├── base/    # No-DP baseline
│   ├── dp3/     # DP ε=3
│   └── dp8/     # DP ε=8
├── starcoder2-7b/
│   ├── base/
│   ├── dp3/
│   └── dp8/
└── qwen3-4b-instruct/
    ├── base/
    ├── dp3/
    └── dp8/
```
Each variant directory contains:
- `adapter/` – LoRA adapter weights (PEFT-compatible)
- `tokenizer/` – tokenizer with any added audit tokens
- `resolved_config.yaml` – full training configuration
- `summary.json` – training and audit metrics
- `audit_results.json`, `audit_scores.npz` – privacy audit artifacts
- `metrics.jsonl`, `scalars.csv` – training logs
- `tensorboard/` – TensorBoard events
- `codecarbon.csv` – carbon emissions tracking
- `epochs/` – per-epoch checkpoints and audit results
## Limitations
- These are **LoRA adapters**, not standalone models. They require the corresponding base model for inference.
- The adapters include additional tokenizer tokens added during the privacy audit process (canary tokens). These do not affect normal generation.
- Evaluation results are on the CodeDP-FC benchmark; performance may vary on other code generation tasks.
- DP training with tight privacy budgets (ε=3) incurs a utility cost, particularly visible in validation loss.
## Related Resources
- **Training dataset:** [melihcatal/codedp-cpt](https://huggingface.co/datasets/melihcatal/codedp-cpt)
- **MIA benchmark (general):** [melihcatal/codedp-bench-mia-cpt](https://huggingface.co/datasets/melihcatal/codedp-bench-mia-cpt)
- **MIA benchmark (canary):** [melihcatal/codedp-bench-canary-mia](https://huggingface.co/datasets/melihcatal/codedp-bench-canary-mia)