melihcatal/codedp-cpt-models

@@ -78,17 +78,24 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
 | Parameter | No-DP (base) | DP variants |
 |---|---|---|
 | Epochs | 2 | 2 |
-| Batch size (per GPU) | 8 | 8 |
 | Learning rate | 1e-4 | 2e-4 |
 | Optimizer | AdamW | AdamW |
 | LR scheduler | Cosine | Cosine |
 | Warmup ratio | 5% | 5% |
-| Grad accumulation steps | 4–8 | 16 |
 | Max gradient norm | 1.0 | 1.0 |
 | Sequence length | 1024 | 1024 |
 | Precision | bfloat16 | bfloat16 |
 | Seed | 42 | 42 |
 ### Differential Privacy
 | Parameter | Value |
@@ -103,8 +110,9 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
 ### Infrastructure
 - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
-- **Hardware:** NVIDIA H200 GPUs
 ## Evaluation Results

 | Parameter | No-DP (base) | DP variants |
 |---|---|---|
 | Epochs | 2 | 2 |
+| Micro-batch size (per GPU) | 8 | 8 |
 | Learning rate | 1e-4 | 2e-4 |
 | Optimizer | AdamW | AdamW |
 | LR scheduler | Cosine | Cosine |
 | Warmup ratio | 5% | 5% |
 | Max gradient norm | 1.0 | 1.0 |
 | Sequence length | 1024 | 1024 |
 | Precision | bfloat16 | bfloat16 |
 | Seed | 42 | 42 |
+**Effective batch sizes** (micro-batch × gradient accumulation steps × 8 GPUs):
+| Model | No-DP | DP ε=3 | DP ε=8 |
+|---|---|---|---|
+| Granite-4.0-H-Tiny | 512 (8×8×8) | 1024 (8×16×8) | 1024 (8×16×8) |
+| DeepSeek-Coder-6.7B | 256 (8×4×8) | 512 (8×8×8) | 512 (8×8×8) |
+| Qwen3-4B-Instruct | 256 (8×4×8) | 512 (8×8×8) | 512 (8×8×8) |
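
The effective batch sizes in the table above follow directly from the stated formula. As a minimal sketch (not part of the commit), the arithmetic can be reproduced as follows. The gradient accumulation steps are read off the parenthesised breakdowns in the table, and the names `effective_batch_size`, `GRAD_ACCUM_STEPS`, and `EFFECTIVE` are hypothetical, chosen only for illustration.

```python
# Sketch only: reproduces the "micro-batch x grad accumulation x GPUs" arithmetic.
# All names below are illustrative assumptions, not identifiers from the repository.

MICRO_BATCH_PER_GPU = 8  # "Micro-batch size (per GPU)" row
NUM_GPUS = 8             # "8 x NVIDIA H200" in the Infrastructure section

# Gradient accumulation steps, taken from the middle factor of each
# parenthesised breakdown in the effective batch size table.
GRAD_ACCUM_STEPS = {
    "Granite-4.0-H-Tiny": {"No-DP": 8, "DP ε=3": 16, "DP ε=8": 16},
    "DeepSeek-Coder-6.7B": {"No-DP": 4, "DP ε=3": 8, "DP ε=8": 8},
    "Qwen3-4B-Instruct": {"No-DP": 4, "DP ε=3": 8, "DP ε=8": 8},
}


def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return micro_batch * grad_accum_steps * num_gpus


EFFECTIVE = {
    model: {
        run: effective_batch_size(MICRO_BATCH_PER_GPU, steps, NUM_GPUS)
        for run, steps in runs.items()
    }
    for model, runs in GRAD_ACCUM_STEPS.items()
}

if __name__ == "__main__":
    for model, runs in EFFECTIVE.items():
        print(model, runs)
```

In every row the DP runs use twice the accumulation steps of the corresponding No-DP run, so the effective batch size doubles while the per-GPU micro-batch stays at 8.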
 ### Differential Privacy
 | Parameter | Value |
 ### Infrastructure
+- **GPUs:** 8 × NVIDIA H200 (140 GB VRAM each)
+- **CUDA:** 13.0
 - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
 ## Evaluation Results