Fix GPU counts: 4 for Granite, 8 for DeepSeek/Qwen; correct effective batch sizes

README.md (changed):

@@ -88,13 +88,13 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
 | Precision | bfloat16 | bfloat16 |
 | Seed | 42 | 42 |
 
-**Effective batch sizes** (micro-batch × gradient accumulation steps ×
+**Effective batch sizes** (micro-batch × gradient accumulation steps × GPUs):
 
-| Model | No-DP | DP ε=3
+| Model | GPUs | No-DP | DP ε=3 / ε=8 |
 |---|---|---|---|
-| Granite-4.0-H-Tiny |
-| DeepSeek-Coder-6.7B |
-| Qwen3-4B-Instruct |
+| Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
+| DeepSeek-Coder-6.7B | 8 | 256 (8×4×8) | 512 (8×8×8) |
+| Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
 
 ### Differential Privacy
 
@@ -110,7 +110,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
 
 ### Infrastructure
 
-- **GPUs:**
+- **GPUs:** NVIDIA H200 (140 GB VRAM each) — 4 GPUs for Granite, 8 GPUs for DeepSeek and Qwen
 - **CUDA:** 13.0
 - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
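As a quick arithmetic check on the corrected table, the effective batch sizes can be recomputed from micro-batch × gradient accumulation steps × GPU count. This is a minimal sketch; the variable names below are illustrative and do not come from the repository's training scripts:

```python
# Sanity-check the effective batch sizes in the updated README table.
# effective batch = micro_batch * grad_accum_steps * num_gpus

configs = {
    # model: (micro_batch, grad_accum_no_dp, grad_accum_dp, num_gpus)
    "Granite-4.0-H-Tiny":  (8, 8, 16, 4),
    "DeepSeek-Coder-6.7B": (8, 4, 8, 8),
    "Qwen3-4B-Instruct":   (8, 4, 8, 8),
}

for model, (mb, ga_nodp, ga_dp, gpus) in configs.items():
    no_dp = mb * ga_nodp * gpus  # No-DP column
    dp = mb * ga_dp * gpus       # DP ε=3 / ε=8 column
    print(f"{model}: no-DP={no_dp}, DP={dp}")
    assert no_dp == 256 and dp == 512
```

Every row works out to 256 without DP and 512 with DP: the DP runs double the gradient accumulation steps, and the GPU count change (4 for Granite vs. 8 for DeepSeek/Qwen) is compensated by the accumulation factor.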