Fix GPU counts: 4 for Granite, 8 for DeepSeek/Qwen; correct effective batch sizes

README.md (changed):

@@ -88,13 +88,13 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
 | Precision | bfloat16 | bfloat16 |
 | Seed | 42 | 42 |
 
-**Effective batch sizes** (micro-batch × gradient accumulation steps ×
+**Effective batch sizes** (micro-batch × gradient accumulation steps × GPUs):
 
-| Model | No-DP | DP ε=3
+| Model | GPUs | No-DP | DP ε=3 / ε=8 |
 |---|---|---|---|
-| Granite-4.0-H-Tiny |
-| DeepSeek-Coder-6.7B |
-| Qwen3-4B-Instruct |
+| Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
+| DeepSeek-Coder-6.7B | 8 | 256 (8×4×8) | 512 (8×8×8) |
+| Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
 
 ### Differential Privacy
 
@@ -110,7 +110,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
 
 ### Infrastructure
 
-- **GPUs:**
+- **GPUs:** NVIDIA H200 (140 GB VRAM each) — 4 GPUs for Granite, 8 GPUs for DeepSeek and Qwen
 - **CUDA:** 13.0
 - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
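As a quick arithmetic check on the corrected table, the effective batch sizes can be recomputed from micro-batch × gradient accumulation steps × GPU count. This is a minimal sketch; the variable names below are illustrative and do not come from the repository's training scripts:

```python
# Sanity-check the effective batch sizes in the updated README table.
# effective batch = micro_batch * grad_accum_steps * num_gpus

configs = {
    # model: (micro_batch, grad_accum_no_dp, grad_accum_dp, num_gpus)
    "Granite-4.0-H-Tiny":  (8, 8, 16, 4),
    "DeepSeek-Coder-6.7B": (8, 4, 8, 8),
    "Qwen3-4B-Instruct":   (8, 4, 8, 8),
}

for model, (mb, ga_nodp, ga_dp, gpus) in configs.items():
    no_dp = mb * ga_nodp * gpus  # No-DP column
    dp = mb * ga_dp * gpus       # DP ε=3 / ε=8 column
    print(f"{model}: no-DP={no_dp}, DP={dp}")
    assert no_dp == 256 and dp == 512
```

Every row works out to 256 without DP and 512 with DP: the DP runs double the gradient accumulation steps, and the GPU count change (4 for Granite vs. 8 for DeepSeek/Qwen) is compensated by the accumulation factor.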