melihcatal committed · Commit 38acd3e · verified · 1 parent: f374d1e

Fix GPU counts: 4 for Granite, 8 for DeepSeek/Qwen; correct effective batch sizes

Files changed (1): README.md (+6 -6)
@@ -88,13 +88,13 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
  | Precision | bfloat16 | bfloat16 |
  | Seed | 42 | 42 |

- **Effective batch sizes** (micro-batch × gradient accumulation steps × 8 GPUs):

- | Model | No-DP | DP ε=3 | DP ε=8 |
  |---|---|---|---|
- | Granite-4.0-H-Tiny | 512 (8×8×8) | 1024 (8×16×8) | 1024 (8×16×8) |
- | DeepSeek-Coder-6.7B | 256 (8×4×8) | 512 (8×8×8) | 512 (8×8×8) |
- | Qwen3-4B-Instruct | 256 (8×4×8) | 512 (8×8×8) | 512 (8×8×8) |

  ### Differential Privacy

@@ -110,7 +110,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)

  ### Infrastructure

- - **GPUs:** 8 × NVIDIA H200 (140 GB VRAM each)
  - **CUDA:** 13.0
  - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend

 
  | Precision | bfloat16 | bfloat16 |
  | Seed | 42 | 42 |

+ **Effective batch sizes** (micro-batch × gradient accumulation steps × GPUs):

+ | Model | GPUs | No-DP | DP ε=3 / ε=8 |
  |---|---|---|---|
+ | Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
+ | DeepSeek-Coder-6.7B | 8 | 256 (8×4×8) | 512 (8×8×8) |
+ | Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |

  ### Differential Privacy

 

  ### Infrastructure

+ - **GPUs:** NVIDIA H200 (140 GB VRAM each) — 4 GPUs for Granite, 8 GPUs for DeepSeek and Qwen
  - **CUDA:** 13.0
  - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend

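
The corrected table follows the formula stated in the README: effective batch size = micro-batch × gradient accumulation steps × GPUs. A minimal sketch for re-checking that arithmetic is given here. The names `BatchConfig`, `UPDATED_TABLE`, and `check_table` are hypothetical and not part of the repository; the numbers are copied from the updated rows of the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """One cell of the batch-size table (hypothetical helper, not from the repo)."""
    micro_batch: int
    grad_accum_steps: int
    num_gpus: int

    @property
    def effective_batch_size(self) -> int:
        # micro-batch × gradient accumulation steps × GPUs, as stated in the README
        return self.micro_batch * self.grad_accum_steps * self.num_gpus


# (model, setting) -> (config, effective batch size claimed in the updated table)
UPDATED_TABLE = {
    ("Granite-4.0-H-Tiny", "No-DP"): (BatchConfig(8, 8, 4), 256),
    ("Granite-4.0-H-Tiny", "DP"): (BatchConfig(8, 16, 4), 512),
    ("DeepSeek-Coder-6.7B", "No-DP"): (BatchConfig(8, 4, 8), 256),
    ("DeepSeek-Coder-6.7B", "DP"): (BatchConfig(8, 8, 8), 512),
    ("Qwen3-4B-Instruct", "No-DP"): (BatchConfig(8, 4, 8), 256),
    ("Qwen3-4B-Instruct", "DP"): (BatchConfig(8, 8, 8), 512),
}


def check_table(table=UPDATED_TABLE):
    """Return the (model, setting) keys whose claimed size disagrees with the formula."""
    return [
        key
        for key, (config, claimed) in table.items()
        if config.effective_batch_size != claimed
    ]


if __name__ == "__main__":
    mismatches = check_table()
    print("mismatches:", mismatches)
```

With the values above, every row matches the formula, so `check_table()` returns an empty list. The same helper flags the inconsistency this commit fixes: a Granite row that records 512 while running on 4 GPUs (8×8×4) would be reported as a mismatch.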