melihcatal committed
Commit 3379db3 · verified · 1 parent: 8b6a415

Update README: replace DeepSeek with StarCoder2-7B

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -14,7 +14,7 @@ datasets:
  - melihcatal/codedp-cpt
  base_model:
  - ibm-granite/granite-4.0-h-tiny
- - deepseek-ai/deepseek-coder-6.7b-instruct
  - Qwen/Qwen3-4B-Instruct-2507
  library_name: peft
  pipeline_tag: text-generation
@@ -33,9 +33,9 @@ Nine adapter checkpoints are provided — three base models × three privacy con
  | [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | base | No | — | — | `granite-4.0-h-tiny/base/adapter/` |
  | [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp3 | Yes | 3.0 | 2.99 | `granite-4.0-h-tiny/dp3/adapter/` |
  | [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp8 | Yes | 8.0 | 8.00 | `granite-4.0-h-tiny/dp8/adapter/` |
- | [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) | base | No | — | — | `deepseek-coder-6.7b/base/adapter/` |
- | [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) | dp3 | Yes | 3.0 | 3.00 | `deepseek-coder-6.7b/dp3/adapter/` |
- | [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) | dp8 | Yes | 8.0 | 8.00 | `deepseek-coder-6.7b/dp8/adapter/` |
  | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | base | No | — | — | `qwen3-4b-instruct/base/adapter/` |
  | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp3 | Yes | 3.0 | 2.99 | `qwen3-4b-instruct/dp3/adapter/` |
  | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp8 | Yes | 8.0 | 8.00 | `qwen3-4b-instruct/dp8/adapter/` |
@@ -93,7 +93,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
  | Model | GPUs | No-DP | DP ε=3 / ε=8 |
  |---|---|---|---|
  | Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
- | DeepSeek-Coder-6.7B | 8 | 256 (8×4×8) | 512 (8×8×8) |
  | Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |

  ### Differential Privacy
@@ -110,7 +110,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)

  ### Infrastructure

- - **GPUs:** NVIDIA H200 (140 GB VRAM each) — 4 GPUs for Granite, 8 GPUs for DeepSeek and Qwen
  - **CUDA:** 13.0
  - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
 
@@ -132,7 +132,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
  | Model | No-DP | DP ε=3 | DP ε=8 |
  |---|---|---|---|
  | Granite-4.0-H-Tiny | 0.946 | 1.044 | 1.038 |
- | DeepSeek-Coder-6.7B | 4.840 | 10.326 | 7.523 |
  | Qwen3-4B-Instruct | 0.808 | 0.941 | 0.925 |

  ### Privacy Audit
@@ -144,9 +144,9 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).
  | Granite-4.0-H-Tiny | base | 1.000 | 1.000 | 3.02 |
  | Granite-4.0-H-Tiny | dp3 | 0.543 | 0.513 | 0.00 |
  | Granite-4.0-H-Tiny | dp8 | 0.564 | 0.508 | 0.16 |
- | DeepSeek-Coder-6.7B | base | 0.957 | 0.968 | 3.02 |
- | DeepSeek-Coder-6.7B | dp3 | 0.522 | 0.543 | 0.00 |
- | DeepSeek-Coder-6.7B | dp8 | 0.533 | 0.545 | 0.00 |
  | Qwen3-4B-Instruct | base | 0.969 | 0.884 | 3.02 |
  | Qwen3-4B-Instruct | dp3 | 0.505 | 0.515 | 0.00 |
  | Qwen3-4B-Instruct | dp8 | 0.515 | 0.516 | 0.00 |
@@ -160,7 +160,7 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).
  │ ├── base/ # No-DP baseline
  │ ├── dp3/ # DP ε=3
  │ └── dp8/ # DP ε=8
- ├── deepseek-coder-6.7b/
  │ ├── base/
  │ ├── dp3/
  │ └── dp8/
 
  - melihcatal/codedp-cpt
  base_model:
  - ibm-granite/granite-4.0-h-tiny
+ - bigcode/starcoder2-7b
  - Qwen/Qwen3-4B-Instruct-2507
  library_name: peft
  pipeline_tag: text-generation
 
  | [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | base | No | — | — | `granite-4.0-h-tiny/base/adapter/` |
  | [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp3 | Yes | 3.0 | 2.99 | `granite-4.0-h-tiny/dp3/adapter/` |
  | [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp8 | Yes | 8.0 | 8.00 | `granite-4.0-h-tiny/dp8/adapter/` |
+ | [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | base | No | — | — | `starcoder2-7b/base/adapter/` |
+ | [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp3 | Yes | 3.0 | 3.00 | `starcoder2-7b/dp3/adapter/` |
+ | [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp8 | Yes | 8.0 | 8.00 | `starcoder2-7b/dp8/adapter/` |
  | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | base | No | — | — | `qwen3-4b-instruct/base/adapter/` |
  | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp3 | Yes | 3.0 | 2.99 | `qwen3-4b-instruct/dp3/adapter/` |
  | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp8 | Yes | 8.0 | 8.00 | `qwen3-4b-instruct/dp8/adapter/` |
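Each row of the updated table pairs a base model with an adapter subfolder, and the README's loading snippet (partly visible in the hunk headers) calls `PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)`. A minimal sketch of resolving one row and loading it follows. The model IDs and directory names are copied from the table; `adapter_location`, `load_adapter`, and the `adapter_repo` argument are hypothetical names, because the repository that hosts these folders is not named in this diff.

```python
# Base model IDs keyed by adapter directory name, copied from the adapter table.
BASE_MODELS = {
    "granite-4.0-h-tiny": "ibm-granite/granite-4.0-h-tiny",
    "starcoder2-7b": "bigcode/starcoder2-7b",
    "qwen3-4b-instruct": "Qwen/Qwen3-4B-Instruct-2507",
}
CONFIGS = ("base", "dp3", "dp8")


def adapter_location(model_dir: str, config: str) -> tuple[str, str]:
    """Return (base model ID, adapter subfolder) for one table row."""
    if model_dir not in BASE_MODELS:
        raise KeyError(f"unknown model directory: {model_dir!r}")
    if config not in CONFIGS:
        raise ValueError(f"config must be one of {CONFIGS}, got {config!r}")
    return BASE_MODELS[model_dir], f"{model_dir}/{config}/adapter"


def load_adapter(adapter_repo: str, model_dir: str, config: str):
    """Load the base model and attach one adapter (requires transformers and peft).

    `adapter_repo` is a hypothetical placeholder for the Hub repo that holds
    the adapter folders; it is not given in this diff.
    """
    # Imported lazily so the path logic above works without these packages.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base_id, subfolder = adapter_location(model_dir, config)
    model = AutoModelForCausalLM.from_pretrained(base_id)
    return PeftModel.from_pretrained(model, adapter_repo, subfolder=subfolder)
```

For example, `adapter_location("starcoder2-7b", "dp8")` returns `("bigcode/starcoder2-7b", "starcoder2-7b/dp8/adapter")`. Keeping the lookup separate from the loader lets the path logic be checked without downloading a 7B model.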
 
  | Model | GPUs | No-DP | DP ε=3 / ε=8 |
  |---|---|---|---|
  | Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
+ | StarCoder2-7B | 4 | 256 (8×8×4) | 512 (8×16×4) |
  | Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
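The effective batch sizes in this table factor into three numbers whose last factor equals the GPU count. The short sketch below assumes the first two factors are the per-device micro-batch and the number of gradient-accumulation steps (the table does not label them). It checks that every row multiplies out to 256 without DP and 512 with DP, and that the new StarCoder2-7B row reuses Granite's 4-GPU layout. `BatchLayout` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchLayout:
    per_device: int  # assumed: micro-batch size on each GPU
    grad_accum: int  # assumed: gradient-accumulation steps
    gpus: int        # matches the GPUs column

    @property
    def effective(self) -> int:
        # Samples contributing to one optimizer step across all GPUs.
        return self.per_device * self.grad_accum * self.gpus


# (no-DP layout, DP layout) per model, transcribed from the table.
LAYOUTS = {
    "Granite-4.0-H-Tiny": (BatchLayout(8, 8, 4), BatchLayout(8, 16, 4)),
    "StarCoder2-7B": (BatchLayout(8, 8, 4), BatchLayout(8, 16, 4)),
    "Qwen3-4B-Instruct": (BatchLayout(8, 4, 8), BatchLayout(8, 8, 8)),
}

for name, (no_dp, dp) in LAYOUTS.items():
    print(f"{name}: no-DP {no_dp.effective}, DP {dp.effective}")
```

Under this reading, the DP runs double the accumulation steps rather than the per-device batch, so per-GPU memory stays the same while the effective batch doubles.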
 
  ### Differential Privacy
 
 
  ### Infrastructure

+ - **GPUs:** NVIDIA H200 (140 GB VRAM each) — 4 GPUs for Granite and StarCoder2, 8 GPUs for Qwen
  - **CUDA:** 13.0
  - **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
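The infrastructure bullets describe single-node DDP over NCCL, with 4 GPUs for Granite and StarCoder2 and 8 for Qwen. A minimal sketch of how such a run is commonly launched with PyTorch's `torchrun` follows; it is not the authors' training script. The script name `train.py` and the helpers `torchrun_argv` and `init_ddp` are hypothetical, and the DP-SGD machinery is left out.

```python
import os

# GPU counts per model, from the Infrastructure bullet.
GPUS_PER_MODEL = {"granite-4.0-h-tiny": 4, "starcoder2-7b": 4, "qwen3-4b-instruct": 8}


def torchrun_argv(model_dir: str, script: str = "train.py") -> list[str]:
    """Build a single-node torchrun command; `train.py` is a hypothetical script."""
    return [
        "torchrun",
        "--standalone",  # single node with a local rendezvous
        f"--nproc_per_node={GPUS_PER_MODEL[model_dir]}",  # one process per GPU
        script,
    ]


def init_ddp() -> int:
    """Run inside each worker: join the NCCL process group and pin this rank's GPU."""
    import torch
    import torch.distributed as dist

    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT for env:// init.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # also set by torchrun per worker
    torch.cuda.set_device(local_rank)
    return local_rank
```

`torchrun_argv("starcoder2-7b")` yields `["torchrun", "--standalone", "--nproc_per_node=4", "train.py"]`, which can be passed to `subprocess.run`; each spawned worker would then call `init_ddp()` before wrapping its model in `DistributedDataParallel`.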
 
 
  | Model | No-DP | DP ε=3 | DP ε=8 |
  |---|---|---|---|
  | Granite-4.0-H-Tiny | 0.946 | 1.044 | 1.038 |
+ | StarCoder2-7B | 0.745 | 0.843 | 0.841 |
  | Qwen3-4B-Instruct | 0.808 | 0.941 | 0.925 |
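The metric in this table is named outside the hunk, but the cost of DP can still be read as a relative change against the no-DP column. The sketch below is plain arithmetic on the values shown; `relative_change` is a hypothetical helper.

```python
# (no-DP, DP ε=3, DP ε=8) per model, copied from the table.
RESULTS = {
    "Granite-4.0-H-Tiny": (0.946, 1.044, 1.038),
    "StarCoder2-7B": (0.745, 0.843, 0.841),
    "Qwen3-4B-Instruct": (0.808, 0.941, 0.925),
}


def relative_change(baseline: float, value: float) -> float:
    """Percent change of `value` relative to the no-DP `baseline`."""
    return 100.0 * (value - baseline) / baseline


for model, (no_dp, eps3, eps8) in RESULTS.items():
    print(
        f"{model}: ε=3 {relative_change(no_dp, eps3):+.1f}%, "
        f"ε=8 {relative_change(no_dp, eps8):+.1f}%"
    )
```

For StarCoder2-7B this gives about +13.2% at ε=3 and +12.9% at ε=8, between Granite (roughly +10%) and Qwen (roughly +15%).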
 
  ### Privacy Audit
 
  | Granite-4.0-H-Tiny | base | 1.000 | 1.000 | 3.02 |
  | Granite-4.0-H-Tiny | dp3 | 0.543 | 0.513 | 0.00 |
  | Granite-4.0-H-Tiny | dp8 | 0.564 | 0.508 | 0.16 |
+ | StarCoder2-7B | base | 1.000 | 0.916 | 3.02 |
+ | StarCoder2-7B | dp3 | 0.526 | 0.521 | 0.00 |
+ | StarCoder2-7B | dp8 | 0.520 | 0.523 | 0.00 |
  | Qwen3-4B-Instruct | base | 0.969 | 0.884 | 3.02 |
  | Qwen3-4B-Instruct | dp3 | 0.505 | 0.515 | 0.00 |
  | Qwen3-4B-Instruct | dp8 | 0.515 | 0.516 | 0.00 |
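The audit table's column headers also sit outside this hunk. If the first two numeric columns are ROC AUCs of a per-canary membership score (an assumption, though it fits the base adapters scoring near 1.0 and the DP adapters near 0.5), each can be computed from the 500 member and 500 non-member scores with the rank-based formula sketched below. `roc_auc` is a hypothetical helper and the scores are synthetic.

```python
import random


def roc_auc(member_scores, nonmember_scores):
    """AUC as P(member score > non-member score), counting ties as 1/2.

    Higher scores are taken to mean "more likely a member". This quadratic
    version is fine at 500 x 500 pairs.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Synthetic scores, 500 per group as in the audit description.
rng = random.Random(0)
memorized = roc_auc([rng.gauss(2.0, 1.0) for _ in range(500)],
                    [rng.gauss(0.0, 1.0) for _ in range(500)])
private = roc_auc([rng.gauss(0.0, 1.0) for _ in range(500)],
                  [rng.gauss(0.0, 1.0) for _ in range(500)])
print(f"separated scores: AUC {memorized:.3f}; identical distributions: AUC {private:.3f}")
```

An AUC of 0.5 means the score cannot separate members from non-members, which is how the DP rows (0.505 to 0.564) read under this assumption.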
 
  │ ├── base/ # No-DP baseline
  │ ├── dp3/ # DP ε=3
  │ └── dp8/ # DP ε=8
+ ├── starcoder2-7b/
  │ ├── base/
  │ ├── dp3/
  │ └── dp8/
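The tree places each configuration under a model directory, and the adapter table puts an `adapter/` folder beneath each one. A small sketch, assuming a local clone of the repository, lists which of the nine expected adapter folders are absent; after a rename like this one it confirms that `starcoder2-7b/` is fully populated. `missing_adapter_dirs` is a hypothetical helper.

```python
from pathlib import Path

MODEL_DIRS = ("granite-4.0-h-tiny", "starcoder2-7b", "qwen3-4b-instruct")
CONFIGS = ("base", "dp3", "dp8")


def missing_adapter_dirs(root) -> list[str]:
    """Return expected `<model>/<config>/adapter` paths that are not directories under `root`."""
    root = Path(root)
    return [
        f"{model}/{config}/adapter"
        for model in MODEL_DIRS
        for config in CONFIGS
        if not (root / model / config / "adapter").is_dir()
    ]
```

Run from the repository root, `missing_adapter_dirs(".")` returns an empty list when all nine folders are present.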