Update README: replace DeepSeek with StarCoder2-7B
README.md
CHANGED
|
@@ -14,7 +14,7 @@ datasets:
|
|
| 14 |
- melihcatal/codedp-cpt
|
| 15 |
base_model:
|
| 16 |
- ibm-granite/granite-4.0-h-tiny
|
| 17 |
-
-
|
| 18 |
- Qwen/Qwen3-4B-Instruct-2507
|
| 19 |
library_name: peft
|
| 20 |
pipeline_tag: text-generation
|
|
@@ -33,9 +33,9 @@ Nine adapter checkpoints are provided - three base models × three privacy con
|
|
| 33 |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | base | No | - | - | `granite-4.0-h-tiny/base/adapter/` |
|
| 34 |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp3 | Yes | 3.0 | 2.99 | `granite-4.0-h-tiny/dp3/adapter/` |
|
| 35 |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp8 | Yes | 8.0 | 8.00 | `granite-4.0-h-tiny/dp8/adapter/` |
|
| 36 |
-
| [
|
| 37 |
-
| [
|
| 38 |
-
| [
|
| 39 |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | base | No | - | - | `qwen3-4b-instruct/base/adapter/` |
|
| 40 |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp3 | Yes | 3.0 | 2.99 | `qwen3-4b-instruct/dp3/adapter/` |
|
| 41 |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp8 | Yes | 8.0 | 8.00 | `qwen3-4b-instruct/dp8/adapter/` |
|
|
@@ -93,7 +93,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
|
|
| 93 |
| Model | GPUs | No-DP | DP ε=3 / ε=8 |
|
| 94 |
|---|---|---|---|
|
| 95 |
| Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
|
| 96 |
-
|
|
| 97 |
| Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
|
| 98 |
|
| 99 |
### Differential Privacy
|
|
@@ -110,7 +110,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
|
|
| 110 |
|
| 111 |
### Infrastructure
|
| 112 |
|
| 113 |
-
- **GPUs:** NVIDIA H200 (140 GB VRAM each); 4 GPUs for Granite, 8 GPUs for
|
| 114 |
- **CUDA:** 13.0
|
| 115 |
- **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
|
| 116 |
|
|
@@ -132,7 +132,7 @@ model = PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)
|
|
| 132 |
| Model | No-DP | DP ε=3 | DP ε=8 |
|
| 133 |
|---|---|---|---|
|
| 134 |
| Granite-4.0-H-Tiny | 0.946 | 1.044 | 1.038 |
|
| 135 |
-
|
|
| 136 |
| Qwen3-4B-Instruct | 0.808 | 0.941 | 0.925 |
|
| 137 |
|
| 138 |
### Privacy Audit
|
|
@@ -144,9 +144,9 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).
|
|
| 144 |
| Granite-4.0-H-Tiny | base | 1.000 | 1.000 | 3.02 |
|
| 145 |
| Granite-4.0-H-Tiny | dp3 | 0.543 | 0.513 | 0.00 |
|
| 146 |
| Granite-4.0-H-Tiny | dp8 | 0.564 | 0.508 | 0.16 |
|
| 147 |
-
|
|
| 148 |
-
|
|
| 149 |
-
|
|
| 150 |
| Qwen3-4B-Instruct | base | 0.969 | 0.884 | 3.02 |
|
| 151 |
| Qwen3-4B-Instruct | dp3 | 0.505 | 0.515 | 0.00 |
|
| 152 |
| Qwen3-4B-Instruct | dp8 | 0.515 | 0.516 | 0.00 |
|
|
@@ -160,7 +160,7 @@ New-token canary audit (500 members, 500 non-members, 49-token random prefixes).
|
|
| 160 |
│   ├── base/   # No-DP baseline
|
| 161 |
│   ├── dp3/    # DP ε=3
|
| 162 |
│   └── dp8/    # DP ε=8
|
| 163 |
-
├──
|
| 164 |
│   ├── base/
|
| 165 |
│   ├── dp3/
|
| 166 |
│   └── dp8/
|
|
|
|
| 14 |
- melihcatal/codedp-cpt
|
| 15 |
base_model:
|
| 16 |
- ibm-granite/granite-4.0-h-tiny
|
| 17 |
+
- bigcode/starcoder2-7b
|
| 18 |
- Qwen/Qwen3-4B-Instruct-2507
|
| 19 |
library_name: peft
|
| 20 |
pipeline_tag: text-generation
|
|
|
|
| 33 |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | base | No | - | - | `granite-4.0-h-tiny/base/adapter/` |
|
| 34 |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp3 | Yes | 3.0 | 2.99 | `granite-4.0-h-tiny/dp3/adapter/` |
|
| 35 |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | dp8 | Yes | 8.0 | 8.00 | `granite-4.0-h-tiny/dp8/adapter/` |
|
| 36 |
+
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | base | No | - | - | `starcoder2-7b/base/adapter/` |
|
| 37 |
+
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp3 | Yes | 3.0 | 3.00 | `starcoder2-7b/dp3/adapter/` |
|
| 38 |
+
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | dp8 | Yes | 8.0 | 8.00 | `starcoder2-7b/dp8/adapter/` |
|
| 39 |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | base | No | - | - | `qwen3-4b-instruct/base/adapter/` |
|
| 40 |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp3 | Yes | 3.0 | 2.99 | `qwen3-4b-instruct/dp3/adapter/` |
|
| 41 |
| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | dp8 | Yes | 8.0 | 8.00 | `qwen3-4b-instruct/dp8/adapter/` |
|
|
|
|
| 93 |
| Model | GPUs | No-DP | DP ε=3 / ε=8 |
|
| 94 |
|---|---|---|---|
|
| 95 |
| Granite-4.0-H-Tiny | 4 | 256 (8×8×4) | 512 (8×16×4) |
|
| 96 |
+
| StarCoder2-7B | 4 | 256 (8×8×4) | 512 (8×16×4) |
|
| 97 |
| Qwen3-4B-Instruct | 8 | 256 (8×4×8) | 512 (8×8×8) |
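
Reviewer note: the parenthesised factors in the batch-size table can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the README. It assumes the three factors are per-device batch size, gradient-accumulation steps, and GPU count, in that order. Only the last factor is confirmed by the GPUs column; the meaning of the first two is an assumption, and the `BatchConfig` name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """Hypothetical container for one table cell, e.g. 256 (8x8x4)."""
    per_device: int  # assumed: micro-batch size on each GPU
    grad_accum: int  # assumed: gradient-accumulation steps
    gpus: int        # matches the "GPUs" column of the table

    @property
    def effective(self) -> int:
        # Effective batch size is the product of the three factors.
        return self.per_device * self.grad_accum * self.gpus


# Values copied from the table rows in this diff.
CONFIGS = {
    ("Granite-4.0-H-Tiny", "no-dp"): BatchConfig(8, 8, 4),
    ("Granite-4.0-H-Tiny", "dp"): BatchConfig(8, 16, 4),
    ("StarCoder2-7B", "no-dp"): BatchConfig(8, 8, 4),
    ("StarCoder2-7B", "dp"): BatchConfig(8, 16, 4),
    ("Qwen3-4B-Instruct", "no-dp"): BatchConfig(8, 4, 8),
    ("Qwen3-4B-Instruct", "dp"): BatchConfig(8, 8, 8),
}

for (model, mode), cfg in CONFIGS.items():
    print(f"{model:20s} {mode:6s} {cfg.effective}")
```

Under that reading, the new StarCoder2-7B row is consistent with the other two models: every No-DP run works out to 256 and every DP run to 512.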
|
| 98 |
|
| 99 |
### Differential Privacy
|
|
|
|
| 110 |
|
| 111 |
### Infrastructure
|
| 112 |
|
| 113 |
+
- **GPUs:** NVIDIA H200 (140 GB VRAM each); 4 GPUs for Granite and StarCoder2, 8 GPUs for Qwen
|
| 114 |
- **CUDA:** 13.0
|
| 115 |
- **Distributed strategy:** DDP (Distributed Data Parallel) with NCCL backend
|
| 116 |
|
|
|
|
| 132 |
| Model | No-DP | DP ε=3 | DP ε=8 |
|
| 133 |
|---|---|---|---|
|
| 134 |
| Granite-4.0-H-Tiny | 0.946 | 1.044 | 1.038 |
|
| 135 |
+
| StarCoder2-7B | 0.745 | 0.843 | 0.841 |
|
| 136 |
| Qwen3-4B-Instruct | 0.808 | 0.941 | 0.925 |
|
| 137 |
|
| 138 |
### Privacy Audit
|
|
|
|
| 144 |
| Granite-4.0-H-Tiny | base | 1.000 | 1.000 | 3.02 |
|
| 145 |
| Granite-4.0-H-Tiny | dp3 | 0.543 | 0.513 | 0.00 |
|
| 146 |
| Granite-4.0-H-Tiny | dp8 | 0.564 | 0.508 | 0.16 |
|
| 147 |
+
| StarCoder2-7B | base | 1.000 | 0.916 | 3.02 |
|
| 148 |
+
| StarCoder2-7B | dp3 | 0.526 | 0.521 | 0.00 |
|
| 149 |
+
| StarCoder2-7B | dp8 | 0.520 | 0.523 | 0.00 |
|
| 150 |
| Qwen3-4B-Instruct | base | 0.969 | 0.884 | 3.02 |
|
| 151 |
| Qwen3-4B-Instruct | dp3 | 0.505 | 0.515 | 0.00 |
|
| 152 |
| Qwen3-4B-Instruct | dp8 | 0.515 | 0.516 | 0.00 |
|
|
|
|
| 160 |
│   ├── base/   # No-DP baseline
|
| 161 |
│   ├── dp3/    # DP ε=3
|
| 162 |
│   └── dp8/    # DP ε=8
|
| 163 |
+
├── starcoder2-7b/
|
| 164 |
│   ├── base/
|
| 165 |
│   ├── dp3/
|
| 166 |
│   └── dp8/
|
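
Reviewer note: with this change the repository layout gains a `starcoder2-7b/` directory alongside the existing ones. The sketch below shows how the new adapters might be addressed, following the `PeftModel.from_pretrained(model, adapter_path, subfolder=subfolder)` call visible in the hunk headers. The helper names `adapter_subfolder` and `load_adapter` are hypothetical. The `adapter_repo` argument is a placeholder, since the adapter repository id does not appear in this diff. Loading downloads the base model and needs `transformers` and `peft` installed, so those imports are kept inside the loader.

```python
# Directory name in the adapter repo -> base model id, taken from the README table.
BASE_MODELS = {
    "granite-4.0-h-tiny": "ibm-granite/granite-4.0-h-tiny",
    "starcoder2-7b": "bigcode/starcoder2-7b",
    "qwen3-4b-instruct": "Qwen/Qwen3-4B-Instruct-2507",
}
VARIANTS = ("base", "dp3", "dp8")  # No-DP baseline, DP eps=3, DP eps=8


def adapter_subfolder(model_dir: str, variant: str) -> str:
    """Build the subfolder path listed in the checkpoint table."""
    if model_dir not in BASE_MODELS:
        raise KeyError(f"unknown model directory: {model_dir}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    return f"{model_dir}/{variant}/adapter"


def load_adapter(adapter_repo: str, model_dir: str, variant: str):
    """Load a base model and attach one adapter (hypothetical helper).

    `adapter_repo` is the Hub id of the adapter repository; it is an
    assumption here because the diff does not state it.
    """
    # Imported lazily so the path helper works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(BASE_MODELS[model_dir])
    return PeftModel.from_pretrained(
        base, adapter_repo, subfolder=adapter_subfolder(model_dir, variant)
    )


if __name__ == "__main__":
    # Print the nine subfolders without downloading anything.
    for model_dir in BASE_MODELS:
        for variant in VARIANTS:
            print(adapter_subfolder(model_dir, variant))
```

Keeping the path logic separate from the loader makes it easy to check that all nine checkpoint paths from the table resolve before any model weights are fetched.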