Update README.md
README.md CHANGED
@@ -92,7 +92,7 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 
 ## Training (Editing) Details
 
-###
+### Data
 We use the pairwise toxicity preference dataset introduced by [Lee et al. (2024)](https://arxiv.org/abs/2401.01967).
 
 - Non-toxic sequences: sampled from WikiText-2.
@@ -117,7 +117,7 @@ No preprocessing or filtering was applied beyond tokenization by the base model
|
|
| 117 |
- Centering: mean vector of non-toxic embeddings removed before SVD to preserve syntactic knowledge.
|
| 118 |
|
| 119 |
|
| 120 |
-
### Speeds, Sizes, Times
|
| 121 |
|
| 122 |
- Time: 15.17 seconds
|
| 123 |
- Max GPU use: 9399.65 MB
|
|
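The centering-and-SVD step described in the hunk above can be sketched in a few lines. This is a minimal illustration, not the repository's code: random matrices stand in for the sentence embeddings, and the subspace rank `k` is a hypothetical parameter not specified in this section.

```python
import numpy as np

# Toy stand-ins for sentence embeddings of toxic / non-toxic sequences
# (rows: sequences, columns: hidden dimensions). Random for illustration.
rng = np.random.default_rng(0)
toxic = rng.normal(size=(64, 16))
nontoxic = rng.normal(size=(64, 16))

# Centering: subtract the mean non-toxic embedding before the SVD, so the
# top singular directions capture toxicity rather than shared syntax.
mu = nontoxic.mean(axis=0)
centered = toxic - mu

# The top-k right singular vectors span the candidate toxicity subspace.
k = 2  # hypothetical rank, not specified in this section
_, _, vt = np.linalg.svd(centered, full_matrices=False)
toxic_dirs = vt[:k]  # (k, hidden_dim), rows orthonormal

# Projection that removes the subspace from any hidden vector: P = I - V^T V
proj = np.eye(16) - toxic_dirs.T @ toxic_dirs
```

Applying `proj` to a hidden state leaves it unchanged except that its component along the estimated toxicity directions is zeroed out.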
@@ -132,8 +132,6 @@ No preprocessing or filtering was applied beyond tokenization by the base model
|
|
| 132 |
- Capability (for larger models): zero-shot accuracy across 7 EleutherAI LM Harness tasks: BoolQ, RTE, HellaSwag, WinoGrande, ARC-Easy, ARC-Challenge, and OpenBookQA.
|
| 133 |
|
| 134 |
### Results
|
| 135 |
-
|
| 136 |
-
|
| 137 |
| **Model** | **Method** | **Toxicity ↓** | **Perplexity ↓** | **Capability ↑** |
|
| 138 |
|:-----------|:------------|:---------------|:-----------------|:-----------------|
|
| 139 |
| **GPT-2 Medium** | Original | 48.00 (0.00) | 29.70 (0.00) | – |
|
|
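The edit itself, whose cost the timing figures above report, amounts to multiplying selected weight matrices by a projection that removes the toxicity subspace recovered by the SVD described in this README. A hedged sketch with random stand-ins (here `v` plays the role of the recovered toxicity directions, and `w` a base-model MLP output weight matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden = 16

# Hypothetical orthonormal toxicity directions (k = 2), standing in for the
# output of the SVD step; obtained here from a QR factorization of noise.
v = np.linalg.qr(rng.normal(size=(hidden, 2)))[0].T  # (2, hidden)
proj = np.eye(hidden) - v.T @ v

# Stand-in for an MLP output weight matrix of the base model.
w = rng.normal(size=(hidden, hidden))
w_edited = proj @ w  # edited layer can no longer write into the subspace

# Any activation produced by the edited weights is orthogonal to v.
x = rng.normal(size=(hidden,))
out = w_edited @ x
```

Because the edit is a single matrix multiply per edited layer rather than gradient training, its runtime stays in the range of seconds, consistent with the timing reported above.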
@@ -151,16 +149,6 @@ No preprocessing or filtering was applied beyond tokenization by the base model
|
|
| 151 |
| **GPT-J 6B** | Original | 45.31 (0.00) | 13.24 (0.00) | 51.92 |
|
| 152 |
| | DPO | 43.67 (1.11) | 13.96 (0.53) | 52.46 |
|
| 153 |
| | **ProFS** | **37.36 (2.28)** | 14.53 (0.30) | 52.48 |
|
| 154 |
-
|
| 155 |
-
*Mean ± stdev over three runs; lower toxicity/perplexity are better.*
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
## Citation
|
| 165 |
|
| 166 |
**BibTeX:**
|