Uppaal committed (verified)
Commit: b3e531d
Parent(s): 1ae6dde

Update README.md

Files changed (1): README.md (+2 −14)
```diff
@@ -92,7 +92,7 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 
 ## Training (Editing) Details
 
-### Training Data
+### Data
 We use the pairwise toxicity preference dataset introduced by [Lee et al. (2024)](https://arxiv.org/abs/2401.01967).
 
 - Non-toxic sequences: sampled from WikiText-2.
@@ -117,7 +117,7 @@ No preprocessing or filtering was applied beyond tokenization by the base model
 - Centering: mean vector of non-toxic embeddings removed before SVD to preserve syntactic knowledge.
 
 
-### Speeds, Sizes, Times [optional]
+### Speeds, Sizes, Times
 
 - Time: 15.17 seconds
 - Max GPU use: 9399.65 MB
@@ -132,8 +132,6 @@ No preprocessing or filtering was applied beyond tokenization by the base model
 - Capability (for larger models): zero-shot accuracy across 7 EleutherAI LM Harness tasks: BoolQ, RTE, HellaSwag, WinoGrande, ARC-Easy, ARC-Challenge, and OpenBookQA.
 
 ### Results
-
-
 | **Model** | **Method** | **Toxicity ↓** | **Perplexity ↓** | **Capability ↑** |
 |:-----------|:------------|:---------------|:-----------------|:-----------------|
 | **GPT-2 Medium** | Original | 48.00 (0.00) | 29.70 (0.00) | – |
@@ -151,16 +149,6 @@ No preprocessing or filtering was applied beyond tokenization by the base model
 | **GPT-J 6B** | Original | 45.31 (0.00) | 13.24 (0.00) | 51.92 |
 | | DPO | 43.67 (1.11) | 13.96 (0.53) | 52.46 |
 | | **ProFS** | **37.36 (2.28)** | 14.53 (0.30) | 52.48 |
-
-*Mean ± stdev over three runs; lower toxicity/perplexity are better.*
-
-
-
-
-
-
-
-
 ## Citation
 
 **BibTeX:**
```
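The "centering before SVD" step mentioned under "Training (Editing) Details" can be sketched in NumPy. This is a minimal, hypothetical illustration, not the actual ProFS implementation: all names, shapes, and the rank `k` are assumptions, and random vectors stand in for real sentence embeddings.

```python
import numpy as np

# Hypothetical sketch of the centered-SVD edit described in the README.
# Random vectors stand in for toxic / non-toxic sequence embeddings.
rng = np.random.default_rng(0)
d, n = 64, 100                      # embedding dim, number of pairs (illustrative)
toxic = rng.normal(size=(n, d))     # embeddings of toxic sequences
nontoxic = rng.normal(size=(n, d))  # embeddings of non-toxic sequences

# Centering: remove the mean of the NON-toxic embeddings before SVD,
# so directions shared with ordinary text are preserved.
mu = nontoxic.mean(axis=0)
centered = toxic - mu

# Top right-singular vectors span a candidate "toxic" subspace.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 2                               # number of toxic directions kept (assumption)
toxic_subspace = vt[:k]             # (k, d), rows are orthonormal

# Projector that removes the toxic subspace from any embedding row.
proj = np.eye(d) - toxic_subspace.T @ toxic_subspace
edited = toxic @ proj               # components along toxic directions are zeroed
```

Because the rows of `toxic_subspace` are orthonormal, `proj` is an exact orthogonal projector, so `edited` has numerically zero component along the removed directions.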