Update README.md
README.md
CHANGED
|
@@ -1,6 +1,24 @@
|
|
| 1 |
---
|
| 2 |
library_name: transformers
|
| 3 |
-
tags:
|
| 4 |
---
|
| 5 |
|
| 6 |
# Model Card for Model ID
|
|
@@ -174,26 +192,30 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
|
|
| 174 |
|
| 175 |
**BibTeX:**
|
| 176 |
|
| 177 |
-
|
| 178 |
|
| 179 |
**APA:**
|
| 180 |
|
| 181 |
-
|
| 182 |
|
| 183 |
-
## Glossary [optional]
|
| 184 |
|
| 185 |
-
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
|
| 186 |
|
| 187 |
-
[More Information Needed]
|
| 188 |
-
|
| 189 |
-
## More Information [optional]
|
| 190 |
|
| 191 |
[More Information Needed]
|
| 192 |
-
|
| 193 |
-
## Model Card Authors [optional]
|
| 194 |
|
| 195 |
[More Information Needed]
|
| 196 |
|
| 197 |
## Model Card Contact
|
| 198 |
|
| 199 |
-
[More Information Needed]
|
| 1 |
---
|
| 2 |
library_name: transformers
|
| 3 |
+
tags:
|
| 4 |
+
- generation
|
| 5 |
+
- safety
|
| 6 |
+
- model-editing
|
| 7 |
+
- editing
|
| 8 |
+
- activation-steering
|
| 9 |
+
- activation-editing
|
| 10 |
+
- dpo
|
| 11 |
+
- rlhf
|
| 12 |
+
- profs
|
| 13 |
+
- detox
|
| 14 |
+
- toxicity
|
| 15 |
+
- iclr
|
| 16 |
+
- iclr2025
|
| 17 |
+
license: mit
|
| 18 |
+
language:
|
| 19 |
+
- en
|
| 20 |
+
base_model:
|
| 21 |
+
- openai-community/gpt2-medium
|
| 22 |
---
|
| 23 |
|
| 24 |
# Model Card for Model ID
|
|
|
|
| 192 |
|
| 193 |
**BibTeX:**
|
| 194 |
|
| 195 |
+
@inproceedings{uppaalmodel,
|
| 196 |
+
title={Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity},
|
| 197 |
+
author={Uppaal, Rheeya and Dey, Apratim and He, Yiting and Zhong, Yiqiao and Hu, Junjie},
|
| 198 |
+
booktitle={The Thirteenth International Conference on Learning Representations}
|
| 199 |
+
}
|
| 200 |
|
| 201 |
**APA:**
|
| 202 |
|
| 203 |
+
Uppaal, R., Dey, A., He, Y., Zhong, Y., & Hu, J. Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity. In The Thirteenth International Conference on Learning Representations.
|
| 204 |
|
| 205 |
+
<!-- ## Glossary [optional]
|
| 206 |
|
| 207 |
+
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> -->
|
| 208 |
|
| 209 |
+
<!-- [More Information Needed]
|
| 210 |
+
-->
|
| 211 |
+
<!-- ## More Information [optional]
|
| 212 |
|
| 213 |
[More Information Needed]
|
| 214 |
+
-->
|
| 215 |
+
<!-- ## Model Card Authors [optional]
|
| 216 |
|
| 217 |
[More Information Needed]
|
| 218 |
|
| 219 |
## Model Card Contact
|
| 220 |
|
| 221 |
+
[More Information Needed] -->
|