Add pipeline tag, license, link to code, and chain-of-thought tag
#1 opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,16 +1,22 @@
 ---
+license: mit
 library_name: transformers
-
+pipeline_tag: text-generation
+tags:
+- chain-of-thought
 ---
+
 **Repository for:**
 
 **ThinkEdit-deepseek-llama3-8b**
 
 (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-qwen-14b.)
 
-**Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng
+**Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng
 **Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
 
+Code: https://github.com/Trustworthy-ML-Lab/ThinkEdit
+
 ---
 
 ## Introduction
@@ -19,8 +25,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra
 
 **ThinkEdit** is a lightweight weight-editing method that:
 
-- Identifies
-- Edits only
+- Identifies ~2% of "short reasoning" attention heads
+- Edits only ~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces
 
@@ -75,12 +81,12 @@ The usage of ThinkEdit models is exactly the same as the original deepseek-disti
 
 ```bibtex
 @misc{sun2025thinkedit,
-title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
+title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
 author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
 year={2025},
 eprint={2503.22048},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
-url={https://arxiv.org/abs/2503.22048},
+url={https://arxiv.org/abs/2503.22048},
 }
 ```
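For readers of the README bullet "Removes the "short reasoning" direction from their output", the sketch below illustrates the general idea of projecting one direction out of a linear map's outputs. It is not the authors' implementation: the row-vector convention (`output = x @ weight`), the function names, and the toy numbers are all assumptions made for illustration, and plain Python lists are used only to keep the example self-contained.

```python
# Minimal sketch (NOT the ThinkEdit code): remove one direction from the
# outputs of a linear map by projecting it out of the weight matrix.
# All names and shapes here are hypothetical, chosen for illustration.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length."""
    norm = dot(v, v) ** 0.5
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def remove_direction(weight, direction):
    """Return weight @ (I - u u^T), with u = direction / ||direction||.

    `weight` is a list of rows under the convention output = x @ weight,
    so every row lives in the output space. Subtracting each row's
    component along u guarantees that no output has a component along u.
    """
    u = normalize(direction)
    edited = []
    for row in weight:
        coeff = dot(row, u)  # how much of this row points along u
        edited.append([r - coeff * c for r, c in zip(row, u)])
    return edited


def row_times_matrix(x, weight):
    """Compute x @ weight for a row vector x."""
    n_cols = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(n_cols)]


# Toy example: a 2 -> 3 linear map and a direction to suppress.
W = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
short_dir = [1.0, 1.0, 0.0]  # stand-in for a "short reasoning" direction
W_edited = remove_direction(W, short_dir)
out = row_times_matrix([0.7, -1.3], W_edited)
print(abs(dot(out, normalize(short_dir))) < 1e-9)
```

Because the projection is folded into the weights, the edited map needs no extra computation at inference time, which is consistent with the README's statement that usage is unchanged.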