Add pipeline tag, license and link to the code
#1
by nielsr HF Staff - opened
README.md CHANGED
@@ -1,7 +1,10 @@
 ---
 library_name: transformers
 tags: []
+pipeline_tag: text-generation
+license: other
 ---
+
 **Repository for:**
 
 **ThinkEdit-deepseek-qwen-14b**
@@ -9,7 +12,8 @@ tags: []
 (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-llama3-8b.)
 
 **Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng\
-**Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
+**Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)\
+**Code:** https://github.com/Trustworthy-ML-Lab/ThinkEdit
 
 ---
 
@@ -19,8 +23,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra
 
 **ThinkEdit** is a lightweight weight-editing method that:
 
-- Identifies
-- Edits only
+- Identifies ~2% of "short reasoning" attention heads
+- Edits only ~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces
 
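
For readers of this diff, the "removes the 'short reasoning' direction from their output" bullet can be illustrated with a small NumPy sketch. This is not taken from the ThinkEdit code and may differ from the authors' implementation: it assumes a head's output projection is a matrix `w_out` of shape `(head_dim, hidden_dim)` applied as `x @ w_out`, and that the direction is a single vector in the hidden space. The function name `remove_direction` and all shapes are hypothetical.

```python
import numpy as np


def remove_direction(w_out: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Return a copy of `w_out` whose outputs have no component along `direction`.

    Assumed layout (hypothetical, not from the ThinkEdit repo):
      w_out:     (head_dim, hidden_dim), applied as y = x @ w_out
      direction: (hidden_dim,), the direction to project out
    """
    v = direction / np.linalg.norm(direction)  # unit vector
    # y' = y - (y . v) v for every input x, i.e. W' = W - (W v) v^T
    return w_out - np.outer(w_out @ v, v)


# Toy example with random data (illustrative only).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
v = rng.normal(size=8)
edited = remove_direction(w, v)
```

After the edit, `edited @ v` is zero up to floating-point error, so nothing this matrix writes has a component along `v`, while components orthogonal to `v` are unchanged.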