cesun/ThinkEdit-deepseek-llama3-8b

@@ -1,16 +1,22 @@
 ---
 library_name: transformers
-tags: []
 ---
 **Repository for:**
 **ThinkEdit-deepseek-llama3-8b**
 (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-qwen-14b.)
-**Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng\
 **Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
 ---
 ## Introduction
@@ -19,8 +25,8 @@ Reasoning-augmented models sometimes fail by generating **overly short**, abstra
 **ThinkEdit** is a lightweight weight-editing method that:
-- Identifies \~2% of "short reasoning" attention heads
-- Edits only \~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces
@@ -75,12 +81,12 @@ The usage of ThinkEdit models is exactly the same as the original deepseek-disti
 ```bibtex
 @misc{sun2025thinkedit,
-      title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
       author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
       year={2025},
       eprint={2503.22048},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2503.22048},
 }
 ```

 ---
+license: mit
 library_name: transformers
+pipeline_tag: text-generation
+tags:
+- chain-of-thought
 ---
 **Repository for:**
 **ThinkEdit-deepseek-llama3-8b**
 (We also release ThinkEdit versions for ThinkEdit-deepseek-qwen-1.5b and ThinkEdit-deepseek-qwen-14b.)
+**Authors**: Chung-En Sun, Ge Yan, Tsui-Wei Weng
 **Paper**: [ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models](https://arxiv.org/abs/2503.22048)
+Code: https://github.com/Trustworthy-ML-Lab/ThinkEdit
 ---
 ## Introduction
 **ThinkEdit** is a lightweight weight-editing method that:
+- Identifies ~2% of "short reasoning" attention heads
+- Edits only ~0.1% of total parameters
 - Removes the "short reasoning" direction from their output
 - Boosts performance, especially on cases with short reasoning traces
 ```bibtex
 @misc{sun2025thinkedit,
+      title={ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models},
       author={Chung-En Sun and Ge Yan and Tsui-Wei Weng},
       year={2025},
       eprint={2503.22048},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2503.22048},
 }
 ```
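
The model card's bullet "Removes the "short reasoning" direction from their output" can be illustrated with a small linear-algebra sketch. This is not the authors' implementation (the linked repository is the reference for that). It only shows the generic operation of projecting one direction out of a head's output projection. The names `remove_direction`, `w_out`, and `direction`, the `(d_model, d_head)` weight layout, and the random toy data are all assumptions made for illustration. How the direction and the affected heads are identified is not covered here.

```python
import numpy as np


def remove_direction(w_out: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Return a copy of `w_out` whose outputs have no component along `direction`.

    w_out:     (d_model, d_head) output projection of a single attention head
               (hypothetical layout: output = w_out @ head_activation).
    direction: (d_model,) vector in the model's hidden space to project out.
    """
    v = direction / np.linalg.norm(direction)  # unit vector
    # (I - v v^T) @ w_out, written without building the d_model x d_model matrix
    return w_out - np.outer(v, v @ w_out)


# Toy data, for illustration only (not real model weights).
rng = np.random.default_rng(0)
d_model, d_head = 16, 4
w_out = rng.normal(size=(d_model, d_head))
direction = rng.normal(size=d_model)

w_edited = remove_direction(w_out, direction)

# Whatever the head computes, its edited output is orthogonal to `direction`.
head_activation = rng.normal(size=d_head)
residual_component = float((w_edited @ head_activation) @ direction)
```

Because the edit is a projection, applying it a second time leaves the weights unchanged, and only the weights of the selected heads are touched, which fits the card's claim that a small fraction of parameters is edited.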