
Improve model card with library and pipeline information

#1 by nielsr (HF Staff) - opened

Files changed (1)
  1. README.md +8 -7
README.md CHANGED
````diff
@@ -1,9 +1,11 @@
 ---
-license: mit
-datasets:
-- agentica-org/DeepScaleR-Preview-Dataset
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+datasets:
+- agentica-org/DeepScaleR-Preview-Dataset
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
 tags:
 - LRM
 - hybrid_reasoning
@@ -13,7 +15,7 @@ tags:
 # AdaptThink: LLM Can Learn When to Think
 
 <p align="center">
-🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="" target="_blank">Github Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
+🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="https://github.com/THU-KEG/AdaptThink" target="_blank">Github Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
 </p>
 
 ## 🔍 Table of Contents
@@ -34,7 +36,7 @@ We present **AdaptThink**, a novel reinforcement learning (RL) algorithm that ena
 ## ⚙️ Released Models
 
 ### All Available Datasets and Models
-We apply the AdaptThink algorithm on DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which further reduces inference costs but also diminish the resultant improvement in accuracy.
+We apply the AdaptThink algorithm on DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which further reduces inference costs but also diminishes the resultant improvement in accuracy.
 
 All the trained models are available on HuggingFace.
 
@@ -83,5 +85,4 @@ If you find our work useful, please consider citing AdaptThink:
 url={https://arxiv.org/abs/2505.13417}
 year={2025}
 }
-```
-
+```
````
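The point of the new `library_name` and `pipeline_tag` keys is that the Hub reads them from the card's YAML front matter to pick the loading library and the widget/task type. A minimal sketch of what the front matter looks like after this PR, with a toy stdlib-only check that the two keys are present (the Hub itself parses the card with a real YAML parser; this string-based extractor is purely illustrative):

```python
# The YAML front matter this PR produces, plus a minimal check that the
# keys Hugging Face reads (library_name, pipeline_tag) are present.
# Plain-string parsing here is for illustration only; real model cards
# are parsed with a YAML library.

FRONT_MATTER = """\
---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- LRM
- hybrid_reasoning
---
"""

def top_level_fields(card: str) -> dict:
    """Extract scalar `key: value` pairs from the front matter block."""
    inside = card.split("---")[1]  # text between the two --- fences
    fields = {}
    for line in inside.splitlines():
        if line and not line.startswith(("-", " ")) and ":" in line:
            key, _, value = line.partition(":")
            if value.strip():  # skip list headers like "datasets:"
                fields[key.strip()] = value.strip()
    return fields

meta = top_level_fields(FRONT_MATTER)
print(meta["library_name"])   # transformers
print(meta["pipeline_tag"])   # text-generation
```

With these keys set, the Hub can show a transformers-based code snippet and a text-generation widget for the model, instead of leaving both blank.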