Safetensors
qwen2
LRM
hybrid_reasoning
efficient_reasoning

Add Transformers library name, text-generation pipeline tag and Github link

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +82 -17
README.md CHANGED
@@ -1,24 +1,27 @@
1
  ---
2
- license: mit
3
- datasets:
4
- - agentica-org/DeepScaleR-Preview-Dataset
5
  base_model:
6
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 
 
 
7
  tags:
8
  - LRM
9
  - hybrid_reasoning
10
  - efficient_reasoning
 
 
11
  ---
12
 
13
  # AdaptThink: LLM Can Learn When to Think
14
 
15
  <p align="center">
16
- 🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="" target="_blank">Github Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
17
  </p>
18
 
19
  ## 🔍 Table of Contents
20
  - [🤖️ AdaptThink](#adapt_think)
21
  - [⚙️ Released Models](#model)
 
22
  - [📊 Evaluation](#evaluation)
23
  - [📝 Citation](#citation)
24
 
@@ -34,7 +37,7 @@ We present **AdapThink**, a novel reinforcement learning (RL) algorithm that ena
34
  ## ⚙️ Released Models
35
 
36
  ### All Available Datasets and Models
37
- We apply the AdaptThink algorithm on DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\large$ results in a higher proportion of NoThinking responses, which reduces more inference costs but also diminish the resultant improvement in accuracy.
38
 
39
  All the trained models are available on HuggingFace.
40
 
@@ -50,25 +53,88 @@ All the trained models are available on HuggingFace.
50
  | AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |
51
 
52
  <a name="training"></a>
 
53
 
54
- ## 📊 Evaluation Results
55
 
56
- We list our evaluation results as follows:
57
- ##### 1. Comparison with existing methods for efficient reasoning on mathematics datasets
 
 
 
 
 
58
 
59
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/ZLV8ZfEet1dp-4jyzBxiG.png)
 
 
60
 
61
- ##### 2. Nothinking responses ratio and accuracy across different difficulty levels on MATH500
 
 
 
 
 
 
 
62
 
63
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/GUNfW9qO2aaT9_lo1XXPf.png)
 
 
64
 
65
- ##### 3. Comparison of different $\delta$ values
 
 
 
66
 
67
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/RXrXwxVSAYlR3-_t0GUwV.png)
 
68
 
69
- ##### 4. Evaluation results on MMLU
 
 
 
 
 
 
 
 
 
 
70
 
71
- <img width="1000" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/19K2u6PNmYz3gx3JnHgn4.png">
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
72
 
73
  <a name="citation"></a>
74
  ## 📝 Citation
@@ -83,5 +149,4 @@ If you find our work useful, please consider citing LongReward:
83
  url={https://arxiv.org/abs/2505.13417}
84
  year={2025}
85
  }
86
- ```
87
-
 
1
  ---
 
 
 
2
  base_model:
3
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
4
+ datasets:
5
+ - agentica-org/DeepScaleR-Preview-Dataset
6
+ license: mit
7
  tags:
8
  - LRM
9
  - hybrid_reasoning
10
  - efficient_reasoning
11
+ pipeline_tag: text-generation
12
+ library_name: transformers
13
  ---
14
 
15
  # AdaptThink: LLM Can Learn When to Think
16
 
17
  <p align="center">
18
+ 🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="https://github.com/THU-KEG/AdaptThink" target="_blank">Github Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
19
  </p>
20
 
21
  ## 🔍 Table of Contents
22
  - [🤖️ AdaptThink](#adapt_think)
23
  - [⚙️ Released Models](#model)
24
+ - [🔥 Training](#training)
25
  - [📊 Evaluation](#evaluation)
26
  - [📝 Citation](#citation)
27
 
 
37
  ## ⚙️ Released Models
38
 
39
  ### All Available Datasets and Models
40
+ We apply the AdaptThink algorithm on DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\large$ results in a higher proportion of NoThinking responses, which reduces more inference costs but also diminishes the resultant improvement in accuracy.
41
 
42
  All the trained models are available on HuggingFace.
43
 
 
53
  | AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |
54
 
55
  <a name="training"></a>
56
+ ## 🔥 Training
57
 
58
+ Our training code is based on [VeRL](https://github.com/volcengine/verl) framework.
59
 
60
+ ### 1. Creating Environment
61
+ We use [vLLM](https://github.com/vllm-project/vllm) 0.8.2, which supports [flash-attention](https://github.com/Dao-AILab/flash-attention).
62
+ ```
63
+ conda create -n adapt_think python=3.10
64
+ pip install -r requirements.txt
65
+ pip install flash-attn --no-build-isolation
66
+ ```
67
 
68
+ ### 2. Check the chat template in HF models
69
+ After you download DeepSeek models, you should check `chat_template` in `tokenizer_config.json` to ensure the template ends with `<|Assistant|><think>\
70
+ `, otherwise there will be bugs when running our code.
71
 
72
+ ### 3. Pre-sampling from reference models
73
+ First, we need to pre-sample multiple responses from the reference model for each training problem to evaluate its instance-level accuracy. The sampling process will take several hours. For convenience, we have released our post-processed results in `./data/train/ref_results`, which can be directly used for training.
74
+ ```
75
+ # Initialize VLLM server. Set tensor_parallel_size to 8 for 7B model
76
+ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --served_model_name DeepSeek-R1-Distill-Qwen-1.5B --tensor_parallel_size 4
77
+
78
+ # Sampling 16 responses for each training problem.
79
+ python src/presampling_ref_responses.py --K 16 --dataset_path ./data/train/deepscaler.json --model_name DeepSeek-R1-Distill-Qwen-1.5B --max_tokens 16384
80
 
81
+ # Postprocess to get instance-level accuracy
82
+ python src/postprocess_ref_results.py --input_path ./data/train/ref_presampling/DeepSeek-R1-Distill-Qwen-1.5B_deepscaler_n0_K16_len16384.json --output_path ./data/train/ref_results/DeepSeek-R1-Distill-Qwen-1.5B_deepscaler_K16_len16384.json
83
+ ```
84
 
85
+ ### 4. Preprocess training and test Datasets
86
+ ```
87
+ bash scripts/preprocess_dataset.sh
88
+ ```
89
 
90
+ ### 5. Training
91
+ The training context size, batch size, and the learning rate are set to 16K, 128, and 2e-6, respectively. We train the models for 1 epoch, which is 314 steps in total. For the 1.5B model, we use one 8\*H800 node and cost about 32 hours. For the 7B model, we use four 8\*H800 nodes and cost about 28 hours. Finally, we select the checkpoints on 300 and 150 steps for the 1.5B and 7B models, respectively, where the models' accuracy and response lengths achieve a good balance.
92
 
93
+ To facilitate the training process, you can set a larger learning rate, such as 5e-5. However, it may make the training more unstable.
94
+ ```
95
+ # 1.5b, single-node
96
+ bash scripts/run_adapt_think_1.5b_deepscaler_16k_delta0.05_btz128_lr2e-6.sh
97
+
98
+ # 7b, single-node
99
+ bash scripts/run_adapt_think_7b_deepscaler_16k_delta0.05_btz128_lr2e-6.sh
100
+
101
+ # 7b, multi-node
102
+ bash submit_mpi.sh scripts/run_adapt_think_7b_deepscaler_16k_delta0.05_btz128_lr2e-6_multinode.sh
103
+ ```
104
 
105
+
106
+ <a name="evaluation"></a>
107
+ ## 📊 Evaluation
108
+
109
+ During training, VeRL will automatically evaluate on you selected test sets for every `trainer.test_freq` step.
110
+
111
+ We also provide additional scripts for evaluation.
112
+
113
+ ```
114
+ # convert checkpoint to HF model
115
+ bash scripts/convert_to_hf.sh
116
+
117
+ # eval
118
+ bash scripts/run_eval_verl_hf.sh
119
+ ```
120
+
121
+ You can also evaluate downloaded HF models by running:
122
+ ```
123
+ bash scripts/run_eval_hf.sh
124
+ ```
125
+
126
+ We list our evaluation results as follows:
127
+ #### 1. Comparison with existing methods for efficient reasoning on mathematics datasets
128
+ <img width="1447" alt="image" src="https://github.com/user-attachments/assets/53592ec3-17d9-4c4b-99ee-1868b5c82238" />
129
+
130
+ #### 2. Nothinking responses ratio and accuracy across different difficulty levels on MATH500
131
+ <img width="1462" alt="image" src="https://github.com/user-attachments/assets/cc2de266-b67a-47ab-835d-9bce922b13fc" />
132
+
133
+ #### 3. Comparison of different $\delta$ values
134
+ <img width="1444" alt="image" src="https://github.com/user-attachments/assets/41c86f73-68f8-4d71-ac75-2033c43b964b" />
135
+
136
+ #### 4. Evaluation results on MMLU
137
+ <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/19K2u6PNmYz3gx3JnHgn4.png" />
138
 
139
  <a name="citation"></a>
140
  ## 📝 Citation
 
149
  url={https://arxiv.org/abs/2505.13417}
150
  year={2025}
151
  }
152
+ ```