	---
	license: mit
	datasets:
	- agentica-org/DeepScaleR-Preview-Dataset
	base_model:
	- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
	tags:
	- LRM
	- hybrid_reasoning
	- efficient_reasoning
	---

	# AdaptThink: LLM Can Learn When to Think

	<p align="center">
	🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="" target="_blank">Github Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
	</p>

	## 🔍 Table of Contents
	- [🤖️ AdaptThink](#adapt_think)
	- [⚙️ Released Models](#model)
	- [📊 Evaluation](#evaluation)
	- [📝 Citation](#citation)

	<a name="adapt_think"></a>
	## 🤖️ AdaptThink
	We present AdaptThink, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between Thinking and NoThinking modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when it judges the problem to be challenging; for simpler questions, it bypasses the thinking process and directly produces a concise final solution. This approach substantially reduces inference costs while further improving overall performance.

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/JaeJiBwLkcwAuexRAkLX5.png)



	<a name="model"></a>
	## ⚙️ Released Models

	### All Available Datasets and Models
	We apply the AdaptThink algorithm to DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ ranging from 0 to 0.1, and to DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which further reduces inference costs but also diminishes the resulting improvement in accuracy.

	All the trained models are available on HuggingFace.


	| Name | HF Repo |
	|---|---|
	| AdaptThink-1.5B-delta0 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0) |
	| AdaptThink-1.5B-delta0.01 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.01) |
	| AdaptThink-1.5B-delta0.02 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.02) |
	| AdaptThink-1.5B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.05) |
	| AdaptThink-1.5B-delta0.075 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.075) |
	| AdaptThink-1.5B-delta0.1 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-1.5B-delta0.1) |
	| AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |
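
The repository names in the table follow a regular pattern, so a checkpoint can be selected programmatically. The helper below is a minimal sketch: the function name `adaptthink_repo_id` and the `RELEASED` mapping are hypothetical conveniences that are not part of the release, and only the repo names themselves come from the table.

```python
# Released (model size -> delta values) pairs, copied from the table above.
# RELEASED and adaptthink_repo_id are illustrative names, not part of the release.
RELEASED = {
    "1.5B": ["0", "0.01", "0.02", "0.05", "0.075", "0.1"],
    "7B": ["0.05"],
}


def adaptthink_repo_id(size: str, delta: str) -> str:
    """Return the Hugging Face repo id for a released AdaptThink checkpoint.

    `delta` is passed as a string so that it matches the repo name exactly
    (for example "0.075"), avoiding float-formatting surprises.
    """
    if delta not in RELEASED.get(size, []):
        raise ValueError(f"No released checkpoint for size={size!r}, delta={delta!r}")
    return f"THU-KEG/AdaptThink-{size}-delta{delta}"


if __name__ == "__main__":
    print(adaptthink_repo_id("7B", "0.05"))
```

The returned string is an ordinary Hub repo id, so it should be usable wherever one is expected, for example as the first argument to `AutoModelForCausalLM.from_pretrained` in the `transformers` library.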

	<a name="evaluation"></a>

	## 📊 Evaluation Results

	We list our evaluation results as follows:
	##### 1. Comparison with existing methods for efficient reasoning on mathematics datasets

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/ZLV8ZfEet1dp-4jyzBxiG.png)

	##### 2. NoThinking response ratio and accuracy across different difficulty levels on MATH500

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/GUNfW9qO2aaT9_lo1XXPf.png)
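
As a rough illustration of how a NoThinking ratio could be computed over a set of generated responses, here is a minimal sketch. It rests on an assumption this README does not state: that a NoThinking response begins directly with the closing `</think>` tag (an empty thinking segment), while a Thinking response contains reasoning text before that tag. The function names are hypothetical.

```python
from typing import Sequence


def is_nothinking(response: str) -> bool:
    # Assumption: a NoThinking response opens with "</think>", i.e. the
    # thinking segment is empty. This convention is not stated in this README.
    return response.lstrip().startswith("</think>")


def nothinking_ratio(responses: Sequence[str]) -> float:
    """Fraction of responses that skip thinking (0.0 for an empty input)."""
    if not responses:
        return 0.0
    return sum(is_nothinking(r) for r in responses) / len(responses)


if __name__ == "__main__":
    # Toy responses for illustration only, not real model outputs.
    samples = [
        "</think>The answer is 4.",
        "First, add the two numbers... </think>The answer is 4.",
    ]
    print(nothinking_ratio(samples))
```

Grouping the responses by difficulty level before calling `nothinking_ratio` would give a per-level breakdown of the kind plotted above.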

	##### 3. Comparison of different $\delta$ values

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/RXrXwxVSAYlR3-_t0GUwV.png)

	##### 4. Evaluation results on MMLU

	<img width="1000" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/19K2u6PNmYz3gx3JnHgn4.png">

	<a name="citation"></a>
	## 📝 Citation

	If you find our work useful, please consider citing AdaptThink:

	```
	@article{zhang2025adapt_think,
	title={AdaptThink: LLM Can Learn When to Think},
	author={Jiajie Zhang and Nianyi Lin and Lei Hou and Ling Feng and Juanzi Li},
	journal={arXiv preprint arXiv:2505.13417},
	url={https://arxiv.org/abs/2505.13417},
	year={2025}
	}
	```