Improve model card: add library_name, paper link, and clean up structure

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +24 -95
README.md CHANGED
@@ -1,105 +1,34 @@
1
  ---
 
2
  language:
3
  - zh
4
  - en
 
5
  pipeline_tag: text-generation
 
6
  tags:
7
  - deepscaler
8
  - reasoning
9
  - grpo
10
  - qwen2
11
- base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
12
- license: other
13
  ---
14
 
15
  # DECS_7B
16
 
17
- This is the official model for ICLR 2026 Oral "Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling".
18
- DECS_7B is a reasoning-focused causal language model built from `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` and further trained with DECS algorithm, focused on 50% fewer tokens when answering a reasoning-required problem.
19
-
20
- ## Model Summary
21
-
22
- - Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`
23
- - Upload date: `2026-02-24`
24
- - Recommended use: long-form reasoning and mathematical/problem-solving style generation
25
-
26
- ## Quick Start (Transformers)
27
-
28
- ```python
29
- import torch
30
- from transformers import AutoModelForCausalLM, AutoTokenizer
31
-
32
- model_id = "pixas/DECS_7B"
33
- tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
34
- model = AutoModelForCausalLM.from_pretrained(
35
- model_id,
36
- torch_dtype=torch.bfloat16,
37
- device_map="auto",
38
- )
39
-
40
- messages = [
41
- {"role": "user", "content": "Solve: If x^2 - 5x + 6 = 0, what are x values?"}
42
- ]
43
- prompt = tokenizer.apply_chat_template(
44
- messages, tokenize=False, add_generation_prompt=True
45
- )
46
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
47
-
48
- with torch.no_grad():
49
- outputs = model.generate(
50
- **inputs,
51
- max_new_tokens=512,
52
- temperature=0.6,
53
- top_p=0.95,
54
- )
55
-
56
- new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
57
- print(tokenizer.decode(new_tokens, skip_special_tokens=True))
58
- ```
59
 
60
- ## Quick Start (vLLM)
61
 
62
- ```python
63
- from vllm import LLM, SamplingParams
64
 
65
- llm = LLM(model="pixas/DECS_7B", trust_remote_code=True)
66
- sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
67
- prompt = "Please reason step by step: what is 37 * 48?"
68
- outputs = llm.generate([prompt], sampling_params=sampling)
69
- print(outputs[0].outputs[0].text)
70
- ```
71
-
72
- ## Notes
73
-
74
- - This model may produce incorrect or unverifiable reasoning. Always validate outputs in high-stakes settings.
75
- - Performance can vary by prompt style and decoding parameters.
76
- - License and acceptable-use constraints should follow the upstream base model and your deployment policy.
77
-
78
-
79
- ## Citation
80
- ---
81
- language:
82
- - zh
83
- - en
84
- pipeline_tag: text-generation
85
- tags:
86
- - deepscaler
87
- - reasoning
88
- - grpo
89
- - qwen2
90
- base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
91
- license: other
92
- ---
93
-
94
- # DECS_1.5B
95
- This is the official model for ICLR 2026 Oral "Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling".
96
- DECS_1.5B is a reasoning-focused causal language model built from `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` and further trained with DECS algorithm, focused on 50% fewer tokens when answering a reasoning-required problem.
97
 
98
  ## Model Summary
99
 
100
- - Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
101
- - Upload date: `2026-02-24`
102
- - Recommended use: long-form reasoning and mathematical/problem-solving style generation
103
 
104
  ## Quick Start (Transformers)
105
 
@@ -107,7 +36,7 @@ DECS_1.5B is a reasoning-focused causal language model built from `deepseek-ai/D
107
  import torch
108
  from transformers import AutoModelForCausalLM, AutoTokenizer
109
 
110
- model_id = "pixas/DECS_1.5B"
111
  tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
112
  model = AutoModelForCausalLM.from_pretrained(
113
  model_id,
@@ -140,7 +69,7 @@ print(tokenizer.decode(new_tokens, skip_special_tokens=True))
140
  ```python
141
  from vllm import LLM, SamplingParams
142
 
143
- llm = LLM(model="pixas/DECS_1.5B", trust_remote_code=True)
144
  sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
145
  prompt = "Please reason step by step: what is 37 * 48?"
146
  outputs = llm.generate([prompt], sampling_params=sampling)
@@ -149,20 +78,20 @@ print(outputs[0].outputs[0].text)
149
 
150
  ## Notes
151
 
152
- - This model may produce incorrect or unverifiable reasoning. Always validate outputs in high-stakes settings.
153
- - Performance can vary by prompt style and decoding parameters.
154
- - License and acceptable-use constraints should follow the upstream base model and your deployment policy.
155
-
156
 
157
  ## Citation
158
 
159
  If you use this model, please cite our paper:
 
160
  ```bibtex
161
- @inproceedings{jiang2026overthinking,
162
- title={Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling},
163
- author={Shuyang Jiang and Yusheng Liao and Ya Zhang and Yanfeng Wang and Yu Wang},
164
- booktitle={The Fourteenth International Conference on Learning Representations},
165
- year={2026},
166
- url={https://openreview.net/forum?id=kdeiRledV6}
 
167
  }
168
- ```
 
1
  ---
2
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
3
  language:
4
  - zh
5
  - en
6
+ license: other
7
  pipeline_tag: text-generation
8
+ library_name: transformers
9
  tags:
10
  - deepscaler
11
  - reasoning
12
  - grpo
13
  - qwen2
 
 
14
  ---
15
 
16
  # DECS_7B
17
 
18
+ This is the official model repository for **DECS_7B**, presented in the ICLR 2026 Oral paper: **"Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling"**.
19
 
20
+ [**Paper**](https://huggingface.co/papers/2509.25827) | [**Code**](https://github.com/pixas/DECS) | [**Project Page**](https://pixas.github.io/decs-iclr26-site/)
21
 
22
+ ## Model Description
23
+ DECS_7B is a reasoning-focused causal language model built from `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` and further trained with the **DECS** (Decoupled Rewards and Curriculum Scheduling) algorithm.
24
 
25
+ The DECS framework addresses the "overthinking" problem in large reasoning models, where models generate excessively long reasoning paths with no corresponding performance benefit. It introduces a decoupled token-level reward mechanism and a curriculum batch scheduling strategy to optimize the efficiency-efficacy equilibrium, reducing reasoning tokens by over 50% across multiple benchmarks while maintaining or improving accuracy.
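
As a rough illustration of the decoupled-reward idea described above (a toy sketch, not the DECS implementation; the function name, token budget, and penalty weighting are all assumptions for demonstration), the correctness signal and the length signal can be kept as two separate reward components rather than one blended scalar:

```python
def decoupled_reward(is_correct: bool, num_reasoning_tokens: int,
                     target_tokens: int = 512, length_weight: float = 0.5):
    """Toy sketch of a decoupled reward: return the correctness term
    and the length term separately instead of one entangled scalar."""
    correctness_r = 1.0 if is_correct else 0.0
    # Penalize only tokens beyond the budget, with the penalty scaled
    # relative to the budget and capped at length_weight.
    overshoot = max(0, num_reasoning_tokens - target_tokens)
    length_r = -length_weight * min(1.0, overshoot / target_tokens)
    return correctness_r, length_r

# A correct but verbose answer keeps full correctness credit while
# receiving a separate, bounded length penalty.
c, l = decoupled_reward(True, 1024, target_tokens=512)
print(c, l)  # 1.0 -0.5
```

Keeping the two terms separate means a trainer can shorten outputs without the length penalty eroding the correctness signal, which is the intuition behind avoiding a single mixed reward.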
26
 
27
  ## Model Summary
28
 
29
+ - **Base model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`
30
+ - **Upload date:** `2026-02-24`
31
+ - **Recommended use:** Long-form reasoning, mathematical problem solving, and efficient step-by-step logic generation.
32
 
33
  ## Quick Start (Transformers)
34
 
 
36
  import torch
37
  from transformers import AutoModelForCausalLM, AutoTokenizer
38
 
39
+ model_id = "pixas/DECS_7B"
40
  tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
41
  model = AutoModelForCausalLM.from_pretrained(
42
  model_id,
 
69
  ```python
70
  from vllm import LLM, SamplingParams
71
 
72
+ llm = LLM(model="pixas/DECS_7B", trust_remote_code=True)
73
  sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
74
  prompt = "Please reason step by step: what is 37 * 48?"
75
  outputs = llm.generate([prompt], sampling_params=sampling)
 
78
 
79
  ## Notes
80
 
81
+ - **Reasoning Accuracy:** While optimized for efficiency, this model may produce incorrect or unverifiable reasoning. Always validate outputs in high-stakes settings.
82
+ - **Licensing:** License and acceptable-use constraints follow the upstream base model and your deployment policy.
 
 
83
 
84
  ## Citation
85
 
86
  If you use this model, please cite our paper:
87
+
88
  ```bibtex
89
+ @inproceedings{jiang2026decs,
90
+ title = {Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling},
91
+ author = {Jiang, Shuyang and Liao, Yusheng and Zhang, Ya and Wang, Yanfeng and Wang, Yu},
92
+ booktitle = {International Conference on Learning Representations (ICLR)},
93
+ year = {2026},
94
+ note = {Oral},
95
+ url = {https://arxiv.org/abs/2509.25827}
96
  }
97
+ ```