---
language:
- zh
- en
pipeline_tag: text-generation
tags:
- deepscaler
- grpo
- qwen2
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
license: other
library_name: transformers
---

# DECS_7B

This is the official model for the ICLR 2026 Oral paper "Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling".
DECS_7B is a reasoning-focused causal language model built from `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` and further trained with the DECS algorithm; it uses roughly 50% fewer tokens when answering reasoning-intensive problems.
|
|
## Model Summary
|
|
- Base model: `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`
- Upload date: `2026-02-24`
- Recommended use: long-form reasoning and mathematical/problem-solving generation
- Paper: https://arxiv.org/pdf/2509.25827
- Project page: https://pixas.github.io/decs-iclr26-site/
- GitHub repo: https://github.com/pixas/DECS
|
|
## Quick Start (Transformers)
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pixas/DECS_7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve: if x^2 - 5x + 6 = 0, what are the values of x?"}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.6,
        top_p=0.95,
    )

new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
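Like its DeepSeek-R1-Distill base model, DECS_7B is expected to wrap its chain of thought in `<think>...</think>` tags (an assumption carried over from the base model; verify on your own outputs). A small helper to separate the reasoning trace from the final answer:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a generation into (reasoning, final_answer).

    Assumes the DeepSeek-R1-style convention of wrapping the chain of
    thought in <think>...</think>; if no tags are found, the whole
    text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>x^2 - 5x + 6 = (x-2)(x-3)</think>\nThe solutions are x = 2 and x = 3."
reasoning, answer = split_reasoning(sample)
print(answer)  # The solutions are x = 2 and x = 3.
```

This is also a convenient place to measure token savings: tokenize `reasoning` and `answer` separately to compare trace lengths across models.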
|
|
## Quick Start (vLLM)
|
|
```python
from vllm import LLM, SamplingParams

llm = LLM(model="pixas/DECS_7B", trust_remote_code=True)
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
prompt = "Please reason step by step: what is 37 * 48?"
outputs = llm.generate([prompt], sampling_params=sampling)
print(outputs[0].outputs[0].text)
```
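For deployment, vLLM's OpenAI-compatible server can also host the model. A minimal sketch (the port, context length, and request values below are assumptions to adjust for your hardware and task):

```shell
# Launch an OpenAI-compatible server (assumes vLLM is installed)
vllm serve pixas/DECS_7B --trust-remote-code --max-model-len 8192

# Query it with the chat completions API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pixas/DECS_7B",
    "messages": [{"role": "user", "content": "What is 37 * 48?"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 512
  }'
```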
|
|
## Notes
|
|
- This model may produce incorrect or unverifiable reasoning. Always validate outputs in high-stakes settings.
- Performance can vary with prompt style and decoding parameters.
- License and acceptable-use constraints follow the upstream base model and your deployment policy.

## Citation
|
|
If you use this model, please cite our paper:
```bibtex
@inproceedings{jiang2026overthinking,
  title={Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling},
  author={Shuyang Jiang and Yusheng Liao and Ya Zhang and Yanfeng Wang and Yu Wang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=kdeiRledV6}
}
```