Update README.md
update the url of the technical report
README.md
````diff
@@ -13,24 +13,19 @@ datasets:
 
 
 [](https://github.com/D2I-ai/dasd-thinking) 
-
+<a href="https://arxiv.org/abs/2601.09088" target="_blank"><img src="https://img.shields.io/badge/Technical Report-b5212f.svg?logo=arxiv" height="21px"></a>
 
 
 
 [](https://huggingface.co/Alibaba-Apsara/DASD-4B-Thinking) 
-[](https://www.modelscope.cn/models/Alibaba-Apsara/DASD-4B-Thinking) 
 
 
 [](https://huggingface.co/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview) 
-[](https://www.modelscope.cn/models/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview) 
-
 
 
 [](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b) 
-[](https://www.modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b) 
 
 [](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob) 
-[](https://www.modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob) 
 
 
 
@@ -91,7 +86,7 @@ DASD-4B-Thinking democratizes the training recipe:
 
 ## ⚙️ Post-Training Pipeline
 
-DASD-Thinking introduces a new paradigm of **Distribution-Aligned Sequence Distillation**. This represents an enhanced sequence-level distillation pipeline that incorporates **Temperature-scheduled Learning**, **Divergence-aware Sampling**, and **Mixed-policy Distillation**, achieving efficient capability transfer with a minimal amount of data (**448K**). Please refer to our [report](https://
+DASD-Thinking introduces a new paradigm of **Distribution-Aligned Sequence Distillation**. This represents an enhanced sequence-level distillation pipeline that incorporates **Temperature-scheduled Learning**, **Divergence-aware Sampling**, and **Mixed-policy Distillation**, achieving efficient capability transfer with a minimal amount of data (**448K**). Please refer to our [report](https://arxiv.org/abs/2601.09088) for more details.
 
 <div style="text-align: center;">
 <img src="assets/pipeline.jpg" alt="DASD-Thinking training pipeline" style="width: 90%;">
@@ -177,11 +172,12 @@ While DASD-4B-Thinking demonstrates remarkable performance across mathematical,
 DASD-Thinking is developed by Alibaba Cloud, as part of our mission to advance open, efficient, and trustworthy reasoning systems. If you find this work useful in your research or applications, please cite our technical report.
 
 ```bibtex
-@
+@article{yan2026dasd,
 title={Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning},
 author={Yan, Shaotian and Liu, Kaiyuan and Shen, Chen and Wang, Bing and Fan, Sinan and Zhang, Jun and Wu, Yue and Wang, Zheng and Ye, Jieping},
 year={2026},
-
+journal={arXiv preprint arXiv:2601.09088},
+url={https://arxiv.org/abs/2601.09088}
 }
 
 @article{liu2025where,
````
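The pipeline paragraph updated by this diff names three ingredients: Temperature-scheduled Learning, Divergence-aware Sampling, and Mixed-policy Distillation. As a rough, hypothetical sketch only — not the authors' implementation, and the function names, the linear temperature schedule, and the KL threshold are all assumptions — temperature-scaled distillation targets with a divergence gate on training samples might look like:

```python
import math

def softmax(logits, temperature):
    # Temperature-scaled softmax: higher T flattens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    # KL(p || q); assumes q has full support over the vocabulary.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def scheduled_temperature(step, total_steps, t_start=2.0, t_end=1.0):
    # Hypothetical linear anneal: start with a softened teacher
    # distribution, end on the raw (T=1) one.
    frac = step / max(total_steps - 1, 1)
    return t_start + frac * (t_end - t_start)

def divergence_aware_keep(teacher_logits, student_logits, temperature, max_kl=1.0):
    # Divergence-aware filter: keep a sample only if the student is not
    # already too far from the teacher at this position, since wildly
    # divergent targets can destabilize sequence-level distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q) <= max_kl
```

In this sketch a training loop would call `scheduled_temperature` once per step, then use `divergence_aware_keep` to decide which teacher sequences contribute to the distillation loss; the actual DASD-Thinking recipe is described in the report linked above.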
|