---
license: apache-2.0
datasets:
- agentica-org/DeepScaleR-Preview-Dataset
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- reinforcement-learning
language:
- en
- zh
pipeline_tag: text-generation
library_name: transformers
---

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/64ed568ccf6118a9379a61b8/BHITqJU33sXqf-Jbytrxg.png" width="100"/>
<b><span style="font-size:28px">SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression</span></b>
</p>
<p align="center">
📃 <a href="https://arxiv.org/abs/2509.25176" target="_blank">Paper</a> • 📝 <a href="https://api.wandb.ai/links/teamsiri/isge4elx" target="_blank">Wandb</a>
</p>

---

## 🔍 Overview

**SIRI (Scaling Iterative Reinforcement Learning with Interleaved Compression)** is a reinforcement-learning–based framework designed to improve both the efficiency and the accuracy of **Large Reasoning Models (LRMs)**.

Traditional RL training often causes **overthinking**, producing long, redundant reasoning traces. Prior methods that compress outputs (length penalties, pruning, or skipping thought tokens) improve efficiency but at the cost of accuracy.

SIRI addresses this trade-off by **iteratively alternating between compression and expansion of the reasoning budget**, controlled by a cosine length scheduler. This dynamically balances concise reasoning against long-horizon exploration.

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/64ed568ccf6118a9379a61b8/SXow6xntEgrwhvWtzvrkE.png" alt="pareto_front" width="500"/>
</p>
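
The card names a cosine length scheduler but does not give its exact form or hyperparameters. As a hedged illustration, the following sketch assumes a plain cosine wave over the maximum rollout length: within each cycle the budget starts at `max_len`, shrinks to `min_len` at the half period (compression), and returns to `max_len` (expansion). The function name, its parameters, and the 8192/16384/200 values are hypothetical and are not SIRI's released code or settings.

```python
import math


def cosine_length_schedule(step: int, period: int, min_len: int, max_len: int) -> int:
    """Hypothetical max rollout length (in tokens) for a given training step.

    Assumption: the budget follows a cosine wave with `period` steps per cycle,
    starting at `max_len`, reaching `min_len` at the half period (compression
    phase) and climbing back to `max_len` by the end of the cycle (expansion phase).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    phase = (step % period) / period  # position within the current cycle, in [0, 1)
    # (1 + cos(2*pi*phase)) / 2 goes 1 -> 0 -> 1 over one cycle.
    weight = (1.0 + math.cos(2.0 * math.pi * phase)) / 2.0
    return int(round(min_len + (max_len - min_len) * weight))


if __name__ == "__main__":
    # Illustrative values only; not the configuration used to train SIRI.
    for step in range(0, 201, 25):
        print(step, cosine_length_schedule(step, period=200, min_len=8192, max_len=16384))
```

In a training loop, the returned value would serve as the generation cap when sampling rollouts for that step. Starting each cycle at the maximum is itself an assumption; offsetting `step` by half a period would start from the minimum budget instead.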

---

## 🚀 Key Features

- **Interleaved Compression–Expansion**:
  - *Compression phase*: forces concise, high-density reasoning by limiting the rollout length.
  - *Expansion phase*: restores longer rollouts to encourage exploration and planning.
- **Token Efficiency without Accuracy Loss**: Unlike previous methods, SIRI improves accuracy *while reducing average token usage*.
- **Iterative RL Training**: Built on GRPO with modifications from DAPO (decoupled clip-high/clip-low ranges, KL term removed).
- **Generalization Across Model Sizes**: Validated on both **1.5B** and **7B** models.
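
The feature list names GRPO with DAPO-style modifications but does not show the objective. The following is a minimal pure-Python sketch, assuming the usual GRPO group-normalized advantage and a token-level clipped surrogate with separate lower and upper clip ranges (`eps_low`, `eps_high`) and no KL penalty. The function names and the 0.2/0.28 defaults are illustrative assumptions, not SIRI's released code or settings.

```python
import math
from statistics import fmean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its rollout group."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_clip_loss(
    logp_new: list[list[float]],
    logp_old: list[list[float]],
    advantages: list[float],
    eps_low: float = 0.2,   # assumed lower clip range
    eps_high: float = 0.28,  # assumed (wider) upper clip range
) -> float:
    """Token-level clipped surrogate with decoupled clip ranges and no KL term.

    logp_new[i][t] and logp_old[i][t] are log-probs of token t of response i under
    the current and rollout policies; all tokens of response i share advantages[i].
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            # Pessimistic bound: take the smaller of unclipped and clipped objectives.
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("no tokens to average over")
    # Average over every token in the group; negate so minimizing maximizes the objective.
    return -total / n_tokens
```

Because `eps_high` exceeds `eps_low`, a token whose probability rises can move further before its update is clipped than one whose probability falls; DAPO motivates this asymmetry as a way to limit entropy collapse. Dropping the KL term also removes the need to keep a frozen reference model in memory.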

---

## 📊 Benchmarks

---

## 📝 Citation

```bibtex
@misc{wen2025siriscalingiterativereinforcement,
  title={SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression},
  author={Haoming Wen and Yushi Bai and Juanzi Li and Jie Tang},
  year={2025},
  eprint={2509.25176},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.25176},
}
```