| | --- |
| | base_model: |
| | - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| | datasets: |
| | - knoveleng/open-rs |
| | - knoveleng/open-s1 |
| | - knoveleng/open-deepscaler |
| | license: mit |
| | pipeline_tag: text-generation |
| | inference: true |
| | library_name: transformers |
| | --- |
| | |
| | # Model Summary |
| |
|
| | This model enhances the reasoning capabilities of the small 1.5B parameter `DeepSeek-R1-Distill-Qwen-1.5B` LLM using reinforcement learning (RL). Trained efficiently on 4 A40 GPUs in under 24 hours, it achieves significant gains in mathematical reasoning benchmarks (e.g., 80% accuracy on AMC23, 46.7% on AIME24, surpassing `o1-preview`). This cost-effective approach demonstrates the potential of RL for boosting reasoning in resource-constrained settings. |
| |
|
| |
|
| | ## Evaluation |
| | ### Performance Highlights |
| | - **Open-RS1**: 53.0% avg. score |
| | - **Open-RS2**: 55.7% avg. score, 80.0% on AMC23 |
| | - **Open-RS3**: 56.3% avg. score, 46.7% on AIME24 (outperforms `o1-preview` at 44.6%) |
| | - Competitive MATH-500 scores; Minerva lags behind 7B models. |
| |
|
| |  |
| |
|
| | ### Cost Efficiency |
| | Our approach uses 7,000 samples (42,000 total outputs) and costs ~$42 on 4x A40 GPUs in 24 hours, compared to thousands of dollars for baseline models. |
| |
|
| |  |
| |  |
| |
|
| |
|
| | ## Citation |
| | If this project aids your work, please cite it as: |
| | ``` |
| | @misc{dang2025reinforcementlearningreasoningsmall, |
| | title={Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't}, |
| | author={Quy-Anh Dang and Chris Ngo}, |
| | year={2025}, |
| | eprint={2503.16219}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.LG}, |
| | url={https://arxiv.org/abs/2503.16219}, |
| | } |
| | ``` |
| |
|
| | For more details, including usage instructions and further evaluation results, please refer to our [GitHub repository](https://github.com/knoveleng/open-rs). |