---
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# 🤔 Skywork-OR1 (Open Reasoner 1)

✊ Unleashing the Power of Reinforcement Learning for Math and Code Reasoners 🤖


## 🔥 News

- **April 13, 2025**: We released the Skywork-OR1 (Open Reasoner 1) series of models, including Skywork-OR1-Math-7B, Skywork-OR1-32B-Preview, and Skywork-OR1-7B-Preview. We open-source the model weights.

## 📖 Overview

*The AIME24 scores versus training steps of Skywork-OR1-Math-7B in our multi-stage training pipeline.*

The Skywork-OR1 (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. The series includes two general-purpose reasoning models, Skywork-OR1-7B-Preview and Skywork-OR1-32B-Preview, along with a math-specialized model, Skywork-OR1-Math-7B.

- **Skywork-OR1-Math-7B** is specifically optimized for mathematical reasoning, scoring 69.8 on AIME24 and 52.3 on AIME25, well ahead of all models of similar size.
- **Skywork-OR1-32B-Preview** delivers performance comparable to the 671B-parameter DeepSeek-R1 on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
- **Skywork-OR1-7B-Preview** outperforms all similarly sized models on both math and coding benchmarks.

## 📊 Evaluation

We evaluate our models on AIME24, AIME25, and LiveCodeBench. Instead of using Pass@1, which is common in prior work, we introduce Avg@K as the primary metric. This metric robustly measures a model's average performance across K independent attempts, reducing the impact of randomness and enhancing the reliability of the results. We believe that Avg@K provides a better reflection of a model's stability and reasoning consistency.
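Concretely, Avg@K is the per-problem mean correctness over K independently sampled attempts, averaged across all problems. A minimal sketch of the computation (the function and score format here are illustrative, not our evaluation code):

```python
def avg_at_k(attempt_scores):
    """Avg@K: average per-problem score over K independent attempts.

    attempt_scores: one list per problem, each containing K binary
    correctness scores (1 = correct, 0 = incorrect).
    """
    per_problem = [sum(s) / len(s) for s in attempt_scores]
    return sum(per_problem) / len(per_problem)

# Two problems, K = 4 attempts each:
# problem 1 solved in 3/4 attempts, problem 2 in 1/4.
print(avg_at_k([[1, 1, 0, 1], [0, 1, 0, 0]]))  # 0.5
```

Because each problem is attempted K times, a single lucky (or unlucky) sample moves the score by at most 1/K per problem, which is what makes the metric more stable than a single Pass@1 run.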

We include the detailed results in the following table.

| Model | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench (8/1/24-2/1/25) (Avg@4) |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | 37.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | 39.5 |
| DeepSeek-R1-Distill-Qwen-32B | 72.9 | 59.0 | 57.2 |
| TinyR1-32B-Preview | 78.1 | 65.3 | 61.6 |
| QwQ-32B | 79.5 | 65.3 | 61.6 |
| DeepSeek-R1 | 79.8 | 70.0 | 65.9 |
| Skywork-OR1-Math-7B | 69.8 | 52.3 | 43.6 |
| Skywork-OR1-7B-Preview | 63.6 | 45.8 | 43.9 |
| Skywork-OR1-32B-Preview | 79.7 | 69.0 | 63.9 |

## ⚙️ Training Recipe

We offer a brief overview of our data and training pipeline below. For more details, please refer to our Notion Blog.

### Data

- We select, clean, and curate a dataset of 110K verifiable, challenging, and diverse math problems and 14K coding questions from open-source datasets.
- We perform model-aware difficulty estimation for each problem with respect to each model, and conduct rigorous quality assessment before training to ensure training efficiency and effectiveness.
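As a rough illustration of what model-aware difficulty filtering looks like, a problem's difficulty can be estimated from the target model's empirical solve rate, and problems with no learning signal can then be dropped. The sketch below is a simplified illustration under assumed interfaces; `attempt` is a hypothetical stand-in for sampling one rollout and checking it with a rule-based verifier, not our actual pipeline code:

```python
def estimate_solve_rates(problems, attempt, k=16):
    """Model-aware difficulty: empirical solve rate per problem.

    `attempt(problem)` is assumed to sample one rollout from the
    target model and return 1 if the verifier accepts it, else 0.
    """
    return {p: sum(attempt(p) for _ in range(k)) / k for p in problems}

def keep_informative(solve_rates):
    """Drop problems the model always solves or always fails:
    both extremes provide no gradient signal during RL training."""
    return [p for p, r in solve_rates.items() if 0.0 < r < 1.0]
```

Running this estimation separately for each model is what makes the filtering "model-aware": the same problem can be kept for one model and dropped for a stronger one.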

### Training

We develop a customized version of GRPO that leverages both data-wise and training-wise improvements:

- We perform both offline and online difficulty-based filtering and rejection sampling to improve training efficiency.
- We incorporate a multi-stage training pipeline coupled with adaptive entropy control and other techniques to enhance exploration and stability.
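For intuition, GRPO's group-relative advantage and the online rejection-sampling filter above can be sketched as follows. This is a simplified illustration of the general technique, not our training code, and the reward normalization shown (population std within each group) is one common variant:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage: normalize each sampled response's
    reward by the mean and std of its own prompt's group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / sigma if sigma > 0 else 0.0 for r in rewards]

def reject_uninformative(reward_groups):
    """Online rejection sampling: drop prompt groups whose rewards
    are all identical, since every advantage in such a group is
    zero and the group contributes no gradient signal."""
    return [g for g in reward_groups if len(set(g)) > 1]

# A group where every rollout got the same reward is filtered out.
batch = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
print(reject_uninformative(batch))  # [[1.0, 0.0, 1.0]]
```

Filtering these all-correct or all-wrong groups online keeps each batch full of prompts that actually move the policy, which is where the training-efficiency gain comes from.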

## 📄 Technical Report

Our technical report will be released soon. Stay tuned!

## 🙏 Acknowledgements

## 📚 Citation

We will update the citation once the technical report is released. In the meantime, please cite the following:

```bibtex
@misc{skywork-or1-2025,
  title={},
  author={},
  howpublished={\url{https://capricious-hydrogen-41c.notion.site/Skywork-Open-Reaonser-Series-1d0bc9ae823a80459b46c149e4f51680}},
  note={Notion Blog},
  year={2025}
}
```