open-r1
/

OpenR1-Distill-7B

Text Generation

text-generation-inference

Model card Files Files and versions

lewtun HF Staff commited on May 22, 2025

Commit

83969e1

·

verified ·

1 Parent(s): 1050437

Update README.md

Files changed (1) hide show

README.md +7 -18

README.md CHANGED Viewed

@@ -13,7 +13,9 @@ library_name: transformers
 # OpenR1-Distill-7B
-OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science. This model matches or exceeds the performance of DeepSeek's distilled model,
 ## Model description
@@ -34,23 +36,10 @@ OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://hu
 At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
-| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
-|-------------|-----|----|---------------|--------------|
-| StableLM-Tuned-α | 7B| dSFT |2.75| -|
-| MPT-Chat |  7B |dSFT |5.42| -|
-| Xwin-LMv0.1 | 7B| dPPO| 6.19| 87.83|
-| Mistral-Instructv0.1 | 7B|  - | 6.84 |-|
-| Zephyr-7b-α |7B|  dDPO| 6.88| -|
-| **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
-| Falcon-Instruct |  40B |dSFT |5.17 |45.71|
-| Guanaco | 65B |  SFT |6.41| 71.80|
-| Llama2-Chat |  70B |RLHF |6.86| 92.66|
-| Vicuna v1.3 |  33B |dSFT |7.12 |88.99|
-| WizardLM v1.0 |  70B |dSFT |7.71 |-|
-| Xwin-LM v0.1 |   70B |dPPO |- |95.57|
-| GPT-3.5-turbo | - |RLHF |7.94 |89.37|
-| Claude 2 |  - |RLHF |8.06| 91.36|
-| GPT-4 |  -| RLHF |8.99| 95.28|
 In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:

 # OpenR1-Distill-7B
+OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) that was trained to reproduce the performance of [DeepSeek's 7B distilled model](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
+on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science. This model matches or exceeds the performance of DeepSeek's distilled model,
 ## Model description
 At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
+| Model | AIME 2024 | MATH-500 | GPQA-D | LiveCodeBench |
+| :---- | :----: | :----: | :----: | :----: |
+| OpenR1-Distill-7B | 52.66 | 89 | 52.78 | X |
+| DeepSeek-R1-Distill-Qwen-7B | 51.25 | 93.45 | 52.4 | 37.41 |
 In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B: