Update README.md
Browse files
README.md
CHANGED
|
@@ -13,7 +13,9 @@ library_name: transformers
|
|
| 13 |
|
| 14 |
# OpenR1-Distill-7B
|
| 15 |
|
| 16 |
-
OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
|
|
|
|
|
|
|
| 17 |
|
| 18 |
## Model description
|
| 19 |
|
|
@@ -34,23 +36,10 @@ OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://hu
|
|
| 34 |
|
| 35 |
At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
|
| 36 |
|
| 37 |
-
| Model |
|
| 38 |
-
|
| 39 |
-
|
|
| 40 |
-
|
|
| 41 |
-
| Xwin-LMv0.1 | 7B| dPPO| 6.19| 87.83|
|
| 42 |
-
| Mistral-Instructv0.1 | 7B| - | 6.84 |-|
|
| 43 |
-
| Zephyr-7b-α |7B| dDPO| 6.88| -|
|
| 44 |
-
| **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
|
| 45 |
-
| Falcon-Instruct | 40B |dSFT |5.17 |45.71|
|
| 46 |
-
| Guanaco | 65B | SFT |6.41| 71.80|
|
| 47 |
-
| Llama2-Chat | 70B |RLHF |6.86| 92.66|
|
| 48 |
-
| Vicuna v1.3 | 33B |dSFT |7.12 |88.99|
|
| 49 |
-
| WizardLM v1.0 | 70B |dSFT |7.71 |-|
|
| 50 |
-
| Xwin-LM v0.1 | 70B |dPPO |- |95.57|
|
| 51 |
-
| GPT-3.5-turbo | - |RLHF |7.94 |89.37|
|
| 52 |
-
| Claude 2 | - |RLHF |8.06| 91.36|
|
| 53 |
-
| GPT-4 | -| RLHF |8.99| 95.28|
|
| 54 |
|
| 55 |
In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:
|
| 56 |
|
|
|
|
| 13 |
|
| 14 |
# OpenR1-Distill-7B
|
| 15 |
|
| 16 |
+
OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) that was trained to reproduce the performance of [DeepSeek's 7B distilled model](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
|
| 17 |
+
|
| 18 |
+
on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science. This model matches or exceeds the performance of DeepSeek's distilled model,
|
| 19 |
|
| 20 |
## Model description
|
| 21 |
|
|
|
|
| 36 |
|
| 37 |
At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
|
| 38 |
|
| 39 |
+
| Model | AIME 2024 | MATH-500 | GPQA-D | LiveCodeBench |
|
| 40 |
+
| :---- | :----: | :----: | :----: | :----: |
|
| 41 |
+
| OpenR1-Distill-7B | 52.66 | 89 | 52.78 | X |
|
| 42 |
+
| DeepSeek-R1-Distill-Qwen-7B | 51.25 | 93.45 | 52.4 | 37.41 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 43 |
|
| 44 |
In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:
|
| 45 |
|