lewtun HF Staff committed on
Commit 83969e1 · verified · 1 Parent(s): 1050437

Update README.md

Files changed (1)
  1. README.md +7 -18
README.md CHANGED
@@ -13,7 +13,9 @@ library_name: transformers
 
 # OpenR1-Distill-7B
 
-OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science. This model matches or exceeds the performance of DeepSeek's distilled model,
+OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) that was trained to reproduce the performance of [DeepSeek's 7B distilled model](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
+
+It was trained on around 350k reasoning traces distilled from R1 in the domains of mathematics, coding, and science, and matches or exceeds the performance of DeepSeek's distilled model.
 
 ## Model description
 
@@ -34,23 +36,10 @@ OpenR1-Distill-7B is a post-trained version of [Qwen/Qwen2.5-Math-7B](https://hu
 
 At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
 
-| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
-|-------------|------|-----------|------------------|-------------------------|
-| StableLM-Tuned-α | 7B | dSFT | 2.75 | - |
-| MPT-Chat | 7B | dSFT | 5.42 | - |
-| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
-| Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
-| Zephyr-7b-α | 7B | dDPO | 6.88 | - |
-| **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
-| Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
-| Guanaco | 65B | SFT | 6.41 | 71.80 |
-| Llama2-Chat | 70B | RLHF | 6.86 | 92.66 |
-| Vicuna v1.3 | 33B | dSFT | 7.12 | 88.99 |
-| WizardLM v1.0 | 70B | dSFT | 7.71 | - |
-| Xwin-LM v0.1 | 70B | dPPO | - | 95.57 |
-| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
-| Claude 2 | - | RLHF | 8.06 | 91.36 |
-| GPT-4 | - | RLHF | 8.99 | 95.28 |
+| Model | AIME 2024 | MATH-500 | GPQA-D | LiveCodeBench |
+| :---- | :----: | :----: | :----: | :----: |
+| OpenR1-Distill-7B | 52.66 | 89 | 52.78 | X |
+| DeepSeek-R1-Distill-Qwen-7B | 51.25 | 93.45 | 52.4 | 37.41 |
 
 In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B:
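
The card's front matter declares `library_name: transformers` (visible in the first hunk header), so the model is meant to be loaded with that library. A minimal usage sketch under that assumption; the Hub repo id `open-r1/OpenR1-Distill-7B` is inferred from the model name rather than stated in this diff, and the example prompt is illustrative:

```python
def build_messages(problem: str) -> list[dict]:
    """Wrap a single user question in the chat format expected by
    tokenizer.apply_chat_template()."""
    return [{"role": "user", "content": problem}]


def main() -> None:
    # transformers is imported lazily so the helper above can be used
    # without the (large) dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "open-r1/OpenR1-Distill-7B"  # assumed Hub repo id
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype="auto", device_map="auto"
    )

    # Render the chat prompt, generate, and decode the reasoning trace.
    prompt = tokenizer.apply_chat_template(
        build_messages("What is the sum of the first 100 positive integers?"),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=2048)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Reasoning models distilled from R1 emit long chain-of-thought before the final answer, hence the generous `max_new_tokens` budget.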