Lingaaaaaaa committed
Commit 0cb33c5 · verified · 1 Parent(s): 8f51f2f

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -35,13 +35,12 @@ Revolutionary template-augmented reasoning paradigm enpowers a 32B model to outp
  We present the evaluation results of our ReasonFlux-F1-32B on challenging reasoning tasks including AIME2024,AIM2025,MATH500 and GPQA-Diamond. To make a fair comparison, we report the results of the LLMs on our evaluation scripts in [ReasonFlux-F1]().
 
  | Model | AIME2024@pass1 | AIME2025@pass1 | MATH500@pass1 | GPQA@pass1 |
- | --------------------------------------- | -------------- | -------------- | ------------- | ---------- |
+ | --------------------------------------- | :--------------: | :--------------: | :-------------: | :----------: |
  | QwQ-32B-Preview | 46.7 | 37.2 | 90.6 | 65.2 |
- | LIMO-32B | 56.3 | 44.5 | 94.80 | 58.08 |
+ | LIMO-32B | 56.3 | 44.5 | 94.8 | 58.1 |
  | s1-32B | 56.7 | 49.3 | 93.0 | 59.6 |
- | OpenThinker-32B | 66.0 | 53.3 | 94.8 | 60.10 |
- | FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 76.67 | 40.0 | 93.4 | 59.09 |
- | R1-Distill-32B | 70 | 46.67 | 92 | 59.59 |
+ | OpenThinker-32B | 66.0 | 53.3 | 94.8 | 60.1 |
+ | R1-Distill-32B | 70.0 | 46.7 | 92.0 | 59.6 |
  | ReasonFlux-Zero-32B | 56.7 | 37.2 | 91.2 | 61.2 |
  | **ReasonFlux-F1-32B** | **76.7** | **53.3** | **96.0** | **67.2** |
 