Commit 611f7d6 (verified) by zzqsmall · 1 parent: f33e0e0

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -97,11 +97,11 @@ It currently stands as the **best open-source flagship non-thinking model**, riv
  | | MMLU-Redux (EM) | 92.37 | 91.58 | **92.75** | __<span style="color:red">94.67</span>__ | 92.25 |
  | | MMLU-Pro | __<span style="color:red">83.25</span>__ | 81.03 | 81.94 | **82.13** | 82.04 |
  | **Knowledge** | **STEM** | | | | | |
- | | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span> | **88.5** |
+ | | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span>__ | **88.5** |
  | | OlympiadBench-stem | 87.83 | 79.13 | 78.26 | **89.57** | __<span style="color:red">91.3</span>__ |
  | | GPQA-Diamond | __<span style="color:red">76.23</span>__ | **73.93** | 71.31 | 71.81 | 72.98 |
  | **Coding** | **Code Generation** | | | | | |
- | | MultiPL-E | **77.68** | 73.76 | 71.48 | 71.48 | __<span style="color:red">77.91</span>__ |
+ | | MultiPL-E | **77.68** | 73.76 | 76.66 | 71.48 | __<span style="color:red">77.91</span>__ |
  | | mbpp | 90.69 | 89.96 | **91.72** | 91.01 | __<span style="color:red">96.87</span>__ |
  | | LiveCodeBench (2408-2505) | 48.02 | 48.95 | **48.57** | 45.43 | __<span style="color:red">61.68</span>__ |
  | | CodeForces-rating | 1582 | 1574 | 1120 | **1675** | __<span style="color:red">1901</span>__ |