Update README.md
README.md
CHANGED
@@ -97,11 +97,11 @@ It currently stands as the **best open-source flagship non-thinking model**, riv
 | | MMLU-Redux (EM) | 92.37 | 91.58 | **92.75** | __<span style="color:red">94.67</span>__ | 92.25 |
 | | MMLU-Pro | __<span style="color:red">83.25</span>__ | 81.03 | 81.94 | **82.13** | 82.04 |
 | **Knowledge** | **STEM** | | | | | |
-| | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span> | **88.5** |
+| | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span>__ | **88.5** |
 | | OlympiadBench-stem | 87.83 | 79.13 | 78.26 | **89.57** | __<span style="color:red">91.3</span>__ |
 | | GPQA-Diamond | __<span style="color:red">76.23</span>__ | **73.93** | 71.31 | 71.81 | 72.98 |
 | **Coding** | **Code Generation** | | | | | |
-| | MultiPL-E | **77.68** | 73.76 |
+| | MultiPL-E | **77.68** | 73.76 | 76.66 | 71.48 | __<span style="color:red">77.91</span>__ |
 | | mbpp | 90.69 | 89.96 | **91.72** | 91.01 | __<span style="color:red">96.87</span>__ |
 | | LiveCodeBench (2408-2505) | 48.02 | 48.95 | **48.57** | 45.43 | __<span style="color:red">61.68</span>__ |
 | | CodeForces-rating | 1582 | 1574 | 1120 | **1675** | __<span style="color:red">1901</span>__ |