desaifan-mbzuai committed on
Commit 5df4dd1 (verified)
1 Parent(s): 7201bdc

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -81,12 +81,12 @@ Below we report the evaluation results for K2-V2 after supervised fine-tuning (S
  | Metric / Model | **K2 Low**<br><sub>Dense · 70B</sub> | **K2 Medium**<br><sub>Dense · 70B</sub> | **K2 High**<br><sub>Dense · 70B</sub> | **Olmo3 Think SFT**<br><sub>Dense · 32B · No RL</sub> | **Olmo3 Think**<br><sub>Dense · 32B · RL</sub> | **GLM-4.5 Air**<br><sub>MoE · 106B A12B</sub> | **MiniMax-M2**<br><sub>MoE · 230B A10B</sub> | **Qwen3 235B**<br><sub>MoE · 235B A22B · Reasoning</sub> | **Qwen 2.5 72B**<br><sub>Dense · 72B</sub> |
  |--------|--------------------------------------|------------------------------------------|----------------------------------------|------------------------------------------------------|--------------------------------------------------|----------------------------------------------------|------------------------------------------------------|--------------------------------------------------------------------|-------------------------------------------|
  | **LongBench V2** | 40.7 | 41.3 | 42.6 | 42.8 | 47.1 | 49.4 | 55.8 | 60.9 | 47.2 |
- | **AIME25** | 27.3 | 62.0 | 80.2 | 68.3 | 73.3 | 81.3 | 75.8 | 84.2 | 15.2 |
- | **HMMT25** | 19.0 | 45.6 | 71.4 | 43.3 | 50.83 | 73.3 | 63.5 | 93.5 | 9.79 |
- | **GSM8K** | 92.4 | 92.0 | 94.8 | 96.1 | 95.7 | 96.1 | 95.4 | 98.0 | 85.8 |
- | **Minerva** | 85.0 | 90.6 | 94.5 | 96.9 | 97.3 | 94.9 | 85.3 | 80.7 | 82.1 |
- | **GPQA-D** | 48.5 | 60.6 | 69.3 | 58.0 | 59.8 | 75.3 | 76.2 | 96.2 | 50.5 |
- | **MBPP** | 71.0 | 75.8 | 84.8 | 87.6 | 91.6 | 82.8 | 83.8 | 94.5 | 80.0 |
+ | **AIME25** | 27.3 | 62.0 | 80.2 | 68.3 | 73.3 | 81.3 | 75.8 | 88.8 | 15.2 |
+ | **HMMT25** | 19.0 | 45.6 | 71.4 | 43.3 | 50.83 | 73.3 | 63.5 | 84.2 | 9.79 |
+ | **GSM8K** | 92.4 | 92.0 | 94.8 | 96.1 | 95.7 | 96.1 | 95.4 | 93.5 | 85.8 |
+ | **Minerva** | 85.0 | 90.6 | 94.5 | 96.9 | 97.3 | 94.9 | 85.3 | 98.0 | 82.1 |
+ | **GPQA-D** | 48.5 | 60.6 | 69.3 | 58.0 | 59.8 | 75.3 | 76.2 | 80.7 | 50.5 |
+ | **MBPP** | 71.0 | 75.8 | 84.8 | 87.6 | 91.6 | 82.8 | 83.8 | 96.2 | 80.0 |
  | **HumanEval** | 82.3 | 91.5 | 91.5 | 96.3 | 96.3 | 97.6 | 89.6 | 94.5 | 85.4 |
  | **LCBv6** | 39.9 | 51.3 | 67.0 | 67.9 | 67.6 | 67.8 | 79.2 | 72.8 | 36.7 |
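To check exactly which cells this commit touches, the sketch below parses the removed (`-`) and added (`+`) table rows and lists every cell that differs. The row strings are copied from the diff; `MODELS`, `parse_rows`, and `changed_cells` are hypothetical names introduced only for illustration, and the parser assumes simple single-line Markdown rows with no escaped pipes.

```python
# Hypothetical review helper (not part of the repository): compare the
# removed and added Markdown table rows from this commit cell by cell.

MODELS = [
    "K2 Low", "K2 Medium", "K2 High", "Olmo3 Think SFT", "Olmo3 Think",
    "GLM-4.5 Air", "MiniMax-M2", "Qwen3 235B", "Qwen 2.5 72B",
]

OLD_ROWS = """\
| **AIME25** | 27.3 | 62.0 | 80.2 | 68.3 | 73.3 | 81.3 | 75.8 | 84.2 | 15.2 |
| **HMMT25** | 19.0 | 45.6 | 71.4 | 43.3 | 50.83 | 73.3 | 63.5 | 93.5 | 9.79 |
| **GSM8K** | 92.4 | 92.0 | 94.8 | 96.1 | 95.7 | 96.1 | 95.4 | 98.0 | 85.8 |
| **Minerva** | 85.0 | 90.6 | 94.5 | 96.9 | 97.3 | 94.9 | 85.3 | 80.7 | 82.1 |
| **GPQA-D** | 48.5 | 60.6 | 69.3 | 58.0 | 59.8 | 75.3 | 76.2 | 96.2 | 50.5 |
| **MBPP** | 71.0 | 75.8 | 84.8 | 87.6 | 91.6 | 82.8 | 83.8 | 94.5 | 80.0 |"""

NEW_ROWS = """\
| **AIME25** | 27.3 | 62.0 | 80.2 | 68.3 | 73.3 | 81.3 | 75.8 | 88.8 | 15.2 |
| **HMMT25** | 19.0 | 45.6 | 71.4 | 43.3 | 50.83 | 73.3 | 63.5 | 84.2 | 9.79 |
| **GSM8K** | 92.4 | 92.0 | 94.8 | 96.1 | 95.7 | 96.1 | 95.4 | 93.5 | 85.8 |
| **Minerva** | 85.0 | 90.6 | 94.5 | 96.9 | 97.3 | 94.9 | 85.3 | 98.0 | 82.1 |
| **GPQA-D** | 48.5 | 60.6 | 69.3 | 58.0 | 59.8 | 75.3 | 76.2 | 80.7 | 50.5 |
| **MBPP** | 71.0 | 75.8 | 84.8 | 87.6 | 91.6 | 82.8 | 83.8 | 96.2 | 80.0 |"""


def parse_rows(text):
    """Map each benchmark name to its list of scores (one per model column)."""
    table = {}
    for line in text.splitlines():
        # Remove the outer pipes, then split on the inner ones.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name = cells[0].strip("*")  # "**AIME25**" -> "AIME25"
        table[name] = [float(v) for v in cells[1:]]
    return table


def changed_cells(old, new):
    """Return (benchmark, model, old_value, new_value) for every differing cell."""
    diffs = []
    for bench, old_scores in old.items():
        for model, a, b in zip(MODELS, old_scores, new[bench]):
            if a != b:
                diffs.append((bench, model, a, b))
    return diffs


if __name__ == "__main__":
    for bench, model, a, b in changed_cells(parse_rows(OLD_ROWS), parse_rows(NEW_ROWS)):
        print(f"{bench:8} {model}: {a} -> {b}")
```

Run against these rows, it reports six changed cells, all in the Qwen3 235B column. Each removed Qwen3 value reappears one row lower in the added version (for example, 84.2 moves from AIME25 to HMMT25, and the old MBPP value 94.5 matches the unchanged HumanEval row), which is consistent with the commit correcting a one-row misalignment in that column.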