Update README.md

README.md
| Category                  | Benchmark                    |       |       |       |       |       |       |       |
| ------------------------- | ---------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|                           | # Total Params               | 46B   | 8B    | 8B    | 8B    | 12B   | 14B   | 30B   |
|                           | # Activated Params           | 2.5B  | 8B    | 8B    | 8B    | 12B   | 14B   | 3B    |
| **English Understanding** | MMLU-Redux                   | 81.61 | 74.65 | 77.63 | 79.32 | 78.39 | 83.09 | 88.11 |
|                           | MMLU-Pro                     | 63.47 | 50.87 | 54.69 | 63.8  | 60.69 | 67.25 | 78.22 |
|                           | GPQA-Diamond                 | 47.85 | 38.76 | 38.51 | 51.77 | 39.02 | 59.47 | 71.21 |
|                           | SimpleQA                     | 6.52  | 4.44  | 3.51  | 5.5   | 6.22  | 3.28  | 23.39 |
| **Chinese Understanding** | CLUEWSC                      | 88.16 | 77.63 | 81.91 | 82.89 | 91.12 | 88.16 | 92.11 |
|                           | CEval                        | 83.99 | 84.26 | 81.78 | 81.66 | 60.81 | 64.79 | 88.57 |
|                           | C-SimpleQA                   | 42.3  | 25.87 | 23.13 | 37.07 | 28.97 | 24.77 | 75.37 |
| **Math & Reasoning**      | MATH500                      | 82.8  | 68.4  | 79.8  | 85    | 86.8  | 80.6  | 97.2  |
|                           | AIME24                       | 25.62 | 11.25 | 22.92 | 28.33 | 23.96 | 15.83 | 75    |
|                           | AIME25                       | 18.12 | 8.12  | 15.21 | 20.62 | 18.33 | 18.75 | 61.88 |
| **Code**                  | HumanEval                    | 87.8  | 82.3* | 74.39 | 83.54 | 82.32 | 85.37 | 81.71 |
|                           | HumanEval+                   | 81.1  | -     | 70.12 | 76.83 | 75.61 | 83.54 | 76.83 |
|                           | MBPP (EvalPlus)              | 83.1  | 62.4  | 82    | 76.2  | 85.7  | 77.5  | 89.4  |
|                           | MBPP+ (EvalPlus)             | 70.4  | 50.4  | 69.3  | 66.1  | 74.1  | 66.7  | 75.1  |
|                           | LiveCodeBench v5 (2408-2501) | 28.67 | 14.7  | 12.19 | 27.24 | 24.73 | 23.66 | 41.22 |
| **Instruction Following** | IF-Eval                      | 80.04 | 79.3  | 73.01 | 84.47 | 81.52 | 59.33 | 83.92 |
|                           | Multi-IF (en+zh)             | 78.73 | 62.53 | 61.79 | 78.95 | 76.56 | 62.7  | 77.75 |
| **Comprehensive Ability** | MT-Bench                     | 8.23  | 7.86  | 6.875 | 8.21  | 8.675 | 8.625 | 9.33  |
|                           | MT-Eval                      | 8.11  | 7.36  | 6.7   | 8.18  | 8.45  | 8.12  | -     |
|                           | AlignBench v1.1              | 6.85  | 6.13  | 5.99  | 6.95  | 6.3   | 6.33  | 7.06  |
|                           | LiveBench 1125               | 50.1  | 26.3  | 25.5  | 52.1  | 43.1  | 40    | 68.4  |
|                           | Average                      | 53.50 | -     | 46.05 | 52.61 | 50.54 | 48.95 | -     |
Note:

1. For InternLM3-8B-Instruct, the results marked with `*` are sourced from its public report; all other evaluations were conducted with internal evaluation frameworks.
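The `Average` row is consistent with a plain unweighted mean over all 21 benchmark scores, with the 0-10 scale scores (MT-Bench, MT-Eval, AlignBench v1.1) included at face value; columns containing a `-` report no average. A minimal sketch of that computation, using the first model column's scores from the table above:

```python
# Sanity-check the "Average" row for the 46B-total / 2.5B-activated model,
# assuming it is the unweighted mean of all 21 benchmark scores, with the
# 0-10 scale scores (MT-Bench, MT-Eval, AlignBench v1.1) taken at face value.
scores = [
    81.61, 63.47, 47.85, 6.52,      # English understanding
    88.16, 83.99, 42.3,             # Chinese understanding
    82.8, 25.62, 18.12,             # Math & reasoning
    87.8, 81.1, 83.1, 70.4, 28.67,  # Code
    80.04, 78.73,                   # Instruction following
    8.23, 8.11, 6.85, 50.1,         # Comprehensive ability
]

print(f"{sum(scores) / len(scores):.2f}")  # -> 53.50, matching the table
```

The same mean reproduces the other reported averages (46.05, 52.61, 50.54, and 48.95) from their respective columns.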