Randomize committed on
Commit e9b865b · verified · 1 Parent(s): d68b516

Update README.md

Files changed (1)
  1. README.md +32 -33
README.md CHANGED
@@ -1,17 +1,15 @@
  ---
  license: apache-2.0
- base_model:
- - stepfun-ai/step-3.5-flash-base
  library_name: transformers
  ---

- # Step 3.5 Flash Base
+ # Step 3.5 Flash Base Midtrain

  <div align="center">

  <div align="center" style="display: flex; justify-content: center; align-items: center;">
  <img src="stepfun.svg" width="25" style="margin-right: 10px;"/>
- <h1 style="margin: 0; border-bottom: none;">Step 3.5 Flash</h1>
+ <h1 style="margin: 0; border-bottom: none;">Step 3.5 Flash Base Midtrain</h1>
  </div>

  [![GitHub](https://img.shields.io/badge/GitHub-181717?style=flat&logo=github&logoColor=white)](https://github.com/stepfun-ai/SteptronOss)
@@ -51,34 +49,35 @@ Performance of Step 3.5 Flash measured across **Reasoning**, **Coding**, and **A

  ### Detailed Benchmarks

- | Benchmark | # Shots | Step3.5 Flash (Base) | MiMo‑V2 Flash (Base) | GLM‑4.5 (Base) | DeepSeek V3.1 (Base) | DeepSeek V3.2 (Exp Base) | Kimi‑K2 (Base) |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | # Activated Params | - | 11B | 15B | 32B | 37B | 37B | 32B |
- | # Total Params | - | 196B | 309B | 355B | 671B | 671B | 1043B |
- | General | | | | | | | |
- | BBH | 3-shot | 88.2 | 88.5 | 86.2 | 88.2† | 88.7† | 88.7 |
- | MMLU | 5-shot | 85.8 | 86.7 | 86.1 | 87.4† | 87.8† | 87.8 |
- | MMLU‑Redux | 5-shot | 89.2 | 90.6 | - | 90.0† | 90.4† | 90.2 |
- | MMLU‑Pro | 5-shot | 62.3 | 73.2 | - | 58.8† | 62.1† | 69.2 |
- | HellaSwag | 10-shot | 90.2 | 88.5 | 87.1 | 89.2† | 89.4† | 94.6 |
- | WinoGrande | 5-shot | 79.1 | 83.8 | - | 85.9† | 85.6† | 85.3 |
- | GPQA | 5-shot | 41.7 | 43.5* | 33.5* | 43.1* | 37.3* | 43.1* |
- | SuperGPQA | 5-shot | 41.0 | 41.1 | - | 42.3† | 43.6† | 44.7 |
- | SimpleQA | 5-shot | 31.6 | 20.6 | 30.0 | 26.3† | 27.0† | 35.3 |
- | Mathematics | | | | | | | |
- | GSM8K | 8-shot | 88.2 | 92.3 | 87.6 | 91.4† | 91.1† | 92.1 |
- | MATH | 4-shot | 66.8 | 71.0 | 62.6 | 62.6† | 62.5† | 70.2 |
- | Code | | | | | | | |
- | HumanEval | 3-shot | 81.1 | 77.4* | 79.8* | 72.5* | 67.7* | 84.8* |
- | MBPP | 3-shot | 79.4 | 81.0* | 81.6* | 74.6* | 75.6* | 89.0* |
- | HumanEval+ | 0-shot | 72.0 | 70.7 | - | 64.6 | 67.7† | - |
- | MBPP+ | 0-shot | 70.6 | 71.4 | - | 72.2† | 69.8† | - |
- | MultiPL‑E HumanEval | 0-shot | 67.7 | 59.5 | - | 45.9† | 45.7† | 60.5 |
- | MultiPL‑E MBPP | 0-shot | 58.0 | 56.7 | - | 52.5† | 50.6† | 58.8 |
- | Chinese | | | | | | | |
- | C‑EVAL | 5-shot | 89.6 | 87.9 | 86.9 | 90.0† | 91.0† | 92.5 |
- | CMMLU | 5-shot | 88.9 | 87.4 | - | 88.8† | 88.9† | 90.9 |
- | C‑SimpleQA | 5-shot | 63.2 | 61.5 | 70.1 | 70.9† | 68.0† | 77.6 |
+
+ | Benchmark | # Shots | Step3.5 Flash (Base Midtrain) | Step3.5 Flash (Base) | MiMo‑V2 Flash (Base) | GLM‑4.5 (Base) | DeepSeek V3.1 (Base) | DeepSeekV3.2 (Exp Base) | Kimi‑K2 (Base) |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | # Activated Params | - | 11B | 11B | 15B | 32B | 37B | 37B | 32B |
+ | # Total Params | - | 196B | 196B | 309B | 355B | 671B | 671B | 1043B |
+ | General | | | | | | | | |
+ | BBH | 3-shot | 87.57 | 88.2 | 88.5 | 86.2 | 88.2† | 88.7† | 88.7 |
+ | MMLU | 5-shot | 83.52 | 85.8 | 86.7 | 86.1 | 87.4† | 87.8† | 87.8 |
+ | MMLU‑Redux | 5-shot | 87.22 | 89.2 | 90.6 | - | 90.0† | 90.4† | 90.2 |
+ | MMLU‑Pro | 5-shot | 60.41 | 62.3 | 73.2 | - | 58.8† | 62.1† | 69.2 |
+ | HellaSwag | 10-shot | - | 90.2 | 88.5 | 87.1 | 89.2† | 89.4† | 94.6 |
+ | WinoGrande | 5-shot | - | 79.1 | 83.8 | - | 85.9† | 85.6† | 85.3 |
+ | GPQA | 5-shot | - | 41.7 | 43.5* | 33.5* | 43.1* | 37.3* | 43.1* |
+ | SuperGPQA | 5-shot | - | 41.0 | 41.1 | - | 42.3† | 43.6† | 44.7 |
+ | SimpleQA | 5-shot | 28.25 | 31.6 | 20.6 | 30.0 | 26.3† | 27.0† | 35.3 |
+ | Mathematics | | | | | | | | |
+ | GSM8K | 8-shot | 88.40 | 88.2 | 92.3 | 87.6 | 91.4† | 91.1† | 92.1 |
+ | MATH | 4-shot | 65.40 | 66.8 | 71.0 | 62.6 | 62.6† | 62.5† | 70.2 |
+ | Code | | | | | | | | |
+ | HumanEval | 3-shot | 65.24 | 81.1 | 77.4* | 79.8* | 72.5* | 67.7* | 84.8* |
+ | MBPP | 3-shot | 79.20 | 79.4 | 81.0* | 81.6* | 74.6* | 75.6* | 89.0* |
+ | HumanEval+ | 0-shot | - | 72.0 | 70.7 | - | 64.6† | 67.7† | - |
+ | MBPP+ | 0-shot | - | 70.6 | 71.4 | - | 72.2† | 69.8† | - |
+ | MultiPL‑E HumanEval | 0-shot | - | 67.7 | 59.5 | - | 45.9† | 45.7† | 60.5 |
+ | MultiPL‑E MBPP | 0-shot | - | 58.0 | 56.7 | - | 52.5† | 50.6† | 58.8 |
+ | Chinese | | | | | | | | |
+ | C‑EVAL | 5-shot | 87.15 | 89.6 | 87.9 | 86.9 | 90.0† | 91.0† | 92.5 |
+ | CMMLU | 5-shot | 86.93 | 88.9 | 87.4 | - | 88.8† | 88.9† | 90.9 |
+ | C‑SimpleQA | 5-shot | - | 63.2 | 61.5 | 70.1 | 70.9† | 68.0† | 77.6 |

  1. “*” denotes cases where the original score was unavailable; we report results evaluated under the same test conditions as Step3.5 Flash for fair
  comparison.
@@ -135,4 +134,4 @@ If you find this project useful in your research, please cite our technical repo
  ```

  ## License
- This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+ This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).