zhouhy committed
Commit d68b516 · 1 Parent(s): c375909

Update README training codebase info

Files changed (1): README.md (+6 -2)
README.md CHANGED
@@ -29,7 +29,7 @@ library_name: transformers
 
 ## 1. Introduction
 
-**Step 3.5 Flash** ([visit website](https://static.stepfun.com/blog/step-3.5-flash/)) is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models, while maintaining the agility required for real-time interaction.
+**Step 3.5 Flash** ([visit website](https://static.stepfun.com/blog/step-3.5-flash/)) is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. We have also open-sourced the training codebase, which supports continued pretraining, SFT, RL (WIP), and evaluation (WIP); the SFT data will be open-sourced as well. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models, while maintaining the agility required for real-time interaction.
 
 ## 2. Key Capabilities
 
@@ -113,6 +113,10 @@ Unlike traditional dense models, Step 3.5 Flash uses a fine-grained routing stra
 
 To improve inference speed, we utilize a specialized MTP Head consisting of a sliding-window attention mechanism and a dense Feed-Forward Network (FFN). This module predicts 4 tokens simultaneously in a single forward pass, significantly accelerating inference without degrading quality.
 
+## 5. Training Codebase
+
+The training codebase for Step 3.5 Flash is available at [SteptronOss](https://github.com/stepfun-ai/SteptronOss).
+
 
 ## 📜 Citation
 
@@ -131,4 +135,4 @@ If you find this project useful in your research, please cite our technical repo
 ```
 
 ## License
-This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).