zhouhy committed
Commit · d68b516
Parent(s): c375909
Update README training codebase info
README.md
CHANGED
@@ -29,7 +29,7 @@ library_name: transformers
 
 ## 1. Introduction
 
-**Step 3.5 Flash** ([visit website](https://static.stepfun.com/blog/step-3.5-flash/)) is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models, while maintaining the agility required for real-time interaction.
+**Step 3.5 Flash** ([visit website](https://static.stepfun.com/blog/step-3.5-flash/)) is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. We have also open-sourced the training codebase, with support for continued pretraining, SFT, RL (WIP), and evaluation (WIP), and will open-source the SFT data. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models, while maintaining the agility required for real-time interaction.
 
 ## 2. Key Capabilities
 
@@ -113,6 +113,10 @@ Unlike traditional dense models, Step 3.5 Flash uses a fine-grained routing stra
 
 To improve inference speed, we utilize a specialized MTP Head consisting of a sliding-window attention mechanism and a dense Feed-Forward Network (FFN). This module predicts 4 tokens simultaneously in a single forward pass, significantly accelerating inference without degrading quality.
 
 
+## 5. Training Codebase
+
+The training codebase for Step 3.5 Flash is available at [SteptronOss](https://github.com/stepfun-ai/SteptronOss).
+
 ## 📜 Citation
 
@@ -131,4 +135,4 @@ If you find this project useful in your research, please cite our technical repo
 ```
 
 ## License
-This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
+This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
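The introduction describes sparse MoE activation: only 11B of 196B parameters run per token. As an illustrative sketch only — the expert count, top-k value, and gating weights below are hypothetical placeholders, not Step 3.5 Flash's actual configuration — top-k expert routing works roughly like this:

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 32   # hypothetical expert count (not the model's real config)
top_k = 2        # experts activated per token
d_model = 64

def route(token_h, gate_w):
    """Select the top-k experts for one token and return normalized mixing weights."""
    logits = token_h @ gate_w                 # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]         # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()                   # only these k experts' FFNs execute

gate_w = rng.standard_normal((d_model, n_experts))
h = rng.standard_normal(d_model)
experts, weights = route(h, gate_w)
print(experts, weights)
```

Only `top_k / n_experts` of the expert parameters execute for each token, which is the mechanism behind activating a small slice of a much larger parameter pool.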
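The MTP Head in the diff drafts 4 tokens per forward pass. The README does not specify how drafted tokens are validated, so the following assumes a standard speculative-decoding acceptance rule (a labeled assumption, not the model's documented method): keep the longest drafted prefix that the base model agrees with.

```python
def accept_prefix(draft_tokens, verified_tokens):
    """Accept the longest prefix of the drafted tokens confirmed by verification.

    Assumption: a generic speculative-decoding accept rule, not the
    model's documented scheme. Both inputs are lists of token ids.
    """
    n = 0
    for d, v in zip(draft_tokens, verified_tokens):
        if d != v:
            break
        n += 1
    return draft_tokens[:n]

# The MTP head drafted 4 tokens; verification agreed on the first 3.
print(accept_prefix([5, 9, 2, 7], [5, 9, 2, 4]))  # -> [5, 9, 2]
```

When most drafted tokens are accepted, several tokens are emitted per base-model forward pass, which is where the inference speedup comes from.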