hengm3467 committed on
Commit
fbcee77
·
1 Parent(s): 8817d4a

sync to v2 model card


- benchmark name GPDVal -> GPDVal-AA
- Availability: add DeepInfra, Fireworks AI, Modal Labs as upcoming partners
- vLLM: swap naming so FP8 is the default (step3p7-flash) and BF16 is suffixed
- SGLang: add NVFP4 launch command
- capitalize FP8 / BF16 / NVFP4 labels
- Agent Platforms: drop Lemonade
- add License section

Files changed (1)
  1. README.md +28 -10
README.md CHANGED
@@ -28,7 +28,7 @@ Execution reliability is critical for autonomous agents. Step 3.7 Flash leads th
28
 
29
  ### Code Engineering and Professional Baselines
30
 
31
- Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
32
 
33
  ![Step 3.7 Flash benchmark results across General Agent, Agentic Coding, and Multimodal evaluations](assets/benchmarks.png)
34
 
@@ -41,7 +41,7 @@ Step 3.7 Flash is built for live engineering tasks and secured a definitive seco
41
  | Output | $1.15 / M tokens |
42
 
43
  ## 4. Availability, Deployment, and Ecosystem
44
- - Availability: Step 3.7 Flash is available through StepFun Open Platform — [platform.stepfun.ai](https://platform.stepfun.ai) (Global) and [platform.stepfun.com](https://platform.stepfun.com) (China) — as well as partner platforms including OpenRouter and NVIDIA NIM.
45
  - Deployment: Step 3.7 Flash supports flexible deployment across cloud, data center, and local environments. For large-scale production and enterprise use cases, Step 3.7 Flash can be deployed on modern data center infrastructure. For local and workstation scenarios, it can also run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395-based systems, and Mac Studio / Macbook Pro devices with at least 128GB unified memory.
46
  - Ecosystem: Step 3.7 Flash is supported across popular open-source infrastructure for both inference and model development. For inference and serving, developers can use vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For model development workflows, StepFun model support has landed in the NVIDIA Megatron ecosystem, including Megatron Core and Megatron Bridge.
47
 
@@ -143,10 +143,10 @@ pip install -U vllm --pre \
143
 
144
  2. Launch the server.
145
 
146
- - For fp8 model
147
  ```bash
148
  vllm serve <MODEL_PATH_OR_HF_ID> \
149
- --served-model-name step3p7-flash-fp8 \
150
  --tensor-parallel-size 8 \
151
  --enable-expert-parallel \
152
  --disable-cascade-attn \
@@ -156,10 +156,10 @@ pip install -U vllm --pre \
156
  --speculative_config '{"method": "mtp", "num_speculative_tokens": 3}' \
157
  --trust-remote-code
158
  ```
159
- - For bf16 model
160
  ```bash
161
  vllm serve <MODEL_PATH_OR_HF_ID> \
162
- --served-model-name step3p7-flash \
163
  --tensor-parallel-size 8 \
164
  --enable-expert-parallel \
165
  --disable-cascade-attn \
@@ -170,7 +170,7 @@ pip install -U vllm --pre \
170
  --trust-remote-code
171
  ```
172
 
173
- - For nvfp4 model
174
  Compared to standard precisions, running the FP4 quantized version requires modelopt activation and FP8 KV Cache alignment.
175
  ```bash
176
  python3 -m vllm.entrypoints.openai.api_server \
@@ -207,7 +207,7 @@ pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git"
207
 
208
  > **Note:** For Blackwell GPUs, `--mm-attention-backend fa4` may be used.
209
 
210
- - For bf16 model
211
 
212
  ```bash
213
  sglang serve --model-path stepfun-ai/Step-3.7-Flash \
@@ -225,7 +225,7 @@ sglang serve --model-path stepfun-ai/Step-3.7-Flash \
225
  --port 8000
226
  ```
227
 
228
- - For fp8 model
229
 
230
  ```bash
231
  sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
@@ -244,6 +244,20 @@ sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
244
  --port 8000
245
  ```
246
 
247
  ### 6.3 Transformers (Debug / Verification)
248
 
249
  Use this snippet for quick functional verification. For high-throughput serving, use vLLM or SGLang.
@@ -377,7 +391,7 @@ cmake --build build-vulkan -j8
377
 
378
  ## 7. Using Step 3.7 Flash on Agent Platforms
379
 
380
- You can use Step 3.7 Flash on Agent platforms such as Hermes Agent, Lemonade, OpenClaw, Kilo Code, and more.
381
 
382
  ## 8. Getting in Touch
383
 
@@ -386,3 +400,7 @@ As we work to shape the future of AGI by expanding broad model capabilities, we
386
  - **Join the Conversation:** Our [Discord](https://discord.gg/RcMJhNVAQc) community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
387
  - **Report Friction:** Encountering limitations? You can open an issue or start a discussion on GitHub / HuggingFace, or flag it directly in our Discord support channels.
388
 
 
 
 
 
 
28
 
29
  ### Code Engineering and Professional Baselines
30
 
31
+ Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal-AA (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
32
 
33
  ![Step 3.7 Flash benchmark results across General Agent, Agentic Coding, and Multimodal evaluations](assets/benchmarks.png)
34
 
 
41
  | Output | $1.15 / M tokens |
42
 
43
  ## 4. Availability, Deployment, and Ecosystem
44
+ - Availability: Step 3.7 Flash is available through StepFun Open Platform — [platform.stepfun.ai](https://platform.stepfun.ai) (Global) and [platform.stepfun.com](https://platform.stepfun.com) (China) — as well as partner platforms including OpenRouter and NVIDIA NIM. StepFun is also partnering with DeepInfra, Fireworks AI, and Modal Labs to expand availability soon.
45
  - Deployment: Step 3.7 Flash supports flexible deployment across cloud, data center, and local environments. For large-scale production and enterprise use cases, Step 3.7 Flash can be deployed on modern data center infrastructure. For local and workstation scenarios, it can also run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395-based systems, and Mac Studio / Macbook Pro devices with at least 128GB unified memory.
46
  - Ecosystem: Step 3.7 Flash is supported across popular open-source infrastructure for both inference and model development. For inference and serving, developers can use vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For model development workflows, StepFun model support has landed in the NVIDIA Megatron ecosystem, including Megatron Core and Megatron Bridge.
47
 
 
143
 
144
  2. Launch the server.
145
 
146
+ - For FP8 model
147
  ```bash
148
  vllm serve <MODEL_PATH_OR_HF_ID> \
149
+ --served-model-name step3p7-flash \
150
  --tensor-parallel-size 8 \
151
  --enable-expert-parallel \
152
  --disable-cascade-attn \
 
156
  --speculative_config '{"method": "mtp", "num_speculative_tokens": 3}' \
157
  --trust-remote-code
158
  ```
159
+ - For BF16 model
160
  ```bash
161
  vllm serve <MODEL_PATH_OR_HF_ID> \
162
+ --served-model-name step3p7-flash-bf16 \
163
  --tensor-parallel-size 8 \
164
  --enable-expert-parallel \
165
  --disable-cascade-attn \
 
170
  --trust-remote-code
171
  ```
172
 
173
+ - For NVFP4 model
174
  Compared to standard precisions, running the FP4 quantized version requires modelopt activation and FP8 KV Cache alignment.
175
  ```bash
176
  python3 -m vllm.entrypoints.openai.api_server \
 
207
 
208
  > **Note:** For Blackwell GPUs, `--mm-attention-backend fa4` may be used.
209
 
210
+ - For BF16 model
211
 
212
  ```bash
213
  sglang serve --model-path stepfun-ai/Step-3.7-Flash \
 
225
  --port 8000
226
  ```
227
 
228
+ - For FP8 model
229
 
230
  ```bash
231
  sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
 
244
  --port 8000
245
  ```
246
 
247
+ - For NVFP4 model
248
+
249
+ ```bash
250
+ sglang serve --model-path stepfun-ai/Step-3.7-Flash-NVFP4 \
251
+ --tp 4 --ep 4 \
252
+ --moe-runner-backend flashinfer_trtllm \
253
+ --kv-cache-dtype fp8_e4m3 \
254
+ --quantization modelopt_fp4 \
255
+ --trust-remote-code \
256
+ --reasoning-parser step3p5 \
257
+ --tool-call-parser step3p5 \
258
+ --attention-backend trtllm_mha
259
+ ```
260
+
261
  ### 6.3 Transformers (Debug / Verification)
262
 
263
  Use this snippet for quick functional verification. For high-throughput serving, use vLLM or SGLang.
 
391
 
392
  ## 7. Using Step 3.7 Flash on Agent Platforms
393
 
394
+ You can use Step 3.7 Flash on Agent platforms such as Hermes Agent, OpenClaw, Kilo Code, and more.
395
 
396
  ## 8. Getting in Touch
397
 
 
400
  - **Join the Conversation:** Our [Discord](https://discord.gg/RcMJhNVAQc) community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
401
  - **Report Friction:** Encountering limitations? You can open an issue or start a discussion on GitHub / HuggingFace, or flag it directly in our Discord support channels.
402
 
403
+ ## 📄 License
404
+
405
+ This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
406
+