hengm3467 committed
Commit fbcee77 · Parent(s): 8817d4a
sync to v2 model card
- benchmark name GPDVal -> GPDVal-AA
- Availability: add DeepInfra, Fireworks AI, Modal Labs as upcoming partners
- vLLM: swap naming so FP8 is the default (step3p7-flash) and BF16 is suffixed
- SGLang: add NVFP4 launch command
- capitalize FP8 / BF16 / NVFP4 labels
- Agent Platforms: drop Lemonade
- add License section
README.md
CHANGED
|
@@ -28,7 +28,7 @@ Execution reliability is critical for autonomous agents. Step 3.7 Flash leads th
|
|
| 28 |
|
| 29 |
### Code Engineering and Professional Baselines
|
| 30 |
|
| 31 |
-
Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
|
| 32 |
|
| 33 |

|
| 34 |
|
|
@@ -41,7 +41,7 @@ Step 3.7 Flash is built for live engineering tasks and secured a definitive seco
|
|
| 41 |
| Output | $1.15 / M tokens |
|
| 42 |
|
| 43 |
## 4. Availability, Deployment, and Ecosystem
|
| 44 |
-
- Availability: Step 3.7 Flash is available through StepFun Open Platform — [platform.stepfun.ai](https://platform.stepfun.ai) (Global) and [platform.stepfun.com](https://platform.stepfun.com) (China) — as well as partner platforms including OpenRouter and NVIDIA NIM.
|
| 45 |
- Deployment: Step 3.7 Flash supports flexible deployment across cloud, data center, and local environments. For large-scale production and enterprise use cases, Step 3.7 Flash can be deployed on modern data center infrastructure. For local and workstation scenarios, it can also run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395-based systems, and Mac Studio / Macbook Pro devices with at least 128GB unified memory.
|
| 46 |
- Ecosystem: Step 3.7 Flash is supported across popular open-source infrastructure for both inference and model development. For inference and serving, developers can use vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For model development workflows, StepFun model support has landed in the NVIDIA Megatron ecosystem, including Megatron Core and Megatron Bridge.
|
| 47 |
|
|
@@ -143,10 +143,10 @@ pip install -U vllm --pre \
|
|
| 143 |
|
| 144 |
2. Launch the server.
|
| 145 |
|
| 146 |
-
- For
|
| 147 |
```bash
|
| 148 |
vllm serve <MODEL_PATH_OR_HF_ID> \
|
| 149 |
-
--served-model-name step3p7-flash
|
| 150 |
--tensor-parallel-size 8 \
|
| 151 |
--enable-expert-parallel \
|
| 152 |
--disable-cascade-attn \
|
|
@@ -156,10 +156,10 @@ pip install -U vllm --pre \
|
|
| 156 |
--speculative_config '{"method": "mtp", "num_speculative_tokens": 3}' \
|
| 157 |
--trust-remote-code
|
| 158 |
```
|
| 159 |
-
- For
|
| 160 |
```bash
|
| 161 |
vllm serve <MODEL_PATH_OR_HF_ID> \
|
| 162 |
-
--served-model-name step3p7-flash \
|
| 163 |
--tensor-parallel-size 8 \
|
| 164 |
--enable-expert-parallel \
|
| 165 |
--disable-cascade-attn \
|
|
@@ -170,7 +170,7 @@ pip install -U vllm --pre \
|
|
| 170 |
--trust-remote-code
|
| 171 |
```
|
| 172 |
|
| 173 |
-
- For
|
| 174 |
Compared to standard precisions, running the FP4 quantized version requires modelopt activation and FP8 KV Cache alignment.
|
| 175 |
```bash
|
| 176 |
python3 -m vllm.entrypoints.openai.api_server \
|
|
@@ -207,7 +207,7 @@ pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git"
|
|
| 207 |
|
| 208 |
> **Note:** For Blackwell GPUs, `--mm-attention-backend fa4` may be used.
|
| 209 |
|
| 210 |
-
- For
|
| 211 |
|
| 212 |
```bash
|
| 213 |
sglang serve --model-path stepfun-ai/Step-3.7-Flash \
|
|
@@ -225,7 +225,7 @@ sglang serve --model-path stepfun-ai/Step-3.7-Flash \
|
|
| 225 |
--port 8000
|
| 226 |
```
|
| 227 |
|
| 228 |
-
- For
|
| 229 |
|
| 230 |
```bash
|
| 231 |
sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
|
|
@@ -244,6 +244,20 @@ sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
|
|
| 244 |
--port 8000
|
| 245 |
```
|
| 246 |
|
| 247 |
### 6.3 Transformers (Debug / Verification)
|
| 248 |
|
| 249 |
Use this snippet for quick functional verification. For high-throughput serving, use vLLM or SGLang.
|
|
@@ -377,7 +391,7 @@ cmake --build build-vulkan -j8
|
|
| 377 |
|
| 378 |
## 7. Using Step 3.7 Flash on Agent Platforms
|
| 379 |
|
| 380 |
-
You can use Step 3.7 Flash on Agent platforms such as Hermes Agent,
|
| 381 |
|
| 382 |
## 8. Getting in Touch
|
| 383 |
|
|
@@ -386,3 +400,7 @@ As we work to shape the future of AGI by expanding broad model capabilities, we
|
|
| 386 |
- **Join the Conversation:** Our [Discord](https://discord.gg/RcMJhNVAQc) community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
|
| 387 |
- **Report Friction:** Encountering limitations? You can open an issue or start a discussion on GitHub / HuggingFace, or flag it directly in our Discord support channels.
|
| 388 |
|
| 28 |
|
| 29 |
### Code Engineering and Professional Baselines
|
| 30 |
|
| 31 |
+
Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal-AA (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
|
| 32 |
|
| 33 |

|
| 34 |
|
|
|
|
| 41 |
| Output | $1.15 / M tokens |
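
As a rough illustration of the output price in the row above, the sketch below estimates the cost of generated tokens. Only the $1.15 per million output tokens figure comes from the card. The function name and the token counts are hypothetical, and input-token pricing is left out because it is not shown in this hunk.

```python
from decimal import Decimal

# Output price from the table row above: $1.15 per million output tokens.
OUTPUT_USD_PER_MILLION = Decimal("1.15")


def estimate_output_cost(output_tokens: int) -> Decimal:
    """Return the estimated USD cost of generating `output_tokens` tokens.

    Hypothetical helper for illustration; it ignores input-token charges.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return OUTPUT_USD_PER_MILLION * output_tokens / Decimal(1_000_000)
```

Calling `estimate_output_cost(250_000)` would price a hypothetical workload of 250,000 generated tokens.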
|
| 42 |
|
| 43 |
## 4. Availability, Deployment, and Ecosystem
|
| 44 |
+
- Availability: Step 3.7 Flash is available through StepFun Open Platform — [platform.stepfun.ai](https://platform.stepfun.ai) (Global) and [platform.stepfun.com](https://platform.stepfun.com) (China) — as well as partner platforms including OpenRouter and NVIDIA NIM. StepFun is also partnering with DeepInfra, Fireworks AI, and Modal Labs to expand availability soon.
|
| 45 |
- Deployment: Step 3.7 Flash supports flexible deployment across cloud, data center, and local environments. For large-scale production and enterprise use cases, Step 3.7 Flash can be deployed on modern data center infrastructure. For local and workstation scenarios, it can also run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395-based systems, and Mac Studio / Macbook Pro devices with at least 128GB unified memory.
|
| 46 |
- Ecosystem: Step 3.7 Flash is supported across popular open-source infrastructure for both inference and model development. For inference and serving, developers can use vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For model development workflows, StepFun model support has landed in the NVIDIA Megatron ecosystem, including Megatron Core and Megatron Bridge.
|
| 47 |
|
|
|
|
| 143 |
|
| 144 |
2. Launch the server.
|
| 145 |
|
| 146 |
+
- For FP8 model
|
| 147 |
```bash
|
| 148 |
vllm serve <MODEL_PATH_OR_HF_ID> \
|
| 149 |
+
--served-model-name step3p7-flash \
|
| 150 |
--tensor-parallel-size 8 \
|
| 151 |
--enable-expert-parallel \
|
| 152 |
--disable-cascade-attn \
|
|
|
|
| 156 |
--speculative_config '{"method": "mtp", "num_speculative_tokens": 3}' \
|
| 157 |
--trust-remote-code
|
| 158 |
```
|
| 159 |
+
- For BF16 model
|
| 160 |
```bash
|
| 161 |
vllm serve <MODEL_PATH_OR_HF_ID> \
|
| 162 |
+
--served-model-name step3p7-flash-bf16 \
|
| 163 |
--tensor-parallel-size 8 \
|
| 164 |
--enable-expert-parallel \
|
| 165 |
--disable-cascade-attn \
|
|
|
|
| 170 |
--trust-remote-code
|
| 171 |
```
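
Because this commit swaps the served names, a client has to request `step3p7-flash` for the FP8 server and `step3p7-flash-bf16` for the BF16 server. The following is a minimal sketch of such a client using only the Python standard library. It assumes the server exposes vLLM's OpenAI-compatible `/v1/chat/completions` route on `http://localhost:8000`; the base URL, the helper names, and the variant keys are illustrative assumptions rather than part of the model card.

```python
import json
import urllib.request

# Names registered via --served-model-name in the launch commands above.
SERVED_NAMES = {"fp8": "step3p7-flash", "bf16": "step3p7-flash-bf16"}


def build_chat_request(variant: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for one variant.

    `base_url` is an assumption; adjust it to wherever the server listens.
    """
    payload = {
        "model": SERVED_NAMES[variant],  # must match the served model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("fp8", "Say hello."))` would return the model's reply. A request that names a model the server was not launched with is expected to be rejected, which is why the mapping is kept in one place.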
|
| 172 |
|
| 173 |
+
- For NVFP4 model
|
| 174 |
Compared to standard precisions, running the FP4 quantized version requires modelopt activation and FP8 KV Cache alignment.
|
| 175 |
```bash
|
| 176 |
python3 -m vllm.entrypoints.openai.api_server \
|
|
|
|
| 207 |
|
| 208 |
> **Note:** For Blackwell GPUs, `--mm-attention-backend fa4` may be used.
|
| 209 |
|
| 210 |
+
- For BF16 model
|
| 211 |
|
| 212 |
```bash
|
| 213 |
sglang serve --model-path stepfun-ai/Step-3.7-Flash \
|
|
|
|
| 225 |
--port 8000
|
| 226 |
```
|
| 227 |
|
| 228 |
+
- For FP8 model
|
| 229 |
|
| 230 |
```bash
|
| 231 |
sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
|
|
|
|
| 244 |
--port 8000
|
| 245 |
```
|
| 246 |
|
| 247 |
+
- For NVFP4 model
|
| 248 |
+
|
| 249 |
+
```bash
|
| 250 |
+
sglang serve --model-path stepfun-ai/Step-3.7-Flash-NVFP4 \
|
| 251 |
+
--tp 4 --ep 4 \
|
| 252 |
+
--moe-runner-backend flashinfer_trtllm \
|
| 253 |
+
--kv-cache-dtype fp8_e4m3 \
|
| 254 |
+
--quantization modelopt_fp4 \
|
| 255 |
+
--trust-remote-code \
|
| 256 |
+
--reasoning-parser step3p5 \
|
| 257 |
+
--tool-call-parser step3p5 \
|
| 258 |
+
--attention-backend trtllm_mha
|
| 259 |
+
```
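
For scripted deployments, the NVFP4 command above can be assembled as an argument list instead of a shell string. This is a sketch only: the flags and values are copied from the command above, while the helper name, the `extra_args` parameter, and the idea of launching through `subprocess` are assumptions for illustration.

```python
import shlex

# Flags copied from the NVFP4 launch command above; None marks a bare switch.
NVFP4_FLAGS = {
    "--tp": "4",
    "--ep": "4",
    "--moe-runner-backend": "flashinfer_trtllm",
    "--kv-cache-dtype": "fp8_e4m3",
    "--quantization": "modelopt_fp4",
    "--trust-remote-code": None,
    "--reasoning-parser": "step3p5",
    "--tool-call-parser": "step3p5",
    "--attention-backend": "trtllm_mha",
}


def build_launch_command(model_path: str = "stepfun-ai/Step-3.7-Flash-NVFP4",
                         extra_args: tuple = ()) -> list:
    """Return the `sglang serve` argv for the NVFP4 checkpoint.

    Hypothetical helper; `extra_args` lets callers append options such as a port.
    """
    argv = ["sglang", "serve", "--model-path", model_path]
    for flag, value in NVFP4_FLAGS.items():
        argv.append(flag)
        if value is not None:
            argv.append(value)
    argv.extend(extra_args)
    return argv


def render_launch_command(argv: list) -> str:
    """Render the argv as a single shell-quoted line for logging or review."""
    return shlex.join(argv)
```

Passing the returned list to `subprocess.run(argv, check=True)` would start the server without going through shell quoting.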
|
| 260 |
+
|
| 261 |
### 6.3 Transformers (Debug / Verification)
|
| 262 |
|
| 263 |
Use this snippet for quick functional verification. For high-throughput serving, use vLLM or SGLang.
|
|
|
|
| 391 |
|
| 392 |
## 7. Using Step 3.7 Flash on Agent Platforms
|
| 393 |
|
| 394 |
+
You can use Step 3.7 Flash on Agent platforms such as Hermes Agent, OpenClaw, Kilo Code, and more.
|
| 395 |
|
| 396 |
## 8. Getting in Touch
|
| 397 |
|
|
|
|
| 400 |
- **Join the Conversation:** Our [Discord](https://discord.gg/RcMJhNVAQc) community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
|
| 401 |
- **Report Friction:** Encountering limitations? You can open an issue or start a discussion on GitHub / HuggingFace, or flag it directly in our Discord support channels.
|
| 402 |
|
| 403 |
+
## 📄 License
|
| 404 |
+
|
| 405 |
+
This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
|
| 406 |
+
|