hengm3467 committed 1 day ago

Commit fbcee77 (1 parent: 8817d4a)

sync to v2 model card

- benchmark name GPDVal -> GPDVal-AA
- Availability: add DeepInfra, Fireworks AI, Modal Labs as upcoming partners
- vLLM: swap naming so FP8 is the default (step3p7-flash) and BF16 is suffixed
- SGLang: add NVFP4 launch command
- capitalize FP8 / BF16 / NVFP4 labels
- Agent Platforms: drop Lemonade
- add License section
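
The vLLM naming swap in this commit affects clients: a request for `step3p7-flash` now targets the FP8 deployment, and BF16 is reached through `step3p7-flash-bf16`. The following is a minimal sketch of how a client might pick the model name, assuming the server was launched with the `--served-model-name` values from the updated README. `SERVED_MODEL_NAMES` and `build_chat_request` are hypothetical names introduced for illustration. The NVFP4 served name is not visible in this diff, so it is left out.

```python
import json

# Served-model names as passed to --served-model-name in the updated README.
# Assumption: the deployment actually uses these names.
SERVED_MODEL_NAMES = {
    "fp8": "step3p7-flash",        # FP8 is now the unsuffixed default
    "bf16": "step3p7-flash-bf16",  # BF16 now carries the suffix
}


def build_chat_request(precision: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for a precision."""
    key = precision.lower()
    if key not in SERVED_MODEL_NAMES:
        raise ValueError(f"unknown precision: {precision!r}")
    return {
        "model": SERVED_MODEL_NAMES[key],
        "messages": [{"role": "user", "content": prompt}],
    }


if __name__ == "__main__":
    # Print the JSON body that would be POSTed to /v1/chat/completions.
    print(json.dumps(build_chat_request("FP8", "Hello"), indent=2))
```

Keeping the mapping in one place means a client written against the previous naming only needs this table updated, rather than every call site.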

Files changed (1)

README.md +28 -10

@@ -28,7 +28,7 @@ Execution reliability is critical for autonomous agents. Step 3.7 Flash leads th
 ### Code Engineering and Professional Baselines
-Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
 ![Step 3.7 Flash benchmark results across General Agent, Agentic Coding, and Multimodal evaluations](assets/benchmarks.png)
@@ -41,7 +41,7 @@ Step 3.7 Flash is built for live engineering tasks and secured a definitive seco
 | Output | $1.15 / M tokens |
 ## 4. Availability, Deployment, and Ecosystem
-- Availability: Step 3.7 Flash is available through StepFun Open Platform — [platform.stepfun.ai](https://platform.stepfun.ai) (Global) and [platform.stepfun.com](https://platform.stepfun.com) (China) — as well as partner platforms including OpenRouter and NVIDIA NIM.
 - Deployment: Step 3.7 Flash supports flexible deployment across cloud, data center, and local environments. For large-scale production and enterprise use cases, Step 3.7 Flash can be deployed on modern data center infrastructure. For local and workstation scenarios, it can also run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395-based systems, and Mac Studio / Macbook Pro devices with at least 128GB unified memory.
 - Ecosystem: Step 3.7 Flash is supported across popular open-source infrastructure for both inference and model development. For inference and serving, developers can use vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For model development workflows, StepFun model support has landed in the NVIDIA Megatron ecosystem, including Megatron Core and Megatron Bridge.
@@ -143,10 +143,10 @@ pip install -U vllm --pre \
 2. Launch the server.
-  - For fp8 model
   ```bash
   vllm serve <MODEL_PATH_OR_HF_ID> \
-  --served-model-name step3p7-flash-fp8 \
   --tensor-parallel-size 8 \
   --enable-expert-parallel \
   --disable-cascade-attn \
@@ -156,10 +156,10 @@ pip install -U vllm --pre \
   --speculative_config '{"method": "mtp", "num_speculative_tokens": 3}' \
   --trust-remote-code
   ```
-  - For bf16 model
   ```bash
   vllm serve <MODEL_PATH_OR_HF_ID> \
-  --served-model-name step3p7-flash \
   --tensor-parallel-size 8 \
   --enable-expert-parallel \
   --disable-cascade-attn \
@@ -170,7 +170,7 @@ pip install -U vllm --pre \
   --trust-remote-code
   ```
-  - For nvfp4 model
   Compared to standard precisions, running the FP4 quantized version requires modelopt activation and FP8 KV Cache alignment.
   ```bash
   python3 -m vllm.entrypoints.openai.api_server \
@@ -207,7 +207,7 @@ pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git"
 > **Note:** For Blackwell GPUs, `--mm-attention-backend fa4` may be used.
-- For bf16 model
 ```bash
 sglang serve --model-path stepfun-ai/Step-3.7-Flash \
@@ -225,7 +225,7 @@ sglang serve --model-path stepfun-ai/Step-3.7-Flash \
   --port 8000
 ```
-- For fp8 model
 ```bash
 sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
@@ -244,6 +244,20 @@ sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
   --port 8000
 ```
 ### 6.3 Transformers (Debug / Verification)
 Use this snippet for quick functional verification. For high-throughput serving, use vLLM or SGLang.
@@ -377,7 +391,7 @@ cmake --build build-vulkan -j8
 ## 7. Using Step 3.7 Flash on Agent Platforms
-You can use Step 3.7 Flash on Agent platforms such as Hermes Agent, Lemonade, OpenClaw, Kilo Code, and more.
 ## 8. Getting in Touch
@@ -386,3 +400,7 @@ As we work to shape the future of AGI by expanding broad model capabilities, we
 - **Join the Conversation:** Our [Discord](https://discord.gg/RcMJhNVAQc) community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
 - **Report Friction:** Encountering limitations? You can open an issue or start a discussion on GitHub / HuggingFace, or flag it directly in our Discord support channels.

 ### Code Engineering and Professional Baselines
+Step 3.7 Flash is built for live engineering tasks and secured a definitive second-place finish on SWE-Bench PRO with a score of 56.3. It can independently trace multi-file repositories, isolate bugs from raw issue reports, and generate functional patches that pass automated unit tests. While evaluations like Terminal-Bench 2.1 (59.5) and GPDVal-AA (45.8) show clear areas for future optimization compared to the absolute peak of the cohort, they establish a dependable baseline for system interactions and structured professional deliverables.
 ![Step 3.7 Flash benchmark results across General Agent, Agentic Coding, and Multimodal evaluations](assets/benchmarks.png)
 | Output | $1.15 / M tokens |
 ## 4. Availability, Deployment, and Ecosystem
+- Availability: Step 3.7 Flash is available through StepFun Open Platform — [platform.stepfun.ai](https://platform.stepfun.ai) (Global) and [platform.stepfun.com](https://platform.stepfun.com) (China) — as well as partner platforms including OpenRouter and NVIDIA NIM. StepFun is also partnering with DeepInfra, Fireworks AI, and Modal Labs to expand availability soon.
 - Deployment: Step 3.7 Flash supports flexible deployment across cloud, data center, and local environments. For large-scale production and enterprise use cases, Step 3.7 Flash can be deployed on modern data center infrastructure. For local and workstation scenarios, it can also run on high-memory devices such as NVIDIA DGX Station, AMD Ryzen AI Max+ 395-based systems, and Mac Studio / Macbook Pro devices with at least 128GB unified memory.
 - Ecosystem: Step 3.7 Flash is supported across popular open-source infrastructure for both inference and model development. For inference and serving, developers can use vLLM, SGLang, Hugging Face Transformers, and llama.cpp. For model development workflows, StepFun model support has landed in the NVIDIA Megatron ecosystem, including Megatron Core and Megatron Bridge.
 2. Launch the server.
+  - For FP8 model
   ```bash
   vllm serve <MODEL_PATH_OR_HF_ID> \
+  --served-model-name step3p7-flash \
   --tensor-parallel-size 8 \
   --enable-expert-parallel \
   --disable-cascade-attn \
   --speculative_config '{"method": "mtp", "num_speculative_tokens": 3}' \
   --trust-remote-code
   ```
+  - For BF16 model
   ```bash
   vllm serve <MODEL_PATH_OR_HF_ID> \
+  --served-model-name step3p7-flash-bf16 \
   --tensor-parallel-size 8 \
   --enable-expert-parallel \
   --disable-cascade-attn \
   --trust-remote-code
   ```
+  - For NVFP4 model
   Compared to standard precisions, running the FP4 quantized version requires modelopt activation and FP8 KV Cache alignment.
   ```bash
   python3 -m vllm.entrypoints.openai.api_server \
 > **Note:** For Blackwell GPUs, `--mm-attention-backend fa4` may be used.
+- For BF16 model
 ```bash
 sglang serve --model-path stepfun-ai/Step-3.7-Flash \
   --port 8000
 ```
+- For FP8 model
 ```bash
 sglang serve --model-path stepfun-ai/Step-3.7-Flash-fp8 \
   --port 8000
 ```
+- For NVFP4 model
+```bash
+sglang serve --model-path stepfun-ai/Step-3.7-Flash-NVFP4 \
+  --tp 4 --ep 4 \
+  --moe-runner-backend flashinfer_trtllm \
+  --kv-cache-dtype fp8_e4m3 \
+  --quantization modelopt_fp4 \
+  --trust-remote-code \
+  --reasoning-parser step3p5 \
+  --tool-call-parser step3p5 \
+  --attention-backend trtllm_mha
+```
 ### 6.3 Transformers (Debug / Verification)
 Use this snippet for quick functional verification. For high-throughput serving, use vLLM or SGLang.
 ## 7. Using Step 3.7 Flash on Agent Platforms
+You can use Step 3.7 Flash on Agent platforms such as Hermes Agent, OpenClaw, Kilo Code, and more.
 ## 8. Getting in Touch
 - **Join the Conversation:** Our [Discord](https://discord.gg/RcMJhNVAQc) community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
 - **Report Friction:** Encountering limitations? You can open an issue or start a discussion on GitHub / HuggingFace, or flag it directly in our Discord support channels.
+## 📄 License
+This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).