Instructions to use stepfun-ai/Step-3.5-Flash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use stepfun-ai/Step-3.5-Flash with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stepfun-ai/Step-3.5-Flash", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("stepfun-ai/Step-3.5-Flash", trust_remote_code=True, dtype="auto")

vLLM

How to use stepfun-ai/Step-3.5-Flash with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "stepfun-ai/Step-3.5-Flash"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stepfun-ai/Step-3.5-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
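
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above. The function names `build_chat_request` and `send_chat_request` are hypothetical helpers, not part of vLLM, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request


def build_chat_request(prompt,
                       base_url="http://localhost:8000",
                       model="stepfun-ai/Step-3.5-Flash"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response body; needs a running server.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
# With the server from the previous step running:
# print(send_chat_request(request))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server started on port 30000 in the same way.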

Use Docker

docker model run hf.co/stepfun-ai/Step-3.5-Flash

SGLang

How to use stepfun-ai/Step-3.5-Flash with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "stepfun-ai/Step-3.5-Flash" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stepfun-ai/Step-3.5-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "stepfun-ai/Step-3.5-Flash" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stepfun-ai/Step-3.5-Flash",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use stepfun-ai/Step-3.5-Flash with Docker Model Runner:
```
docker model run hf.co/stepfun-ai/Step-3.5-Flash
```

update-bmk-numbers #18

by mh3467 - opened Feb 6

base: refs/heads/main ← from: refs/pr/18

+27 -95

This PR is in draft mode

Files changed (8)

.eval_results/gpqa_diamond.yaml +0 -9
.eval_results/hle.yaml +0 -10
.eval_results/mmlu_pro.yaml +0 -9
.eval_results/swe_bench_verified.yaml +0 -9
.eval_results/terminal_bench_2.yaml +0 -9
README.md +25 -46
config.json +0 -1
step-bar-chart.png +2 -2

.eval_results/gpqa_diamond.yaml DELETED

@@ -1,9 +0,0 @@
-- dataset:
-    id: Idavidrein/gpqa
-    task_id: diamond
-  value: 83.5
-  date: '2026-02-11'
-  source:
-    url: https://arxiv.org/abs/2602.10604
-    name: Step 3.5 Flash Paper
-    user: SaylorTwift

.eval_results/hle.yaml DELETED

@@ -1,10 +0,0 @@
-- dataset:
-    id: cais/hle
-    task_id: hle
-  value: 23.1
-  date: '2026-02-11'
-  source:
-    url: https://arxiv.org/abs/2602.10604
-    name: Step 3.5 Flash Paper
-    user: SaylorTwift
-  notes: "Text Only"

.eval_results/mmlu_pro.yaml DELETED

@@ -1,9 +0,0 @@
-- dataset:
-    id: TIGER-Lab/MMLU-Pro
-    task_id: mmlu_pro
-  value: 84.4
-  date: '2026-02-11'
-  source:
-    url: https://arxiv.org/abs/2602.10604
-    name: Step 3.5 Flash Paper
-    user: SaylorTwift

.eval_results/swe_bench_verified.yaml DELETED

@@ -1,9 +0,0 @@
-- dataset:
-    id: SWE-bench/SWE-bench_Verified
-    task_id: swe_bench_%_resolved
-  value: 74.4
-  date: '2026-02-11'
-  source:
-    url: https://arxiv.org/abs/2602.10604
-    name: Step 3.5 Flash Paper
-    user: SaylorTwift

.eval_results/terminal_bench_2.yaml DELETED

@@ -1,9 +0,0 @@
-- dataset:
-    id: harborframework/terminal-bench-2.0
-    task_id: terminalbench_2
-  value: 51.0
-  date: '2026-02-11'
-  source:
-    url: https://arxiv.org/abs/2602.10604
-    name: Step 3.5 Flash Paper
-    user: SaylorTwift
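
The five deleted files share one schema (`dataset.id`, `dataset.task_id`, `value`, `date`, `source`, optional `notes`). As a rough sketch, the entries can be modelled with a small dataclass and cross-checked against the Step 3.5 Flash column of the README benchmark table in this PR, for the two benchmarks that appear in both places. The class `EvalResult` and the function `find_mismatches` are hypothetical names used for illustration only; the values are transcribed by hand from the hunks above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    dataset_id: str
    task_id: str
    value: float
    notes: str = ""


# Transcribed from the deleted .eval_results/*.yaml files.
DELETED_RESULTS = [
    EvalResult("Idavidrein/gpqa", "diamond", 83.5),
    EvalResult("cais/hle", "hle", 23.1, notes="Text Only"),
    EvalResult("TIGER-Lab/MMLU-Pro", "mmlu_pro", 84.4),
    EvalResult("SWE-bench/SWE-bench_Verified", "swe_bench_%_resolved", 74.4),
    EvalResult("harborframework/terminal-bench-2.0", "terminalbench_2", 51.0),
]

# Step 3.5 Flash scores from the README table (SWE-bench Verified,
# Terminal-Bench 2.0), keyed by the dataset id used in the YAML files.
README_SCORES = {
    "SWE-bench/SWE-bench_Verified": 74.4,
    "harborframework/terminal-bench-2.0": 51.0,
}


def find_mismatches(results, readme_scores):
    """Return (dataset_id, yaml_value, readme_value) where the two disagree."""
    return [
        (r.dataset_id, r.value, readme_scores[r.dataset_id])
        for r in results
        if r.dataset_id in readme_scores and r.value != readme_scores[r.dataset_id]
    ]


print(find_mismatches(DELETED_RESULTS, README_SCORES))
```

An empty list means the deleted YAML values and the README table agree for the overlapping benchmarks.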

README.md CHANGED

@@ -19,7 +19,7 @@ library_name: transformers
 [![ModelScope](https://img.shields.io/badge/ModelScope-StepFun/STEP3p5-preview)](https://modelscope.cn/models/stepfun-ai/Step-3.5-Flash)
 [![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.gg/RcMJhNVAQc)
 [![Webpage](https://img.shields.io/badge/Webpage-Blog-blue)](https://static.stepfun.com/blog/step-3.5-flash/)
-[![Paper](https://img.shields.io/badge/Arxiv-TechReport-red)](https://arxiv.org/abs/2602.10604)
 [![License](https://img.shields.io/badge/License-Apache%202.0-green)]()
 [![Chat with the model on OpenRouter](https://img.shields.io/badge/Chat%20with%20the%20model-OpenRouter-5B3DF5?logo=chatbot&logoColor=white)](https://openrouter.ai/chat?models=stepfun/step-3.5-flash:free)
 [![Chat with the model on HuggingfaceSpace](https://img.shields.io/badge/Chat%20with%20the%20model-HuggingfaceSpace-5B3DF5?logo=chatbot&logoColor=white)](https://huggingface.co/spaces/stepfun-ai/Step-3.5-Flash)
@@ -52,29 +52,29 @@ Performance of Step 3.5 Flash measured across **Reasoning**, **Coding**, and **A
 ### Detailed Benchmarks
 | Benchmark | Step 3.5 Flash | DeepSeek V3.2 | Kimi K2 Thinking / K2.5 | GLM-4.7 | MiniMax M2.1 | MiMo-V2 Flash |
-| --- | --- | --- | --- | --- | --- | --- |
 | # Activated Params | 11B | 37B | 32B | 32B | 10B | 15B |
 | # Total Params (MoE) | 196B | 671B | 1T | 355B | 230B | 309B |
-| Est. decoding cost @ 128K context, Hopper GPU** | **1.0x**<br>100 tok/s, MTP-3, EP8 | **6.0x**<br>33 tok/s, MTP-1, EP32 | **18.9x**<br>33 tok/s, no MTP, EP32 | **18.9x**<br>100 tok/s, MTP-3, EP8 | **3.9x**<br>100 tok/s, MTP-3, EP8 | **1.2x**<br>100 tok/s, MTP-3, EP8 |
-| | | | **Agent** | | | |
-| τ²-Bench | 88.2 | 80.3 (85.2*) | 74.3*/85.4* | 87.4 | 86.6* | 80.3 (84.1*) |
-| BrowseComp | 51.6 | 51.4 | 41.5* / 60.6 | 52.0 | 47.4 | 45.4 |
-| BrowseComp (w/ Context Manager) | 69.0 | 67.6 | 60.2/74.9 | 67.5 | 62.0 | 58.3 |
-| BrowseComp-ZH | 66.9 | 65.0 | 62.3 / 62.3* | 66.6 | 47.8* | 51.2* |
-| BrowseComp-ZH (w/ Context Manager) | 73.7 | — | —/— | — | — | — |
-| GAIA (no file) | 84.5 | 75.1* | 75.6*/75.9* | 61.9* | 64.3* | 78.2* |
-| xbench-DeepSearch (2025.05) | 83.7 | 78.0* | 76.0*/76.7* | 72.0* | 68.7* | 69.3* |
-| xbench-DeepSearch (2025.10) | 56.3 | 55.7* | —/40+ | 52.3* | 43.0* | 44.0* |
-| ResearchRubrics | 65.3 | 55.8* | 56.2*/59.5* | 62.0* | 60.2* | 54.3* |
-| | | | **Reasoning** | | | |
-| AIME 2025 | 97.3 | 93.1 | 94.5/96.1 | 95.7 | 83.0 | 94.1 (95.1*) |
-| HMMT 2025 (Feb.) | 98.4 | 92.5 | 89.4/95.4 | 97.1 | 71.0* | 84.4 (95.4*) |
-| HMMT 2025 (Nov.) | 94.0 | 90.2 | 89.2*/— | 93.5 | 74.3* | 91.0* |
-| IMOAnswerBench | 85.4 | 78.3 | 78.6/81.8 | 82.0 | 60.4* | 80.9* |
-| | | | **Coding** | | | |
-| LiveCodeBench-V6 | 86.4 | 83.3 | 83.1/85.0 | 84.9 | — | 80.6 (81.6*) |
-| SWE-bench Verified | 74.4 | 73.1 | 71.3/76.8 | 73.8 | 74.0 | 73.4 |
-| Terminal-Bench 2.0 | 51.0 | 46.4 | 35.7*/50.8 | 41.0 | 47.9 | 38.5 |
 **Notes**:
 1. "—" indicates the score is not publicly available or not tested.
@@ -82,10 +82,6 @@ Performance of Step 3.5 Flash measured across **Reasoning**, **Coding**, and **A
 3. **BrowseComp (with Context Manager)**: When the effective context length exceeds a predefined threshold, the agent resets the context and restarts the agent loop. By contrast, Kimi K2.5 and DeepSeek-V3.2 used a "discard-all" strategy.
 4. **Decoding Cost**: Estimates are based on a methodology similar to, but more accurate than, the approach described arxiv.org/abs/2507.19427
-### Recommended Inference Parameters
-1. For general chat domain, we suggest: `temperature=0.6, top_p=0.95`
-2. For reasoning / agent scenario, we recommend: `temperature=1.0, top_p=0.95`.
 ## 4. Architecture Details
 Step 3.5 Flash is built on a **Sparse Mixture-of-Experts (MoE)** transformer architecture, optimized for high throughput and low VRAM usage during inference.
@@ -309,11 +305,10 @@ print(output_text)
 - Minimum VRAM: 120 GB (e.g., Mac studio, DGX-Spark, AMD Ryzen AI Max+ 395)
 - Recommended: 128GB unified memory
 #### Steps
-1. Use official llama.cpp:
-> the folder `Step-3.5-Flash/tree/main/llama.cpp` is **obsolete**
 ```bash
-git clone https://github.com/ggml-org/llama.cpp
-cd llama.cpp
 ```
 2. Build llama.cpp on Mac:
 ```bash
@@ -561,21 +556,5 @@ As we work to shape the future of AGI by expanding broad model capabilities, we
 - **Join the Conversation**: Our Discord community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
 - **Report Friction**: Encountering limitations? You can open an issue on GitHub or flag it directly in our Discord support channels.
-## 📜 Citation
-If you find this project useful in your research, please cite our technical report:
-```tex
-@misc{huang2026step35flashopen,
-      title={Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters},
-      author={Ailin Huang and Ang Li and Aobo Kong and Bin Wang and Binxing Jiao and Bo Dong and Bojun Wang and Boyu Chen and Brian Li and Buyun Ma and Chang Su and Changxin Miao and Changyi Wan and Chao Lou and Chen Hu and Chen Xu and Chenfeng Yu and Chengting Feng and Chengyuan Yao and Chunrui Han and Dan Ma and Dapeng Shi and Daxin Jiang and Dehua Ma and Deshan Sun and Di Qi and Enle Liu and Fajie Zhang and Fanqi Wan and Guanzhe Huang and Gulin Yan and Guoliang Cao and Guopeng Li and Han Cheng and Hangyu Guo and Hanshan Zhang and Hao Nie and Haonan Jia and Haoran Lv and Hebin Zhou and Hekun Lv and Heng Wang and Heung-Yeung Shum and Hongbo Huang and Hongbo Peng and Hongyu Zhou and Hongyuan Wang and Houyong Chen and Huangxi Zhu and Huimin Wu and Huiyong Guo and Jia Wang and Jian Zhou and Jianjian Sun and Jiaoren Wu and Jiaran Zhang and Jiashu Lv and Jiashuo Liu and Jiayi Fu and Jiayu Liu and Jie Cheng and Jie Luo and Jie Yang and Jie Zhou and Jieyi Hou and Jing Bai and Jingcheng Hu and Jingjing Xie and Jingwei Wu and Jingyang Zhang and Jishi Zhou and Junfeng Liu and Junzhe Lin and Ka Man Lo and Kai Liang and Kaibo Liu and Kaijun Tan and Kaiwen Yan and Kaixiang Li and Kang An and Kangheng Lin and Lei Yang and Liang Lv and Liang Zhao and Liangyu Chen and Lieyu Shi and Liguo Tan and Lin Lin and Lina Chen and Luck Ma and Mengqiang Ren and Michael Li and Ming Li and Mingliang Li and Mingming Zhang and Mingrui Chen and Mitt Huang and Na Wang and Peng Liu and Qi Han and Qian Zhao and Qinglin He and Qinxin Du and Qiuping Wu and Quan Sun and Rongqiu Yang and Ruihang Miao and Ruixin Han and Ruosi Wan and Ruyan Guo and Shan Wang and Shaoliang Pang and Shaowen Yang and Shengjie Fan and Shijie Shang and Shiliang Yang and Shiwei Li and Shuangshuang Tian and Siqi Liu and Siye Wu and Siyu Chen and Song Yuan and Tiancheng Cao and Tianchi Yue and Tianhao Cheng and Tianning Li and Tingdan Luo and Wang You and Wei Ji and Wei Yuan and Wei Zhang and Weibo Wu and Weihao Xie and Wen Sun and Wenjin Deng and Wenzhen Zheng and Wuxun Xie and Xiangfeng Wang and Xiangwen Kong and Xiangyu Liu and Xiangyu Zhang and Xiaobo Yang and Xiaojia Liu and Xiaolan Yuan and Xiaoran Jiao and Xiaoxiao Ren and Xiaoyun Zhang and Xin Li and Xin Liu and Xin Wu and Xing Chen and Xingping Yang and Xinran Wang and Xu Zhao and Xuan He and Xuanti Feng and Xuedan Cai and Xuqiang Zhou and Yanbo Yu and Yang Li and Yang Xu and Yanlin Lai and Yanming Xu and Yaoyu Wang and Yeqing Shen and Yibo Zhu and Yichen Lv and Yicheng Cao and Yifeng Gong and Yijing Yang and Yikun Yang and Yin Zhao and Yingxiu Zhao and Yinmin Zhang and Yitong Zhang and Yixuan Zhang and Yiyang Chen and Yongchi Zhao and Yongshen Long and Yongyao Wang and Yousong Guan and Yu Zhou and Yuang Peng and Yuanhao Ding and Yuantao Fan and Yuanzhen Yang and Yuchu Luo and Yudi Zhao and Yue Peng and Yueqiang Lin and Yufan Lu and Yuling Zhao and Yunzhou Ju and Yurong Zhang and Yusheng Li and Yuxiang Yang and Yuyang Chen and Yuzhu Cai and Zejia Weng and Zetao Hong and Zexi Li and Zhe Xie and Zheng Ge and Zheng Gong and Zheng Zeng and Zhenyi Lu and Zhewei Huang and Zhichao Chang and Zhiguo Huang and Zhiheng Hu and Zidong Yang and Zili Wang and Ziqi Ren and Zixin Zhang and Zixuan Wang},
-      year={2026},
-      eprint={2602.10604},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2602.10604},
-}
-```
 ## License
 This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

 [![ModelScope](https://img.shields.io/badge/ModelScope-StepFun/STEP3p5-preview)](https://modelscope.cn/models/stepfun-ai/Step-3.5-Flash)
 [![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.gg/RcMJhNVAQc)
 [![Webpage](https://img.shields.io/badge/Webpage-Blog-blue)](https://static.stepfun.com/blog/step-3.5-flash/)
+[![Paper](https://img.shields.io/badge/Paper-Arxiv-red)](https://github.com/stepfun-ai/Step-3.5-Flash/blob/main/step_3p5_flash_tech_report.pdf)
 [![License](https://img.shields.io/badge/License-Apache%202.0-green)]()
 [![Chat with the model on OpenRouter](https://img.shields.io/badge/Chat%20with%20the%20model-OpenRouter-5B3DF5?logo=chatbot&logoColor=white)](https://openrouter.ai/chat?models=stepfun/step-3.5-flash:free)
 [![Chat with the model on HuggingfaceSpace](https://img.shields.io/badge/Chat%20with%20the%20model-HuggingfaceSpace-5B3DF5?logo=chatbot&logoColor=white)](https://huggingface.co/spaces/stepfun-ai/Step-3.5-Flash)
 ### Detailed Benchmarks
 | Benchmark | Step 3.5 Flash | DeepSeek V3.2 | Kimi K2 Thinking / K2.5 | GLM-4.7 | MiniMax M2.1 | MiMo-V2 Flash |
+|---|---|---|---|---|---|---|
 | # Activated Params | 11B | 37B | 32B | 32B | 10B | 15B |
 | # Total Params (MoE) | 196B | 671B | 1T | 355B | 230B | 309B |
+| Est. decoding cost (@ 128K context, Hopper GPU**) | **1.0x** (100 tok/s, MTP-3, EP8) | 6.0x (33 tok/s, MTP-1, EP32) | 18.9x (33 tok/s, no MTP, EP32) | 18.9x (100 tok/s, MTP-3, EP8) | 3.9x (100 tok/s, MTP-3, EP8) | 1.2x (100 tok/s, MTP-3, EP8) |
+| **Agency** | | | | | | |
+| τ²-Bench | **88.2** | 80.3 | 74.3* / — | 87.4 | 80.2* | 80.3 |
+| BrowseComp | 51.6 | 51.4 | 41.5* / **60.6** | 52.0 | 47.4 | 45.4 |
+| BrowseComp (w/ Context Manager) | 69.0 | 67.6 | 60.2 / **74.9** | 67.5 | 62.0 | 58.3 |
+| BrowseComp-ZH | **66.9** | 65.0 | 62.3 / 62.3* | 66.6 | 47.8* | 51.2* |
+| BrowseComp-ZH (w/ Context Manager) | **73.7** | — | — / — | — | — | — |
+| GAIA (no file) | **84.5** | 75.1* | 75.6* / 75.9* | 61.9* | 64.3* | 78.2* |
+| xbench-DeepSearch (2025.05) | **83.7** | 78.0* | 76.0* / 76.7* | 72.0* | 68.7* | 69.3* |
+| xbench-DeepSearch (2025.10) | **56.3** | 55.7* | — / 40+ | 52.3* | 43.0* | 44.0* |
+| ResearchRubrics | **65.3** | 55.8* | 56.2* / 59.5* | 62.0* | 60.2* | 54.3* |
+| **Reasoning** | | | | | | |
+| AIME 2025 | **97.3** | 93.1 | 94.5 / 96.1 | 95.7 | 83.0 | 94.1 (95.1*) |
+| HMMT 2025 (Feb.) | **98.4** | 92.5 | 89.4 / 95.4 | 97.1 | 71.0* | 84.4 (95.4*) |
+| HMMT 2025 (Nov.) | **94.0** | 90.2 | 89.2* / — | 93.5 | 74.3* | 91.0* |
+| IMOAnswerBench | **85.4** | 78.3 | 78.6 / 81.8 | 82.0 | 60.4* | 80.9* |
+| **Coding** | | | | | | |
+| LiveCodeBench-V6 | **86.4** | 83.3 | 83.1 / 85.0 | 84.9 | — | 80.6 (81.6*) |
+| SWE-bench Verified | 74.4 | 73.1 | 71.3 / **76.8** | 73.8 | 74.0 | 73.4 |
+| Terminal-Bench 2.0 | **51.0** | 46.4 | 35.7* / 50.8 | 41.0 | 47.9 | 38.5 |
 **Notes**:
 1. "—" indicates the score is not publicly available or not tested.
 3. **BrowseComp (with Context Manager)**: When the effective context length exceeds a predefined threshold, the agent resets the context and restarts the agent loop. By contrast, Kimi K2.5 and DeepSeek-V3.2 used a "discard-all" strategy.
 4. **Decoding Cost**: Estimates are based on a methodology similar to, but more accurate than, the approach described arxiv.org/abs/2507.19427
 ## 4. Architecture Details
 Step 3.5 Flash is built on a **Sparse Mixture-of-Experts (MoE)** transformer architecture, optimized for high throughput and low VRAM usage during inference.
 - Minimum VRAM: 120 GB (e.g., Mac studio, DGX-Spark, AMD Ryzen AI Max+ 395)
 - Recommended: 128GB unified memory
 #### Steps
+1. Use llama.cpp:
 ```bash
+git clone git@github.com:stepfun-ai/Step-3.5-Flash.git
+cd Step-3.5-Flash/llama.cpp
 ```
 2. Build llama.cpp on Mac:
 ```bash
 - **Join the Conversation**: Our Discord community is the primary hub for brainstorming future architectures, proposing capabilities, and getting early access updates 🚀
 - **Report Friction**: Encountering limitations? You can open an issue on GitHub or flag it directly in our Discord support channels.
 ## License
 This project is open-sourced under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
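
The "Recommended Inference Parameters" section that this diff removes from README.md suggested `temperature=0.6, top_p=0.95` for general chat and `temperature=1.0, top_p=0.95` for reasoning / agent scenarios. A minimal sketch of applying those values to an OpenAI-style chat payload follows. The scenario names and the helper `build_payload` are assumptions made for illustration, and `temperature` / `top_p` are assumed to be accepted as top-level request fields by the serving endpoint.

```python
# Values taken from the removed README section; the keys are illustrative labels.
RECOMMENDED_SAMPLING = {
    "chat": {"temperature": 0.6, "top_p": 0.95},
    "reasoning": {"temperature": 1.0, "top_p": 0.95},
    "agent": {"temperature": 1.0, "top_p": 0.95},
}


def build_payload(prompt, scenario="chat", model="stepfun-ai/Step-3.5-Flash"):
    """Return a chat-completions payload with the sampling values for `scenario`."""
    if scenario not in RECOMMENDED_SAMPLING:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED_SAMPLING[scenario],
    }


print(build_payload("Who are you?", scenario="reasoning"))
```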

config.json CHANGED

@@ -37,7 +37,6 @@
   "moe_router_activation": "sigmoid",
   "moe_router_scaling_factor": 3.0,
   "att_impl_type": "GQA",
-  "tie_word_embeddings": false,
   "rope_theta": [
     5000000.0,
     10000.0,

   "moe_router_activation": "sigmoid",
   "moe_router_scaling_factor": 3.0,
   "att_impl_type": "GQA",
   "rope_theta": [
     5000000.0,
     10000.0,
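
The only change in this hunk is the removal of the `tie_word_embeddings` key. Once a key is absent, its effective value comes from whatever default the loading code applies, and that default is not visible in this diff. The sketch below shows one way to list such key-level changes between two config dictionaries. `changed_keys` is a hypothetical helper, and the dictionaries contain only the scalar keys visible in the hunk (the truncated `rope_theta` list is left out).

```python
def changed_keys(before, after):
    """Summarise which top-level keys were removed, added, or modified."""
    removed = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    modified = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"removed": removed, "added": added, "modified": modified}


# Only the keys shown in the hunk above; the real config.json has more.
BEFORE = {
    "moe_router_activation": "sigmoid",
    "moe_router_scaling_factor": 3.0,
    "att_impl_type": "GQA",
    "tie_word_embeddings": False,
}
AFTER = {k: v for k, v in BEFORE.items() if k != "tie_word_embeddings"}

print(changed_keys(BEFORE, AFTER))
```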

step-bar-chart.png CHANGED

Git LFS Details

SHA256: 3fa283dc9c139edc3331aaafa21d69de212a241f03262f09acf96fbc0123a93d
Pointer size: 131 Bytes
Size of remote file: 647 kB

Git LFS Details

SHA256: b353d54e27baaac2539402d9dacdccf8230ff909c098c31dc905fbc5a442165e
Pointer size: 131 Bytes
Size of remote file: 575 kB
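
For context, the file list reports `+2 -2` for step-bar-chart.png because only the Git LFS pointer (a small text file holding an `oid` and a `size` line) is versioned in the repository. The sketch below reconstructs a pointer in the Git LFS v1 text format and checks that its length matches the reported 131 bytes. The exact byte size of the image is not given on this page, so `ASSUMED_SIZE` is a placeholder; any six-digit size yields the same pointer length.

```python
def lfs_pointer(sha256_hex, size_bytes):
    """Render a Git LFS v1 pointer file as text."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 copied from the first "Git LFS Details" block above.
SHA = "3fa283dc9c139edc3331aaafa21d69de212a241f03262f09acf96fbc0123a93d"
# Assumption: the page only says "647 kB"; the exact byte count is unknown.
ASSUMED_SIZE = 647_000

pointer = lfs_pointer(SHA, ASSUMED_SIZE)
print(len(pointer.encode("utf-8")))  # prints 131
```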