[ModelPage]: https://static.stepfun.com/blog/step-3.7-flash/
1. Introduction
GGUF quantizations of stepfun-ai/Step-3.7-Flash.
Step-3.7-Flash is a 198B-parameter sparse Mixture-of-Experts vision-language model from StepFun-ai, activating ~11B parameters per token for up to 400 t/s throughput. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image understanding, supports a 256K context window, and offers three selectable reasoning levels (low / medium / high) to balance speed, cost, and depth. It is built for agentic workloads (tool calling, multi-step reasoning, code, and math) and has native multilingual coverage.
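To make the sparsity concrete, the short sketch below works out what share of the weights is active for each token. It is plain arithmetic on the rounded figures quoted in this card (198B total, ~11B active, 196B + 1.8B components), so the results are approximate; the function name is illustrative only.

```python
# Rounded figures quoted in this model card (billions of parameters).
TOTAL_B = 198.0
ACTIVE_B = 11.0
LANGUAGE_B = 196.0
VISION_B = 1.8


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_B, TOTAL_B)
components_b = LANGUAGE_B + VISION_B

print(f"active per token: {fraction:.1%}")
print(f"backbone + vision encoder: {components_b:.1f}B")
```

Roughly one parameter in eighteen takes part in each forward step, which is why per-token compute is far lower than the total parameter count suggests even though all weights must still be resident in memory.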
A separate mmproj projector ships alongside the language quants for multimodal inference. With 128 GB of unified memory (Mac Studio, DGX Spark, Ryzen AI Max+ 395, etc.), you can privately host Step-3.7-Flash: Q4 quants and below run at full 256K context with high precision.
2. Files
| File | Quant | Size | Notes |
|---|---|---|---|
| Step-3.7-flash-BF16.gguf | BF16 | 394 GB | Full-precision reference. |
| Step-3.7-flash-Q8_0.gguf | Q8_0 | 209 GB | Near-lossless. Does not use imatrix. |
| Step-3.7-flash-Q4_K_S.gguf | Q4_K_S | 112 GB | imatrix-calibrated. Balanced quality / size. |
| Step-3.7-flash-IQ4_XS.gguf | IQ4_XS | 105 GB | imatrix-calibrated. Slightly smaller than Q4_K_S, comparable quality. |
| Step-3.7-flash-Q3_K_L.gguf | Q3_K_L | 103 GB | imatrix-calibrated. Aggressive size reduction. |
| Step-3.7-flash-Q3_K_M.gguf | Q3_K_M | 94 GB | imatrix-calibrated. Use when you need to fit on a single 64-96 GB device; expect modest quality loss at low bit-widths. |
| mmproj-Step-3.7-flash-f16.gguf | F16 | 4 GB | Vision projector. Pair with any of the language quants above for image input. |
| chat_template.jinja | n/a | <1 KB | Reasoning is always on; pass reasoning_effort (low / medium / high) to control reasoning depth. |
| chat_template_nothink.jinja | n/a | <1 KB | Adds an enable_thinking switch: pass enable_thinking=false to skip reasoning. |
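
As a rough guide to choosing a file, the sketch below converts the listed sizes into approximate bits per weight and filters the quants by a memory budget. It rests on two assumptions that the card does not state: that each language quant holds only the 196B-parameter backbone, and that "GB" means 10^9 bytes. The `reserve_gb` headroom is a hypothetical placeholder for KV cache, the mmproj file, and the operating system, not a measured value.

```python
# Sizes (GB) copied from the Files table above.
QUANT_SIZES_GB = {
    "BF16": 394,
    "Q8_0": 209,
    "Q4_K_S": 112,
    "IQ4_XS": 105,
    "Q3_K_L": 103,
    "Q3_K_M": 94,
}

# Assumption: the language quants contain only the 196B-parameter backbone,
# and one GB is 10**9 bytes.
BACKBONE_PARAMS = 196e9


def bits_per_weight(size_gb: float, params: float = BACKBONE_PARAMS) -> float:
    """Approximate average bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


def quants_that_fit(memory_gb: float, reserve_gb: float) -> list[str]:
    """Quants whose weights fit after setting aside `reserve_gb`
    (hypothetical headroom for KV cache, mmproj, and the OS)."""
    budget = memory_gb - reserve_gb
    return [name for name, size in QUANT_SIZES_GB.items() if size <= budget]


for name, size in QUANT_SIZES_GB.items():
    print(f"{name:7s} {size:4d} GB  ~{bits_per_weight(size):.2f} bits/weight")

print(quants_that_fit(memory_gb=128, reserve_gb=12))
```

The actual headroom needed depends on context length and KV cache type, so treat the filter as a first pass and confirm with a real run.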
3. Quickstart
Build llama.cpp and run:
# 1. Clone and build
git clone https://github.com/stepfun-ai/llama.cpp.git
cd llama.cpp
git checkout -b step3.7 origin/step3.7
cmake -B build -DLLAMA_BUILD_TOOLS=ON -DLLAMA_BUILD_SERVER=ON
cmake --build build --config Release -j$(nproc)
# 2. Test performance (benchmark)
./build/bin/llama-batched-bench \
-m Step-3.7-flash-Q4_K_S.gguf \
-c 32768 -b 2048 -ub 2048 \
-npp 0,2048,8192,16384,32768 -ntg 128 -npl 1
# 3. Text-only inference
./build/bin/llama-cli \
-m Step-3.7-flash-Q4_K_S.gguf \
-c 32768 -ngl 99 -fa on \
-p "Write a Python function to compute the n-th Fibonacci number."
# 4. With vision (image + text)
./build/bin/llama-mtmd-cli \
-m Step-3.7-flash-Q4_K_S.gguf \
--mmproj mmproj-Step-3.7-flash-f16.gguf \
-c 32768 -ngl 99 -fa on \
--image path/to/image.jpg \
-p "Describe this image."
# 5. OpenAI-compatible server (text + vision)
./build/bin/llama-server \
-m Step-3.7-flash-Q4_K_S.gguf \
--mmproj mmproj-Step-3.7-flash-f16.gguf \
-c 32768 -ngl 99 -fa on \
--host 0.0.0.0 --port 8080
For full CLI / server options, see the llama.cpp README.
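Once the server from step 5 is running, it can be called with an OpenAI-style chat request. The sketch below builds such a request using only the Python standard library. The helper names are hypothetical, the image is assumed to be a JPEG sent inline as a base64 data URL, and passing `reasoning_effort` through a `chat_template_kwargs` field is an assumption about how this server build forwards template variables; check the server documentation for your build before relying on it.

```python
import base64
import json
import urllib.request

# Host and port match step 5 of the quickstart.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_payload(prompt, image_bytes=None, reasoning_effort="medium"):
    """Build an OpenAI-style chat request body (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_bytes is not None:
        # Assumption: the image is a JPEG, sent inline as a base64 data URL.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
        })
    return {
        "messages": [{"role": "user", "content": content}],
        # Assumption: the server forwards these values to the chat template.
        "chat_template_kwargs": {"reasoning_effort": reasoning_effort},
    }


def send(payload):
    """POST the payload to the local server and return the parsed JSON reply."""
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_payload(
    "Write a Python function to compute the n-th Fibonacci number.",
    reasoning_effort="low",
)
print(json.dumps(payload, indent=2))
# reply = send(payload)  # uncomment once the server from step 5 is running
```

For image input, read the file as bytes and pass it as `image_bytes`; the server must have been started with `--mmproj` as in step 5.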
4. Performance
Apple Mac Studio (M4 Max, 128 GB unified memory)
Step-3.7-flash-Q4_K_S
./llama-batched-bench -m Step-3.7-flash-Q4_K_S.gguf -c 262150 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536,131072,262144 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 2.500 | 51.20 | 2.500 | 51.20 |
| 2048 | 128 | 1 | 2176 | 4.873 | 420.28 | 2.639 | 48.51 | 7.512 | 289.68 |
| 8192 | 128 | 1 | 8320 | 20.292 | 403.70 | 2.757 | 46.43 | 23.049 | 360.97 |
| 16384 | 128 | 1 | 16512 | 42.854 | 382.32 | 2.924 | 43.77 | 45.779 | 360.69 |
| 32768 | 128 | 1 | 32896 | 95.168 | 344.32 | 3.223 | 39.72 | 98.391 | 334.34 |
| 65536 | 128 | 1 | 65664 | 233.885 | 280.21 | 3.909 | 32.74 | 237.794 | 276.14 |
| 131072 | 128 | 1 | 131200 | 635.499 | 206.25 | 5.759 | 22.23 | 641.258 | 204.60 |
| 262144 | 128 | 1 | 262272 | 2362.488 | 110.96 | 13.188 | 9.71 | 2375.677 | 110.40 |
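
The throughput columns in these tables follow from the timing columns. As a sanity check, the sketch below recomputes them for one row, assuming S_PP = PP / T_PP, S_TG = TG / T_TG, and S = (PP + TG) / T with a single sequence. The published numbers match these relations to within rounding; the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class BenchRow:
    pp: int       # prompt tokens
    tg: int       # generated tokens
    t_pp: float   # prompt processing time, seconds
    t_tg: float   # generation time, seconds

    @property
    def s_pp(self) -> float:
        """Prompt processing speed in tokens/s (0 when there is no prompt)."""
        return self.pp / self.t_pp if self.t_pp > 0 else 0.0

    @property
    def s_tg(self) -> float:
        """Generation speed in tokens/s."""
        return self.tg / self.t_tg

    @property
    def s_total(self) -> float:
        """Overall speed: all tokens divided by total wall time."""
        return (self.pp + self.tg) / (self.t_pp + self.t_tg)


# Row PP=2048 from the Mac Studio Q4_K_S table above.
row = BenchRow(pp=2048, tg=128, t_pp=4.873, t_tg=2.639)
print(f"S_PP={row.s_pp:.2f}  S_TG={row.s_tg:.2f}  S={row.s_total:.2f}")
```

Small differences in the last digit are expected because the table's timings are themselves rounded to three decimals.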
Step-3.7-flash-IQ4_XS
./llama-batched-bench -m Step-3.7-flash-IQ4_XS.gguf -c 262150 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536,131072,262144 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 2.582 | 49.58 | 2.582 | 49.58 |
| 2048 | 128 | 1 | 2176 | 4.835 | 423.56 | 2.679 | 47.78 | 7.514 | 289.60 |
| 8192 | 128 | 1 | 8320 | 19.954 | 410.55 | 2.803 | 45.66 | 22.757 | 365.60 |
| 16384 | 128 | 1 | 16512 | 42.142 | 388.78 | 2.957 | 43.29 | 45.098 | 366.13 |
| 32768 | 128 | 1 | 32896 | 93.489 | 350.50 | 3.288 | 38.93 | 96.777 | 339.91 |
| 65536 | 128 | 1 | 65664 | 227.088 | 288.59 | 3.945 | 32.44 | 231.033 | 284.22 |
| 131072 | 128 | 1 | 131200 | 635.047 | 206.40 | 5.791 | 22.10 | 640.838 | 204.73 |
| 262144 | 128 | 1 | 262272 | 2170.271 | 120.79 | 13.070 | 9.79 | 2183.342 | 120.12 |
Step-3.7-flash-Q3_K_L
./llama-batched-bench -m Step-3.7-flash-Q3_K_L.gguf -c 262272 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536,131072,262144 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 3.590 | 35.66 | 3.590 | 35.66 |
| 2048 | 128 | 1 | 2176 | 5.263 | 389.15 | 3.702 | 34.57 | 8.965 | 242.72 |
| 8192 | 128 | 1 | 8320 | 21.789 | 375.97 | 3.817 | 33.53 | 25.606 | 324.92 |
| 16384 | 128 | 1 | 16512 | 45.819 | 357.58 | 3.977 | 32.18 | 49.796 | 331.59 |
| 32768 | 128 | 1 | 32896 | 100.827 | 324.99 | 4.308 | 29.71 | 105.135 | 312.89 |
| 65536 | 128 | 1 | 65664 | 242.172 | 270.62 | 4.977 | 25.72 | 247.149 | 265.69 |
| 131072 | 128 | 1 | 131200 | 659.645 | 198.70 | 6.764 | 18.92 | 666.409 | 196.88 |
| 262144 | 128 | 1 | 262272 | 2200.370 | 119.14 | 14.008 | 9.14 | 2214.378 | 118.44 |
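
These measurements can be turned into a rough latency estimate for a request of a given size. The sketch below linearly interpolates the Mac Studio Q3_K_L rows above to estimate prefill time, then scales the 128-token generation time to the requested output length. Linear interpolation between rows is an approximation, and the decode estimate ignores the slowdown as the context grows during generation, so treat the result as a ballpark figure.

```python
from bisect import bisect_left

# (PP, T_PP seconds, T_TG seconds for 128 tokens) from the
# Mac Studio Q3_K_L table above.
ROWS = [
    (0, 0.000, 3.590),
    (2048, 5.263, 3.702),
    (8192, 21.789, 3.817),
    (16384, 45.819, 3.977),
    (32768, 100.827, 4.308),
    (65536, 242.172, 4.977),
    (131072, 659.645, 6.764),
    (262144, 2200.370, 14.008),
]
TG_TOKENS = 128  # tokens generated in each benchmark row


def interpolate(prompt_tokens: int, column: int) -> float:
    """Linearly interpolate one table column at `prompt_tokens`."""
    xs = [r[0] for r in ROWS]
    if not xs[0] <= prompt_tokens <= xs[-1]:
        raise ValueError("prompt length outside the measured range")
    i = bisect_left(xs, prompt_tokens)
    if xs[i] == prompt_tokens:
        return ROWS[i][column]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ROWS[i - 1][column], ROWS[i][column]
    return y0 + (y1 - y0) * (prompt_tokens - x0) / (x1 - x0)


def estimate_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated seconds for prefill plus generation."""
    prefill = interpolate(prompt_tokens, 1)
    seconds_per_token = interpolate(prompt_tokens, 2) / TG_TOKENS
    return prefill + seconds_per_token * output_tokens


print(f"{estimate_latency(20000, 500):.1f} s for a 20000-token prompt, 500-token reply")
```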
NVIDIA DGX Spark (GB10, 128 GB unified memory)
Step-3.7-flash-Q4_K_S
./llama-batched-bench -m Step-3.7-flash-Q4_K_S.gguf -c 131300 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536,131072 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 5.157 | 24.82 | 5.157 | 24.82 |
| 2048 | 128 | 1 | 2176 | 8.021 | 255.33 | 4.907 | 26.08 | 12.929 | 168.31 |
| 8192 | 128 | 1 | 8320 | 10.866 | 753.89 | 5.169 | 24.76 | 16.035 | 518.86 |
| 16384 | 128 | 1 | 16512 | 29.389 | 557.49 | 6.215 | 20.60 | 35.603 | 463.78 |
| 32768 | 128 | 1 | 32896 | 52.501 | 624.14 | 6.931 | 18.47 | 59.432 | 553.50 |
| 65536 | 128 | 1 | 65664 | 112.321 | 583.47 | 7.769 | 16.48 | 120.090 | 546.79 |
| 131072 | 128 | 1 | 131200 | 281.479 | 465.66 | 9.834 | 13.02 | 291.313 | 450.37 |
Step-3.7-flash-IQ4_XS
./llama-batched-bench -m Step-3.7-flash-IQ4_XS.gguf -c 262272 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536,131072,262144 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 5.368 | 23.85 | 5.368 | 23.85 |
| 2048 | 128 | 1 | 2176 | 4.250 | 481.87 | 5.311 | 24.10 | 9.561 | 227.58 |
| 8192 | 128 | 1 | 8320 | 12.531 | 653.73 | 5.817 | 22.01 | 18.348 | 453.46 |
| 16384 | 128 | 1 | 16512 | 24.474 | 669.44 | 5.915 | 21.64 | 30.389 | 543.35 |
| 32768 | 128 | 1 | 32896 | 51.976 | 630.44 | 6.531 | 19.60 | 58.508 | 562.25 |
| 65536 | 128 | 1 | 65664 | 116.305 | 563.48 | 7.934 | 16.13 | 124.239 | 528.53 |
| 131072 | 128 | 1 | 131200 | 298.746 | 438.74 | 10.263 | 12.47 | 309.009 | 424.58 |
| 262144 | 128 | 1 | 262272 | 924.872 | 283.44 | 14.862 | 8.61 | 939.734 | 279.09 |
Step-3.7-flash-Q3_K_L
./llama-batched-bench -m Step-3.7-flash-Q3_K_L.gguf -c 262272 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536,131072,262144 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 5.947 | 21.52 | 5.947 | 21.52 |
| 2048 | 128 | 1 | 2176 | 4.145 | 494.08 | 5.623 | 22.76 | 9.768 | 222.77 |
| 8192 | 128 | 1 | 8320 | 14.889 | 550.20 | 5.799 | 22.07 | 20.688 | 402.17 |
| 16384 | 128 | 1 | 16512 | 29.374 | 557.78 | 6.140 | 20.85 | 35.513 | 464.95 |
| 32768 | 128 | 1 | 32896 | 54.957 | 596.25 | 6.744 | 18.98 | 61.702 | 533.15 |
| 65536 | 128 | 1 | 65664 | 129.827 | 504.79 | 8.347 | 15.33 | 138.174 | 475.23 |
| 131072 | 128 | 1 | 131200 | 315.402 | 415.57 | 10.780 | 11.87 | 326.182 | 402.23 |
| 262144 | 128 | 1 | 262272 | 910.215 | 288.00 | 15.568 | 8.22 | 925.783 | 283.30 |
AMD Ryzen AI Max+ 395 (Strix Halo, 128 GB unified memory)
Step-3.7-flash-Q4_K_S
llama-batched-bench.exe -m Step-3.7-flash-Q4_K_S.gguf -c 65664 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 4.878 | 26.24 | 4.878 | 26.24 |
| 2048 | 128 | 1 | 2176 | 9.367 | 218.63 | 5.134 | 24.93 | 14.501 | 150.06 |
| 8192 | 128 | 1 | 8320 | 43.540 | 188.15 | 5.508 | 23.24 | 49.048 | 169.63 |
| 16384 | 128 | 1 | 16512 | 111.814 | 146.53 | 5.947 | 21.53 | 117.761 | 140.22 |
| 32768 | 128 | 1 | 32896 | 357.819 | 91.58 | 6.779 | 18.88 | 364.598 | 90.23 |
| 65536 | 128 | 1 | 65664 | 1342.501 | 48.82 | 8.495 | 15.07 | 1350.996 | 48.60 |
Step-3.7-flash-IQ4_XS
./llama-batched-bench -m Step-3.7-flash-IQ4_XS.gguf -c 65664 -b 2048 -ub 1024 -npp 0,2048,8192,16384,32768,65536 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 5.931 | 21.58 | 5.931 | 21.58 |
| 2048 | 128 | 1 | 2176 | 8.143 | 251.50 | 6.194 | 20.67 | 14.337 | 151.78 |
| 8192 | 128 | 1 | 8320 | 39.899 | 205.32 | 6.521 | 19.63 | 46.420 | 179.23 |
| 16384 | 128 | 1 | 16512 | 105.098 | 155.89 | 6.891 | 18.57 | 111.989 | 147.44 |
| 32768 | 128 | 1 | 32896 | 338.645 | 96.76 | 7.793 | 16.42 | 346.439 | 94.95 |
| 65536 | 128 | 1 | 65664 | 1310.820 | 50.00 | 9.489 | 13.49 | 1320.309 | 49.73 |
Step-3.7-flash-Q3_K_L
./llama-batched-bench -m Step-3.7-flash-Q3_K_L.gguf -c 262272 -b 2048 -ub 1024 -ctk q8_0 -ctv q8_0 -npp 0,2048,8192,16384,32768,65536,131072,262144 -ntg 128 -npl 1
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 128 | 0.000 | 0.00 | 5.015 | 25.53 | 5.015 | 25.53 |
| 2048 | 128 | 1 | 2176 | 10.246 | 199.88 | 5.073 | 25.23 | 15.319 | 142.04 |
| 8192 | 128 | 1 | 8320 | 37.229 | 220.05 | 5.341 | 23.96 | 42.570 | 195.44 |
| 16384 | 128 | 1 | 16512 | 79.234 | 206.78 | 5.489 | 23.32 | 84.723 | 194.89 |
| 32768 | 128 | 1 | 32896 | 179.697 | 182.35 | 5.810 | 22.03 | 185.507 | 177.33 |
| 65536 | 128 | 1 | 65664 | 436.593 | 150.11 | 6.577 | 19.46 | 443.169 | 148.17 |
| 131072 | 128 | 1 | 131200 | 1262.377 | 103.83 | 9.124 | 14.03 | 1271.501 | 103.19 |
| 262144 | 128 | 1 | 262272 | 3487.921 | 75.16 | 11.391 | 11.24 | 3499.312 | 74.95 |
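
The three machines trade off prompt processing against generation speed differently. The sketch below compares them on a single data point, the Q4_K_S rows at PP=32768, copied from the tables above. It is only a snapshot: rankings may differ at other context lengths, with other quants, or with different llama.cpp builds and backends.

```python
# S_PP and S_TG (tokens/s) at PP=32768 for Q4_K_S, copied from the tables above.
Q4_K_S_AT_32K = {
    "Mac Studio (M4 Max)": {"s_pp": 344.32, "s_tg": 39.72},
    "DGX Spark (GB10)": {"s_pp": 624.14, "s_tg": 18.47},
    "Ryzen AI Max+ 395": {"s_pp": 91.58, "s_tg": 18.88},
}


def fastest(metric: str) -> str:
    """Name of the machine with the highest value for `metric`."""
    return max(Q4_K_S_AT_32K, key=lambda name: Q4_K_S_AT_32K[name][metric])


def speedup(metric: str, machine: str, baseline: str) -> float:
    """How many times faster `machine` is than `baseline` on `metric`."""
    return Q4_K_S_AT_32K[machine][metric] / Q4_K_S_AT_32K[baseline][metric]


print("fastest prefill:", fastest("s_pp"))
print("fastest decode: ", fastest("s_tg"))
print(f"DGX Spark vs Mac Studio prefill: "
      f"{speedup('s_pp', 'DGX Spark (GB10)', 'Mac Studio (M4 Max)'):.2f}x")
```

On this data point the DGX Spark leads on prompt processing while the Mac Studio leads on generation, so the better fit depends on whether a workload is dominated by long prompts or long outputs.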
5. Acknowledgments
This release stands on the work of the following authors and communities:
- bartowski: for calibration_datav5, the community-standard imatrix calibration anchor used by countless GGUF releases. Used for calibration purposes only; no license has been verified for this resource.
- eaddario: for the imatrix-calibration dataset (MIT), providing multilingual / code / math splits that form the backbone of this release's domain balance.
- NousResearch: for hermes-function-calling-v1 (Apache-2.0), used for agent / tool-call calibration coverage.
- ggml-org / llama.cpp: for the entire quantization and inference toolchain (MIT).
6. License
The GGUF quantization files in this repository are derivative works of stepfun-ai/Step-3.7-Flash and are released under the same Apache 2.0 license.
| Component | License |
|---|---|
| Base model weights (stepfun-ai/Step-3.7-Flash) | Apache-2.0 |
| Calibration dataset (eaddario/imatrix-calibration) | MIT |
| Calibration dataset (NousResearch/hermes-function-calling-v1) | Apache-2.0 |
| Quantization toolchain (llama.cpp) | MIT |
All calibration datasets retain their original licenses and are used only for quantization calibration.