Step-3.7-Flash-oQ5-MLX

This repository contains an oMLX oQ5 mixed-precision MLX quantization of stepfun-ai/Step-3.7-Flash.

Step-3.7-Flash is a sparse Mixture-of-Experts vision-language model from StepFun. The upstream model card describes it as a 198B-parameter model with a 196B language backbone, a 1.8B vision encoder, approximately 11B active parameters per token, and a 256K context window.

Quantization

Field	Value
Method	oMLX oQ mixed-precision MLX
Quantization	oQ5
Base model revision	`5f6244077ac62e04eec3f320501ff8c2b293373a`
Model type	step3p7 / step3p5 text backbone
Group size	64
Quantization mode	affine
Base bits	5
Effective plan	5.79 bits per weight (bpw)
Output shards	27 safetensors
Output size	131.0 GiB
Non-quantized/scales dtype	bfloat16
Vision weights	preserved
Native MTP weights	not present in upstream weights
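
The effective figure sits above the 5-bit base partly because affine group quantization stores per-group metadata next to the packed weights. The following is a minimal sketch of that arithmetic, not oMLX code. It assumes one scale and one bias per group, each stored in bfloat16 (16 bits), and the function names are illustrative.

```python
def affine_bits_per_weight(bits: int, group_size: int, meta_bits: int = 16) -> float:
    """Storage cost per weight for affine group quantization.

    Assumes each group of `group_size` weights carries one scale and one
    bias, each stored in `meta_bits` bits (16 for bfloat16).
    """
    return bits + 2 * meta_bits / group_size


def size_gib(num_weights: float, bpw: float) -> float:
    """Approximate storage in GiB for `num_weights` weights at `bpw` bits each."""
    return num_weights * bpw / 8 / 2**30


if __name__ == "__main__":
    base = affine_bits_per_weight(bits=5, group_size=64)
    print(f"5-bit, group size 64: {base} bpw")
    # Rough estimate only: uses the upstream 196B backbone figure and
    # ignores the vision encoder and any non-quantized tensors.
    print(f"196B weights at 5.79 bpw: {size_gib(196e9, 5.79):.1f} GiB")
```

Under these assumptions a uniformly 5-bit tensor costs 5.5 bits per weight. If the reported 5.79 figure counts scales and biases the same way, the gap suggests that some tensors are kept at higher precision, as expected from a mixed-precision plan.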

Notes

Vision weights are preserved from the upstream model.

Native multi-token prediction (MTP) weights are not included in this artifact. The upstream text_config declares num_nextn_predict_layers=3, but the published safetensors index contains no mtp.* tensors, so oMLX disables native MTP to keep the quantized output self-consistent.
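
The mismatch described above can be checked against a local copy of a checkpoint. This is a minimal sketch, assuming the standard sharded safetensors index layout (a `weight_map` from tensor name to shard file) and assuming that MTP tensors, when present, have names starting with `mtp.`. The function name `mtp_status` is hypothetical.

```python
import json
from pathlib import Path


def mtp_status(model_dir: str) -> dict:
    """Compare the declared MTP layer count with the tensors actually indexed."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    index = json.loads((root / "model.safetensors.index.json").read_text())

    # Declared under text_config per the note above; treat a missing key as 0.
    declared = config.get("text_config", {}).get("num_nextn_predict_layers", 0)
    # Assumption: MTP tensor names start with "mtp.".
    mtp_names = [n for n in index.get("weight_map", {}) if n.startswith("mtp.")]

    return {
        "declared_layers": declared,
        "mtp_tensors": len(mtp_names),
        # True when the config promises MTP layers but no weights back them.
        "mismatch": declared > 0 and not mtp_names,
    }
```

Calling `mtp_status("/path/to/checkpoint")` on a checkpoint with the properties described above would report three declared layers, zero MTP tensors, and a mismatch.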

This is an unofficial quantized derivative. It is not affiliated with, sponsored by, or endorsed by StepFun.

Validation

Artifact validation was completed locally with the bundled oMLX runtime on macOS:

source model: stepfun-ai/Step-3.7-Flash
source revision: 5f6244077ac62e04eec3f320501ff8c2b293373a
quantization: oQ5
config.json: present
model.safetensors.index.json: present
safetensor shards: 27
vision tensors: present
mtp tensors: not present
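
A similar structural check can be reproduced without any MLX runtime. The sketch below is not the oMLX validator. It assumes the standard safetensors index layout and assumes that vision tensors contain `vision` in their names, which may not hold for this architecture. The function name `summarize_artifact` is hypothetical.

```python
import json
from pathlib import Path


def summarize_artifact(model_dir: str) -> dict:
    """Report file presence, shard count, and tensor groups for a checkpoint."""
    root = Path(model_dir)
    index_path = root / "model.safetensors.index.json"

    weight_map = {}
    if index_path.is_file():
        weight_map = json.loads(index_path.read_text()).get("weight_map", {})

    shards = sorted(set(weight_map.values()))
    return {
        "config.json": (root / "config.json").is_file(),
        "model.safetensors.index.json": index_path.is_file(),
        "safetensor shards": len(shards),
        # Shards named in the index but absent on disk (incomplete download).
        "missing shards": [s for s in shards if not (root / s).is_file()],
        # Assumption: vision tensor names contain "vision".
        "vision tensors": any("vision" in name for name in weight_map),
        "mtp tensors": any(name.startswith("mtp.") for name in weight_map),
    }


if __name__ == "__main__":
    for key, value in summarize_artifact("Step-3.7-Flash-oQ5-MLX").items():
        print(f"{key}: {value}")
```

The check reads only the JSON metadata, so it confirms that the package is complete and well-formed but says nothing about whether the weights load or generate correctly.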

Generation smoke testing is intentionally not claimed here: Step-3.7-Flash is a very large VLM/MoE checkpoint, and runtime support depends on the local MLX/oMLX build and the available unified memory.
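
Before attempting generation, it can help to compare the output size from the table above with the memory on the machine. This is a rough sketch: the 16 GiB headroom for the KV cache, activations, and the operating system is an arbitrary assumption rather than a measured requirement, and the function names are illustrative.

```python
import os

CHECKPOINT_GIB = 131.0  # output size reported in the quantization table


def physical_memory_gib() -> float:
    """Total physical memory in GiB via POSIX sysconf (macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


def fits_in_memory(total_gib: float,
                   weights_gib: float = CHECKPOINT_GIB,
                   headroom_gib: float = 16.0) -> bool:
    """True if the weights plus an assumed headroom fit in total memory."""
    return weights_gib + headroom_gib <= total_gib


if __name__ == "__main__":
    total = physical_memory_gib()
    print(f"physical memory: {total:.1f} GiB, fits: {fits_in_memory(total)}")
```

Passing this check does not guarantee a successful run, since the memory actually available to the GPU on Apple silicon can be lower than the physical total.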

Usage

Use an MLX/oMLX build that supports the packaged Step3p7 architecture and multimodal processor.

huggingface-cli download \
  --local-dir Step-3.7-Flash-oQ5-MLX \
  dawncr0w/Step-3.7-Flash-oQ5-MLX

For a text-only smoke test, adapt the command to your local MLX/oMLX runtime:

python -m mlx_lm generate \
  --model /path/to/Step-3.7-Flash-oQ5-MLX \
  --prompt "Hello" \
  --max-tokens 32 \
  --temp 0

For multimodal inference, use an oMLX/MLX-VLM build that supports Step3p7 image-text-to-text models and pass the model directory as the local checkpoint.

License And Notice

The base model is distributed under the Apache License 2.0. This quantized artifact follows the same license.
