Step-3.7-Flash-oQ5-MLX
This repository contains an oMLX oQ5 mixed-precision MLX quantization of
stepfun-ai/Step-3.7-Flash.
Step-3.7-Flash is a sparse Mixture-of-Experts vision-language model from StepFun. The upstream model card describes it as a 198B-parameter model with a 196B language backbone, a 1.8B vision encoder, approximately 11B active parameters per token, and a 256K context window.
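As a quick sanity check on those figures, the arithmetic below shows how the component sizes add up and what share of the weights is active per token. It uses only the rounded numbers quoted from the upstream model card, so the results are approximate.

```python
# Figures quoted from the upstream model card, in billions of parameters.
language_backbone_b = 196.0
vision_encoder_b = 1.8
active_per_token_b = 11.0  # approximate, per the model card

# Backbone plus vision encoder should land near the quoted 198B total.
total_b = language_backbone_b + vision_encoder_b

# Share of the model that a sparse MoE forward pass touches per token.
active_fraction = active_per_token_b / total_b

print(f"total: about {total_b:.1f}B parameters")
print(f"active per token: about {active_fraction:.1%} of the total")
```

All weights still have to be resident in memory; the sparsity reduces compute per token, not the size of the checkpoint.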
Quantization
| Field | Value |
|---|---|
| Method | oMLX oQ mixed-precision MLX |
| Quantization | oQ5 |
| Base model revision | 5f6244077ac62e04eec3f320501ff8c2b293373a |
| Model type | step3p7 / step3p5 text backbone |
| Group size | 64 |
| Quantization mode | affine |
| Base bits | 5 |
| Effective plan | 5.79 bpw |
| Output shards | 27 safetensors |
| Output size | 131.0 GiB |
| Non-quantized/scales dtype | bfloat16 |
| Vision weights | preserved |
| Native MTP weights | not present in upstream weights |
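To relate the base bits, group size, and effective plan in the table, here is a minimal sketch of the storage arithmetic. It assumes that each group of 64 weights carries one bfloat16 scale and one bfloat16 bias, which is how MLX affine quantization is generally laid out; the function names are illustrative and are not part of oMLX or MLX.

```python
def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits stored per weight for affine group quantization.

    Assumes one scale and one bias per group (16 bits each for bfloat16).
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimated_size_gib(num_params: float, bits_per_weight: float) -> float:
    """Rough checkpoint size in GiB if every parameter used bits_per_weight."""
    return num_params * bits_per_weight / 8 / 2**30


# A layer kept at the 5-bit base with group size 64.
base_bpw = affine_bits_per_weight(bits=5, group_size=64)
print(f"base layers: {base_bpw} bits per weight")

# Rough size at the planned 5.79 bpw, using the rounded 198B parameter count.
print(f"rough size: {estimated_size_gib(198e9, 5.79):.1f} GiB")
```

Under these assumptions a base layer costs 5.5 bits per weight, so an effective plan of 5.79 bpw suggests that some tensors are kept at higher precision. The size estimate is only a ballpark: the parameter count is rounded, and the non-quantized tensors are stored in bfloat16, so it should not be expected to match the reported 131.0 GiB exactly.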
Notes
Vision weights are preserved from the upstream model.
Native MTP weights are not included in this artifact. The upstream text_config declares num_nextn_predict_layers=3, but the published safetensors index does not contain mtp.* weights, so oMLX disables Native MTP in the quantized output to keep it consistent with the weights that are actually present.
This is an unofficial quantized derivative. It is not affiliated with, sponsored by, or endorsed by StepFun.
Validation
Artifact validation completed locally with the bundled oMLX runtime on macOS:
source model: stepfun-ai/Step-3.7-Flash
source revision: 5f6244077ac62e04eec3f320501ff8c2b293373a
quantization: oQ5
config.json: present
model.safetensors.index.json: present
safetensor shards: 27
vision tensors: present
mtp tensors: not present
Generation smoke testing is intentionally not claimed here because Step-3.7-Flash is a very large VLM/MoE checkpoint and runtime support depends on the local MLX/oMLX build and available unified memory.
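The structural checks listed above can be reproduced on a downloaded copy with a short script. This is a minimal sketch, not the oMLX validator: it assumes the standard `weight_map` layout of `model.safetensors.index.json`, the `mtp.` prefix mentioned in the notes, and a hypothetical `vision` substring for vision tensor names, which may need adjusting to the real key names.

```python
import json
from pathlib import Path


def validate_artifact(model_dir, vision_marker="vision", mtp_prefix="mtp."):
    """Check file presence and tensor names without loading any weights.

    vision_marker is a hypothetical substring; mtp_prefix follows the
    mtp.* naming mentioned in the notes.
    """
    root = Path(model_dir)
    index_path = root / "model.safetensors.index.json"
    report = {
        "config.json": (root / "config.json").is_file(),
        "model.safetensors.index.json": index_path.is_file(),
    }

    # The index maps each tensor name to the shard file that stores it.
    weight_map = {}
    if index_path.is_file():
        weight_map = json.loads(index_path.read_text()).get("weight_map", {})

    shards = set(weight_map.values())
    report["safetensor shards"] = len(shards)
    report["missing shards"] = sorted(
        name for name in shards if not (root / name).is_file()
    )
    report["vision tensors"] = any(vision_marker in key for key in weight_map)
    report["mtp tensors"] = any(key.startswith(mtp_prefix) for key in weight_map)
    return report


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for field, value in validate_artifact(target).items():
        print(f"{field}: {value}")
```

Run it with the download directory as the only argument. It reads only `config.json` and the index file, so it needs no MLX install and very little memory.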
Usage
Use an MLX/oMLX build that supports the packaged Step3p7 architecture and multimodal processor.
huggingface-cli download \
--local-dir Step-3.7-Flash-oQ5-MLX \
dawncr0w/Step-3.7-Flash-oQ5-MLX
For a text-only smoke test, adapt the command to your local MLX/oMLX runtime:
python -m mlx_lm generate \
--model /path/to/Step-3.7-Flash-oQ5-MLX \
--prompt "Hello" \
--max-tokens 32 \
--temp 0
For multimodal inference, use an oMLX/MLX-VLM build that supports Step3p7
image-text-to-text models and pass the model directory as the local checkpoint.
License And Notice
The base model is distributed under the Apache License 2.0. This quantized artifact follows the same license.