Step-3.7-Flash-OptiQ-3.7bpw-mlx
osmapi/Step-3.7-Flash-OptiQ-3.7bpw-mlx is a mixed-precision MLX affine quantization of stepfun-ai/Step-3.7-Flash for Apple Silicon, with bit widths allocated by OptiQ and distributed as a tensor-format bundle.
No fine-tuning, distillation, or retraining was applied. The upstream StepFun checkpoint was downloaded and verified locally. OptiQ stream/Frobenius sensitivity was used to allocate mixed bit widths at a target of 3.7 bits per weight (BPW), and eligible text and vision .weight tensors were then converted with MLX affine quantization. The tokenizer, chat template, custom Step3.7 Python modules, and non-quantized control tensors are preserved from the source release.
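The card does not describe OptiQ's internals, so the following is only a generic illustration of allocating mixed bit widths under an average BPW budget; it is not OptiQ's algorithm. It assumes each tensor comes with a parameter count and a sensitivity score for every candidate bit width (for example, a Frobenius-norm reconstruction error), starts everything at the highest width, and repeatedly lowers the tensor that adds the least error per bit saved until the parameter-weighted average reaches the target. The function name greedy_allocate and its input layout are hypothetical.

```python
def greedy_allocate(tensors, candidate_bits, target_bpw):
    """Hypothetical greedy mixed-precision allocator (not OptiQ).

    tensors: {name: (num_params, {bits: error_estimate})}, with an error
    estimate available for every candidate bit width.
    Returns {name: bits}; stops early if the target cannot be reached.
    """
    ladder = sorted(candidate_bits, reverse=True)   # e.g. [8, 4, 3, 2]
    alloc = {name: ladder[0] for name in tensors}    # start at highest precision
    total_params = sum(n for n, _ in tensors.values())

    def average_bpw():
        return sum(tensors[name][0] * b for name, b in alloc.items()) / total_params

    while average_bpw() > target_bpw:
        best = None
        for name, bits in alloc.items():
            step = ladder.index(bits)
            if step + 1 == len(ladder):
                continue                              # already at the lowest width
            lower = ladder[step + 1]
            num_params, errors = tensors[name]
            bits_saved = num_params * (bits - lower)
            cost = (errors[lower] - errors[bits]) / bits_saved  # error added per bit saved
            if best is None or cost < best[0]:
                best = (cost, name, lower)
        if best is None:
            break                                     # every tensor is at the floor
        _, name, lower = best
        alloc[name] = lower
    return alloc
```

Because the score is normalized by bits saved, a large tensor that tolerates lower precision is cheap to shrink, while a small but sensitive tensor can stay at a high width without costing much of the budget.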
Compatibility Status
This upload is a standard MLX affine safetensors bundle, but it is not yet a drop-in mlx_lm.load(...) or mlx_vlm.load(...) model.
At conversion time, vanilla mlx-lm 0.31.3 and mlx-vlm 0.5.0 did not register model_type: step3p7. This repository is therefore intended for MLX runtime authors, loader implementers, and researchers who want a verified Step-3.7-Flash OptiQ tensor bundle. Native inference will require Step3p7 model-class support in MLX/MLX-LM/MLX-VLM or a compatible custom loader.
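A loader implementer might first confirm which architecture the bundle declares and whether an installed runtime ships a matching module. The sketch below assumes that mlx-lm and mlx-vlm keep one module per architecture under their models package, named after model_type; that is an observation about current releases rather than a documented API, and read_model_type and has_architecture_module are hypothetical helpers.

```python
import importlib.util
import json
from pathlib import Path


def read_model_type(model_dir):
    """Return the model_type declared in <model_dir>/config.json."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config["model_type"]


def has_architecture_module(package, model_type):
    """True if <package>.models.<model_type> can be located.

    Assumption: the runtime stores one module or subpackage per architecture
    under <package>.models. find_spec imports the parent packages but not the
    architecture module itself.
    """
    try:
        return importlib.util.find_spec(f"{package}.models.{model_type}") is not None
    except ImportError:
        # The runtime (or one of its dependencies, such as mlx) is not installed.
        return False


# Usage, from the downloaded repository directory:
# mt = read_model_type(".")  # expected to be "step3p7"
# print({pkg: has_architecture_module(pkg, mt) for pkg in ("mlx_lm", "mlx_vlm")})
```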
Model Details
| Property | Value |
|---|---|
| Base model | stepfun-ai/Step-3.7-Flash |
| Architecture | Step3p7 sparse MoE vision-language model |
| Parameters | 198B total, about 11B active per token |
| Context length | 256k |
| Vision encoder | 1.8B perception encoder, stored in 2 vision shards |
| Local profile | MLX-OptiQ-Affine-3.7bpw |
| Bundle size | About 99 GB |
| Shards | 24 text safetensors + 2 vision safetensors |
| Source license | Apache-2.0 |
| Validation | Safetensors index validation, config metadata validation, manifest validation, MLX tensor sample loads |
Quantization Recipe
| Tensor class | Codec | Bits / handling |
|---|---|---|
| Eligible text and vision .weight tensors | MLX affine | OptiQ-assigned 3, 4, or 8 bits, group size 64 |
| Quantized tensor layout | MLX triplet | .weight, .scales, .biases |
| Norms, biases, routing/control tensors, and incompatible tensors | passthrough | source precision preserved |
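For readers new to the codec, this pure-Python sketch shows the min/max affine idea behind the triplet layout above: each group of 64 consecutive values in a row gets one scale and one bias, and each value is stored as a small integer code with w ≈ scale * q + bias. It is a simplified illustration, not MLX's kernel; mx.quantize differs in details such as how it picks the scale and bias and how it packs codes into uint32 words. All function names here are hypothetical.

```python
def quantize_group(values, bits):
    """Min/max affine quantization of one group: w ~ scale * q + bias."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                        # e.g. 7 for 3-bit codes
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero on constant groups
    bias = lo
    codes = [min(levels, max(0, round((v - bias) / scale))) for v in values]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate values from integer codes."""
    return [scale * q + bias for q in codes]


def quantize_rows(matrix, bits, group_size=64):
    """Quantize each row in consecutive groups of group_size values."""
    result = []
    for row in matrix:
        if len(row) % group_size:
            raise ValueError("row length must be a multiple of group_size")
        result.append([quantize_group(row[i:i + group_size], bits)
                       for i in range(0, len(row), group_size)])
    return result
```

Rounding to the nearest code bounds the reconstruction error of each value by scale / 2, so a group with a wide value range loses more precision at 3 bits (7 steps) than at 8 bits (255 steps).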
OptiQ allocation summary:
| Metric | Value |
|---|---|
| Target BPW | 3.7 |
| Achieved BPW | 3.6930459517285583 |
| Allocation method | optiq_stream_frobenius |
| Candidate bits | 2, 3, 4, 8 |
| Quantized weight tensors | 702 |
| Passthrough tensors | 769 |
| Group size | 64 |
| 3-bit allocations | 66 |
| 4-bit allocations | 62 |
| 8-bit allocations | 574 |
| Missing allocations | 0 |
The achieved BPW is the average bit width that the OptiQ allocation assigns across the quantized weights only. The on-disk bundle additionally holds the MLX affine scale/bias tensors, passthrough tensors, tokenizer/config/custom code, and index metadata, so it is larger than the BPW figure alone implies.
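The scale/bias overhead can be worked out from the recipe: every group of 64 weights carries one bfloat16 scale and one bfloat16 bias, adding (16 + 16) / 64 = 0.5 bits per weight on top of the codes. The sketch below assumes the reported BPW is a parameter-weighted mean of the assigned widths; the card does not state the exact definition, and the helper names are hypothetical.

```python
GROUP_SIZE = 64   # from the allocation summary
SCALE_BITS = 16   # .scales are bfloat16 in the tensor inspection table
BIAS_BITS = 16    # .biases are bfloat16 as well


def weighted_bpw(allocations):
    """Parameter-weighted mean bit width over (num_params, bits) pairs (assumed BPW definition)."""
    total_params = sum(n for n, _ in allocations)
    return sum(n * bits for n, bits in allocations) / total_params


def affine_overhead_bpw(group_size=GROUP_SIZE, scale_bits=SCALE_BITS, bias_bits=BIAS_BITS):
    """Extra bits per weight for storing one scale and one bias per group."""
    return (scale_bits + bias_bits) / group_size


def stored_bpw(code_bpw, group_size=GROUP_SIZE):
    """Bits per weight of quantized tensors once scales and biases are included."""
    return code_bpw + affine_overhead_bpw(group_size)


# Applied to the reported figure (prints 4.193):
print(round(stored_bpw(3.6930459517285583), 3))
```

Under that assumption, the quantized tensors cost roughly 4.19 bits per weight on disk before passthrough tensors and metadata are counted.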
Files
- model-00001.safetensors to model-00024.safetensors: text/model shards in MLX affine mixed-precision tensor format.
- model-vit-00001.safetensors and model-vit-00002.safetensors: vision encoder shards in MLX affine mixed-precision tensor format.
- model.safetensors.index.json: rewritten safetensors index for quantized triplet tensors.
- optiq_allocation.json: OptiQ per-layer bit allocation.
- mlx_quantization_manifest.json: conversion manifest with quantized/passthrough tensor counts and tensor-level metadata.
- config.json: upstream config with added MLX OptiQ quantization metadata.
- configuration_step3p7.py, modeling_step3p7.py, processing_step3.py, vision_encoder.py: upstream custom Step3.7 code.
- tokenizer.json, tokenizer_config.json, special_tokens_map.json, chat_template.jinja: upstream tokenizer and prompt assets.
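A custom loader needs to know which entries in the index form quantized triplets and which are passthrough tensors. This sketch assumes the rewritten index keeps the standard safetensors layout (a "weight_map" from tensor name to shard file) and treats a prefix as quantized when .weight, .scales, and .biases are all present. Note that MLX's .biases (per-group offsets) is a different suffix from a layer's own .bias, which stays passthrough. The helper names are hypothetical.

```python
import json


def load_index(path="model.safetensors.index.json"):
    """Parse a safetensors index file."""
    with open(path) as f:
        return json.load(f)


def classify_tensors(index):
    """Split index entries into quantized triplet prefixes and passthrough names."""
    names = set(index["weight_map"])
    quantized = sorted(
        name[: -len(".scales")]
        for name in names
        if name.endswith(".scales")
        and name[: -len(".scales")] + ".weight" in names
        and name[: -len(".scales")] + ".biases" in names
    )
    # Every name that belongs to a triplet; everything else is passthrough.
    triplet_members = {p + s for p in quantized for s in (".weight", ".scales", ".biases")}
    passthrough = sorted(names - triplet_members)
    return quantized, passthrough
```

The resulting counts can then be cross-checked against the quantized/passthrough totals recorded in mlx_quantization_manifest.json.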
Tensor Inspection
Until Step3p7 support lands in an MLX runtime, use MLX tensor loading for inspection or custom loader development:
```python
import mlx.core as mx

# Load one text shard as a dict mapping tensor names to mx.array values.
tensors = mx.load("model-00002.safetensors")

# A quantized linear layer is stored as an MLX affine triplet under one prefix.
prefix = "model.layers.3.moe.down_proj"
for suffix in (".weight", ".scales", ".biases"):
    t = tensors[prefix + suffix]
    print(prefix + suffix, t.shape, t.dtype)
```
For this tensor, which OptiQ assigned 3 bits, representative local validation returned:
| Tensor | Shape | Dtype |
|---|---|---|
| model.layers.3.moe.down_proj.weight | (288, 4096, 120) | uint32 |
| model.layers.3.moe.down_proj.scales | (288, 4096, 20) | bfloat16 |
| model.layers.3.moe.down_proj.biases | (288, 4096, 20) | bfloat16 |
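These shapes are consistent with uint32 packing: 120 words × 32 bits = 3840 bits per row, which at 3 bits per value is 1280 input features, and 1280 / 64 = 20 scale/bias groups. A loader can run the same arithmetic in reverse to recover each triplet's bit width and cross-check it against optiq_allocation.json. The helper below is hypothetical and assumes each packed row occupies whole uint32 words.

```python
def triplet_layout(weight_shape, scales_shape, group_size=64):
    """Infer (bits, in_features) for an MLX affine triplet from its shapes."""
    if tuple(weight_shape[:-1]) != tuple(scales_shape[:-1]):
        raise ValueError("weight and scales must share leading dimensions")
    packed_cols = weight_shape[-1]      # uint32 words per row
    n_groups = scales_shape[-1]         # one scale (and one bias) per group
    in_features = n_groups * group_size
    total_bits = packed_cols * 32
    if total_bits % in_features:
        raise ValueError("shapes are inconsistent with uint32 packing")
    return total_bits // in_features, in_features
```

For the tensor above, triplet_layout((288, 4096, 120), (288, 4096, 20)) returns (3, 1280); the same 1280-feature row would have 160 packed columns at 4 bits and 320 at 8 bits.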
Limitations
- This is a tensor-format MLX affine mixed-precision conversion, not a complete native Step3p7 MLX inference implementation.
- Current vanilla mlx-lm and mlx-vlm releases need Step3p7 architecture support before this can be used as a normal one-line load/generate model.
- The OptiQ allocation has not been benchmarked for downstream quality after conversion.
- Multimodal prompt plumbing depends on future Step3p7 loader/runtime support.
- Behavior, benchmark scores, and deployment claims come from the upstream StepFun release; this quantization has not been independently re-benchmarked.
Credits
Thank you to both sides of this release:
| Role | Credit |
|---|---|
| Quantization & release | osmAPI research team and Terv Student Research Team |
| Foundation model | StepFun, creators of stepfun-ai/Step-3.7-Flash |
License: Apache-2.0, following the upstream StepFun release.