
Step-3.7-Flash-OptiQ-3.7bpw-mlx

osmapi/Step-3.7-Flash-OptiQ-3.7bpw-mlx is an MLX affine mixed-precision (OptiQ) quantization of stepfun-ai/Step-3.7-Flash for Apple Silicon, published as a tensor-format bundle.

No fine-tuning, distillation, or retraining was applied. The upstream StepFun checkpoint was downloaded and verified locally. OptiQ stream/Frobenius sensitivity was used to assign a bit width to each eligible tensor against a 3.7 BPW target, and the eligible text and vision .weight tensors were then converted with MLX affine quantization. The tokenizer, chat template, custom Step3.7 Python modules, and non-quantized control tensors are preserved from the source release.

Public osmAPI Step-3.7-Flash Variants

Variant	Repository	Format	Notes
MXFP4 MLX	osmapi/Step-3.7-Flash-MXFP4-mlx	MLX MXFP4 safetensors	Public 4-bit microscaling tensor bundle
OptiQ 3.7bpw MLX	osmapi/Step-3.7-Flash-OptiQ-3.7bpw-mlx	MLX affine mixed-precision safetensors	Public 3.7 BPW OptiQ tensor bundle

Compatibility Status

This upload is a standard MLX affine safetensors bundle, but it is not yet a drop-in mlx_lm.load(...) or mlx_vlm.load(...) model.

At conversion time, vanilla mlx-lm 0.31.3 and mlx-vlm 0.5.0 did not register model_type: step3p7. This repository is therefore intended for MLX runtime authors, loader implementers, and researchers who want a verified Step-3.7-Flash OptiQ tensor bundle. Native inference will require Step3p7 model-class support in MLX/MLX-LM/MLX-VLM or a compatible custom loader.
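
To check whether an installed runtime has since gained support, one option is to look for a model module named after the `model_type`. The sketch below assumes that mlx-lm and mlx-vlm resolve architectures by importing `mlx_lm.models.<model_type>` and `mlx_vlm.models.<model_type>`; that layout is an assumption about package internals and may differ between releases. The helper name `has_model_module` is hypothetical.

```python
import importlib.util


def has_model_module(package: str, model_type: str) -> bool:
    """Return True if `<package>.<model_type>` can be located as a module.

    Assumption: the runtime maps config.json's model_type to a module of
    the same name, for example mlx_lm.models.step3p7.
    """
    try:
        return importlib.util.find_spec(f"{package}.{model_type}") is not None
    except ImportError:
        # The parent package is missing or cannot be imported on this machine.
        return False


if __name__ == "__main__":
    for package in ("mlx_lm.models", "mlx_vlm.models"):
        print(package, has_model_module(package, "step3p7"))
```

A `False` result for both packages means the installed releases still lack a module with that name. A `True` result only shows that such a module exists, not that it loads this bundle correctly.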

Model Details

Property	Value
Base model	`stepfun-ai/Step-3.7-Flash`
Architecture	Step3p7 sparse MoE vision-language model
Parameters	198B total, about 11B active per token
Context length	256k
Vision encoder	1.8B perception encoder, stored in 2 vision shards
Local profile	`MLX-OptiQ-Affine-3.7bpw`
Bundle size	About 99 GB
Shards	24 text safetensors + 2 vision safetensors
Source license	Apache-2.0
Validation	Safetensors index validation, config metadata validation, manifest validation, MLX tensor sample loads

Quantization Recipe

Tensor class	Codec	Bits / handling
Eligible text and vision `.weight` tensors	MLX affine	OptiQ-assigned 3, 4, or 8 bits, group size 64
Quantized tensor layout	MLX triplet	`.weight`, `.scales`, `.biases`
Norms, biases, routing/control tensors, and incompatible tensors	passthrough	source precision preserved
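
For readers new to the triplet layout, the following is a minimal plain-Python sketch of textbook min/max affine group quantization, where each group of weights is reconstructed as `scale * code + bias`. It is an illustration only: MLX's own kernels may pick the scale and bias differently, and the function names here are hypothetical.

```python
def quantize_group(values, bits):
    """Affine-quantize one group so that value ~= scale * code + bias.

    Codes are integers in [0, 2**bits - 1]. This is the textbook min/max
    scheme, not necessarily what MLX computes internally.
    """
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from codes plus the shared scale and bias."""
    return [scale * code + bias for code in codes]


if __name__ == "__main__":
    group_size = 64  # same group size as this bundle
    # Deterministic stand-in values in [-1, 1]; not real model weights.
    weights = [((i * 37) % 101) / 50.0 - 1.0 for i in range(group_size)]
    for bits in (3, 4, 8):
        codes, scale, bias = quantize_group(weights, bits)
        restored = dequantize_group(codes, scale, bias)
        max_err = max(abs(a - b) for a, b in zip(weights, restored))
        print(f"{bits}-bit: max abs error {max_err:.5f}, half step {scale / 2:.5f}")
```

The reconstruction error per weight is bounded by half a quantization step, which is why sensitive tensors benefit from the wider 8-bit codes while less sensitive ones can be stored in 3 or 4 bits.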

OptiQ allocation summary:

Metric	Value
Target BPW	3.7
Achieved BPW	3.6930459517285583
Allocation method	`optiq_stream_frobenius`
Candidate bits	2, 3, 4, 8 (no tensor was assigned 2 bits)
Quantized weights	702
Passthrough tensors	769
Group size	64
3-bit allocations	66
4-bit allocations	62
8-bit allocations	574
Missing allocations	0

The achieved BPW is the average bit width that the OptiQ allocation assigns across the quantized weights only. It does not count the MLX affine scale/bias tensors, passthrough tensors, tokenizer/config/custom code, or index metadata, all of which are also part of the on-disk bundle.
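
As a rough worked example of the scale/bias overhead, the sketch below adds the per-group cost to the code width. It assumes every quantized tensor stores one 16-bit scale and one 16-bit bias per group of 64 weights, matching the `bfloat16` dtype reported for the sample tensor on this card; the function name is hypothetical.

```python
def stored_bits_per_weight(code_bits, group_size=64, scale_bits=16, bias_bits=16):
    """Bits per weight on disk: the packed code plus the group's shared scale and bias."""
    return code_bits + (scale_bits + bias_bits) / group_size


if __name__ == "__main__":
    for bits in (3, 4, 8):
        print(f"{bits}-bit codes -> {stored_bits_per_weight(bits)} stored bits per weight")

    achieved_bpw = 3.6930459517285583  # achieved BPW from the allocation summary
    print(f"allocation average with overhead: {stored_bits_per_weight(achieved_bpw):.3f}")
```

Under these assumptions the overhead is a flat 0.5 bits per quantized weight, so the quantized portion of the bundle averages about 4.19 stored bits per weight before passthrough tensors are counted.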

Files

model-00001.safetensors to model-00024.safetensors: text/model shards in MLX affine mixed-precision tensor format.
model-vit-00001.safetensors and model-vit-00002.safetensors: vision encoder shards in MLX affine mixed-precision tensor format.
model.safetensors.index.json: rewritten safetensors index for quantized triplet tensors.
optiq_allocation.json: OptiQ per-layer bit allocation.
mlx_quantization_manifest.json: conversion manifest with quantized/passthrough tensor counts and tensor-level metadata.
config.json: upstream config with added MLX OptiQ quantization metadata.
configuration_step3p7.py, modeling_step3p7.py, processing_step3.py, vision_encoder.py: upstream custom Step3.7 code.
tokenizer.json, tokenizer_config.json, special_tokens_map.json, chat_template.jinja: upstream tokenizer and prompt assets.
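
A loader author will usually start from the index to find which shard holds each tensor. The sketch below groups index entries into `.weight`/`.scales`/`.biases` triplets. It assumes the rewritten index keeps the usual safetensors index layout with a top-level `weight_map` object mapping tensor names to shard files; the helper names are hypothetical.

```python
import json
import os
from collections import defaultdict

TRIPLET_SUFFIXES = (".weight", ".scales", ".biases")


def load_weight_map(index_path):
    """Read the tensor-name -> shard-file mapping from a safetensors index."""
    with open(index_path, "r", encoding="utf-8") as handle:
        return json.load(handle)["weight_map"]


def group_triplets(weight_map):
    """Return {prefix: {suffix: shard}} for prefixes that have all three parts."""
    parts = defaultdict(dict)
    for name, shard in weight_map.items():
        for suffix in TRIPLET_SUFFIXES:
            if name.endswith(suffix):
                parts[name[: -len(suffix)]][suffix] = shard
                break
    # Passthrough tensors (for example a lone norm .weight) lack scales/biases.
    return {prefix: found for prefix, found in parts.items() if len(found) == 3}


if __name__ == "__main__":
    index_path = "model.safetensors.index.json"
    if os.path.exists(index_path):
        triplets = group_triplets(load_weight_map(index_path))
        print(len(triplets), "quantized triplets found")
```

The printed count can be compared against the 702 quantized weights reported in the allocation summary as a quick consistency check.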

Tensor Inspection

Until Step3p7 support lands in an MLX runtime, use MLX tensor loading for inspection or custom loader development:

import mlx.core as mx

# Load one shard as a dict mapping tensor names to mx.array values.
tensors = mx.load("model-00002.safetensors")
prefix = "model.layers.3.moe.down_proj"

# Each quantized tensor is stored as a triplet: packed codes, scales, biases.
for suffix in (".weight", ".scales", ".biases"):
    tensor = tensors[prefix + suffix]
    print(prefix + suffix, tensor.shape, tensor.dtype)

Representative local validation for that 3-bit tensor returned:

Tensor	Shape	Dtype
`model.layers.3.moe.down_proj.weight`	`(288, 4096, 120)`	`uint32`
`model.layers.3.moe.down_proj.scales`	`(288, 4096, 20)`	`bfloat16`
`model.layers.3.moe.down_proj.biases`	`(288, 4096, 20)`	`bfloat16`
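
The shapes in this table can be cross-checked with a little arithmetic. The sketch below assumes that codes are packed contiguously into 32-bit words along the last axis, so the packed width is `in_features * bits / 32`, and that there is one scale per group of 64 input features. The helper names are hypothetical.

```python
def unpacked_features(packed_cols, bits, word_bits=32):
    """Number of logical input features held in `packed_cols` 32-bit words."""
    total_bits = packed_cols * word_bits
    if total_bits % bits:
        raise ValueError("packed width is not a whole number of codes")
    return total_bits // bits


def check_triplet(weight_shape, scales_shape, bits, group_size=64):
    """Return (in_features, consistent) for a packed weight and its scales."""
    in_features = unpacked_features(weight_shape[-1], bits)
    groups_match = in_features // group_size == scales_shape[-1]
    leading_match = tuple(weight_shape[:-1]) == tuple(scales_shape[:-1])
    return in_features, groups_match and leading_match


if __name__ == "__main__":
    # Shapes reported above for model.layers.3.moe.down_proj at 3 bits.
    print(check_triplet((288, 4096, 120), (288, 4096, 20), bits=3))
```

For the reported tensor, 120 words of 32 bits hold 1280 three-bit codes, and 1280 features in groups of 64 give the 20 scale columns shown, so the triplet is self-consistent under these assumptions.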

Limitations

This is a tensor-format MLX affine mixed-precision conversion, not a complete native Step3p7 MLX inference implementation.
Vanilla mlx-lm and mlx-vlm (0.31.3 and 0.5.0 at conversion time) need Step3p7 architecture support before this bundle can be used as a normal one-line load/generate model.
Multimodal prompt plumbing depends on future Step3p7 loader/runtime support.
Behavior, benchmark scores, and deployment claims come from the upstream StepFun release; this quantization has not been benchmarked for downstream quality after conversion.

Credits

Thank you to both sides of this release:

Role	Contributor
Quantization & release	osmAPI research team and Terv Student Research Team
Foundation model	StepFun, creators of `stepfun-ai/Step-3.7-Flash`

License: Apache-2.0, following the upstream StepFun release.
