Qwen3.6-27B-mtp (affine-8)

MLX conversion of Qwen/Qwen3.6-27B, quantized to affine 8-bit (group_size 64), with the native Multi-Token-Prediction (MTP) head embedded in the main weight shards for speculative decoding.

What changed from the previous packaging

This release replaces the previous mxfp8 packaging, which ran about 29% slower with --mtp enabled. Two causes compounded:

  1. mxfp8 backbone: the speculative verify pass processes 2 tokens through the full backbone every step, and mxfp8's per-call dequant overhead does not amortize across that path on Apple Silicon. Re-quantizing the same artifact to affine-8 reduced the regression from 29% to 10%.
  2. Lossy round-trip in the original packaging: even after re-quantization, the MTP head's calibration was damaged enough that MTP remained a net regression. A fresh affine-8 conversion straight from Qwen/Qwen3.6-27B (this artifact) turns that regression into a 48% speedup with --mtp.
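
Why verify-pass overhead can turn MTP into a net loss can be sketched with a simplified cost model. This is an illustration under stated assumptions, not a description of how mlx-lm schedules work: it assumes one drafted token per step, and the function name mtp_speedup and its parameters are hypothetical. Each step always yields one verified token plus the drafted token when it is accepted, so throughput relative to plain decoding is (1 + accept_rate) divided by the step cost.

```python
def mtp_speedup(accept_rate: float, verify_cost: float, head_cost: float = 0.0) -> float:
    """Relative throughput of one-token-draft speculative decoding vs. plain decoding.

    accept_rate: probability that the drafted token is accepted (hypothetical input).
    verify_cost: cost of the 2-token verify pass, in units of one plain
                 single-token backbone pass (1.0 would mean the extra token is free).
    head_cost:   cost of running the MTP head, in the same units.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    tokens_per_step = 1.0 + accept_rate      # verified token plus accepted draft
    cost_per_step = verify_cost + head_cost  # relative to one plain decode step
    return tokens_per_step / cost_per_step


# Illustrative values only, not measurements from this model.
cheap_verify = mtp_speedup(accept_rate=0.6, verify_cost=1.1)
costly_verify = mtp_speedup(accept_rate=0.6, verify_cost=2.0)
print(round(cheap_verify, 3), round(costly_verify, 3))
```

With the same acceptance rate, the result is above 1.0 when the second token is nearly free and below 1.0 when the verify pass costs twice a plain step, which is the shape of the regression described above.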

This release is therefore a fresh affine-8 conversion from the upstream Qwen base, not a re-packaging of the previous mxfp8 artifact.

Conversion command

mlx_lm.convert --hf-path Qwen/Qwen3.6-27B \
  --mlx-path Qwen3.6-27B-mtp \
  -q --q-mode affine --q-bits 8 --q-group-size 64
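
The quantization flags above request group-wise affine quantization: each group of 64 weights shares one scale and one bias, and every weight is stored as an 8-bit integer. A minimal pure-Python sketch of that general technique follows. It is not MLX's kernel; the rounding details, the function names, and the assumption that scale and bias are stored in 16 bits each are illustrative.

```python
import math


def quantize_group(weights, bits=8):
    """Affine-quantize one group so that w is approximated by scale * q + bias."""
    levels = (1 << bits) - 1                   # 255 for 8-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0    # fall back to 1.0 for constant groups
    # Clamp to guard against floating-point drift at the top of the range.
    q = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return q, scale, w_min


def dequantize_group(q, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * v + bias for v in q]


def bits_per_weight(bits=8, group_size=64, meta_bits=16):
    # One scale and one bias per group, each assumed to take meta_bits.
    return bits + 2 * meta_bits / group_size


group = [math.sin(i) for i in range(64)]       # stand-in for 64 real weights
q, scale, bias = quantize_group(group)
restored = dequantize_group(q, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(max_err <= scale / 2 + 1e-9, bits_per_weight())
```

Under these assumptions the per-group metadata adds half a bit per weight on top of the 8-bit codes, and the reconstruction error of any weight is bounded by half a quantization step.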

Run

Without MTP (stock mlx-lm from PyPI):

mlx_lm.generate --model trevon/Qwen3.6-27B-mtp \
  --prompt "..." --max-tokens 100

With MTP (requires the AirRunner feat/mtp-native branch, PR 990):

git clone https://github.com/AirRunner/mlx-lm.git
cd mlx-lm && git checkout feat/mtp-native
uv venv && uv pip install -e .
mlx_lm.generate --model trevon/Qwen3.6-27B-mtp \
  --prompt "..." --max-tokens 100 --mtp

Benchmarks (Apple M4 Max)

Mode        tokens/sec
no --mtp    15.1
--mtp       22.4 (+48%)
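
The +48% figure follows directly from the two throughput numbers in the table. A short check, using only those two values:

```python
baseline_tps = 15.1   # tokens/sec without --mtp (from the table above)
mtp_tps = 22.4        # tokens/sec with --mtp (from the table above)

speedup = mtp_tps / baseline_tps
gain_pct = (speedup - 1.0) * 100.0
print(f"{speedup:.2f}x, {gain_pct:+.0f}%")   # prints "1.48x, +48%"
```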

License: per upstream Qwen/Qwen3.6-27B.
