Qwen3-Coder-Next-mxfp4-mlx
Qwen3-Coder-Next outperforms the previous Next models with ease.
The mxfp4 is head and shoulders above the old Next q8, establishing itself as the highest-performing quant so far.
Brainwaves
| Model / Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| qx86n-hi | 0.518 | 0.710 | 0.882 | 0.626 | 0.416 | 0.745 | 0.601 |
| qx86n | 0.515 | 0.712 | 0.881 | 0.627 | 0.414 | 0.744 | 0.590 |
| mxfp8 | 0.514 | 0.709 | 0.884 | 0.639 | 0.420 | 0.748 | 0.611 |
| mxfp4 | 0.528 | 0.713 | 0.880 | 0.630 | 0.428 | 0.744 | 0.619 |
| qx64n-hi | 0.527 | 0.707 | 0.880 | 0.631 | 0.426 | 0.744 | 0.580 |
| qx64n | 0.511 | 0.703 | 0.881 | 0.631 | 0.420 | 0.746 | 0.598 |
| qx53n | 0.520 | 0.714 | 0.872 | 0.630 | 0.438 | 0.744 | 0.599 |
| Qwen3-Next-80B-A3B-Instruct q8 | 0.402 | 0.494 | 0.896 | 0.540 | 0.420 | 0.754 | 0.554 |
| Qwen3-Next-80B-A3B-Thinking q8 | 0.409 | 0.459 | 0.648 | 0.655 | 0.376 | 0.783 | 0.692 |
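For context, scores like these come from multiple-choice harness tasks that rank the answer options by the log-likelihood the model assigns to each one. Below is a minimal sketch of that scoring mechanism using mlx-lm; the toy question, the answer options, and the unnormalized per-option scoring are illustrative assumptions, not the actual harness code:

```python
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("nightmedia/Qwen3-Coder-Next-mxfp4-mlx")

def sequence_logprob(context: str, continuation: str) -> float:
    """Sum of log-probs the model assigns to `continuation` after `context`.
    Assumes the context tokens are a prefix of the full tokenization,
    which holds for typical prompts."""
    ctx_len = len(tokenizer.encode(context))
    full = tokenizer.encode(context + continuation)
    logits = model(mx.array([full[:-1]]))  # (1, L-1, vocab)
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    # Positions ctx_len-1 .. L-2 are the ones that predict the continuation.
    rows = mx.take(logprobs[0], mx.arange(ctx_len - 1, len(full) - 1), axis=0)
    targets = mx.array(full[ctx_len:])
    return mx.take_along_axis(rows, targets[:, None], axis=-1).sum().item()

# Hypothetical question, not taken from any of the benchmarks above
question = "Q: Which of these freezes at 0 °C?\nA:"
choices = [" water", " iron", " oxygen", " sand"]
scores = [sequence_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])  # counted as correct if it matches the key
```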
| Quant | Size | Perplexity |
|---|---|---|
| qx86n-hi | 82G | 4.484 ± 0.033 |
| qx86n | 73G | 4.487 ± 0.033 |
| mxfp8 | 82G | 4.537 ± 0.033 |
| mxfp4 | 42G | 4.676 ± 0.035 |
| qx64n-hi | 54G | 4.528 ± 0.033 |
| qx64n | 53G | 4.525 ± 0.033 |
| qx53n | 43G | 4.750 ± 0.036 |
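A perplexity figure like the ones above is the exponentiated mean token-level cross-entropy over a held-out text. Here is a minimal sketch of that measurement with mlx-lm; the holdout.txt corpus and the 512-token chunking are placeholder assumptions, not the setup behind the published numbers:

```python
import math
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("nightmedia/Qwen3-Coder-Next-mxfp4-mlx")

# Placeholder corpus; the table above was measured on its own evaluation text
tokens = tokenizer.encode(open("holdout.txt").read())

window = 512
nll, count = 0.0, 0
for i in range(0, len(tokens) - 1, window):
    chunk = tokens[i : i + window + 1]  # inputs plus next-token targets
    inputs = mx.array([chunk[:-1]])
    targets = mx.array([chunk[1:]])
    logits = model(inputs)  # (1, L, vocab)
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    # Accumulate negative log-likelihood of the true next tokens
    nll -= mx.take_along_axis(logprobs, targets[..., None], axis=-1).sum().item()
    count += targets.size

print(f"perplexity: {math.exp(nll / count):.3f}")
```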
The Deckard (qx) formula for Next was used unchanged from the previous Next series.
Group size 32 brings very little benefit in the high quants for Coder-Next: it did not add much to qx86n-hi (which is not being uploaded, due to space constraints).
At low quants, however, mxfp4 and qx64n-hi show the highest combined arc, openbookqa, hellaswag, and winogrande scores, even compared to larger quants.
The qx53n still holds up, and its slightly better openbookqa score than mxfp4 is sufficient to matter in some use cases.
The mxfp8 seems unbeatable in speed, and its metrics are excellent, with the highest boolq, hellaswag, and piqa scores.
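For readers who want to produce quants along these lines: mlx-lm's convert API supports mixed-precision recipes through a quant_predicate hook, which is the usual way qx-style formulas are expressed. The predicate below is an illustrative assumption only, not the actual Deckard (qx) recipe, which is not published here:

```python
from mlx_lm import convert

def qx_like(path, module, config):
    """Illustrative mixed-precision predicate: the hook signature
    (path, module, config) -> bool | dict is mlx-lm's; the layer
    choices and bit widths below are assumptions, not the qx formula."""
    if "self_attn" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 32}  # higher-precision "hi" layers
    return {"bits": 6, "group_size": 64}      # everything else

convert(
    "Qwen/Qwen3-Coder-Next",
    mlx_path="Qwen3-Coder-Next-qx-like",  # hypothetical output path
    quantize=True,
    quant_predicate=qx_like,
)
```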
Abliterated and REAP models
I benchmarked the mxfp4/mxfp8 variants, since these are the most stable quants, with very little loss from full precision.
| Model (mxfp8) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| Qwen3-Coder-Next | 0.514 | 0.709 | 0.884 | 0.639 | 0.420 | 0.748 | 0.611 |
| Huihui-Qwen3-Coder-Next-abliterated | 0.488 | 0.681 | 0.871 | 0.628 | 0.404 | 0.753 | 0.581 |
| lovedheart/Qwen3-Coder-Next-REAP-40B-A3B | 0.390 | 0.508 | 0.610 | 0.532 | 0.354 | 0.665 | 0.577 |
Perplexity

| Model | mxfp8 | mxfp4 |
|---|---|---|
| Huihui | 4.817 ± 0.036 | 4.946 ± 0.037 |
| REAP-40B | 11.127 ± 0.103 | 11.479 ± 0.107 |
| REAP-48B | 9.489 ± 0.085 | 9.676 ± 0.087 |
The REAP models seem much more cheerful than the original, but lose a lot of arc and boolq, which shows up as heavy hallucinations in the output.
Nightmedia models
Here are some Brainwaves at qx86-hi for the Nightmedia 30B-A3B Element models, to give an idea of how much better Next could be.
These tests have nothing to do with what the model knows, but with how well it thinks with what it knows.
| Model (qx86-hi) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| Element4 | 0.514 | 0.617 | 0.846 | 0.769 | 0.442 | 0.801 | 0.731 |
| Element5 | 0.560 | 0.709 | 0.883 | 0.756 | 0.448 | 0.807 | 0.713 |
| Element6 | 0.568 | 0.737 | 0.880 | 0.760 | 0.450 | 0.803 | 0.714 |
| Element7 | 0.578 | 0.750 | 0.883 | 0.742 | 0.478 | 0.804 | 0.684 |
So cognitively, Next has risen to roughly Element5 level.
And then there is the curious case of Qwen3-4B-Engineer3x-qx86-hi-mlx:

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| qx86-hi | 0.615 | 0.835 | 0.852 | 0.745 | 0.420 | 0.780 | 0.704 |
We have other models in this range :)
-G
This model, Qwen3-Coder-Next-mxfp4-mlx, was converted to MLX format from Qwen/Qwen3-Coder-Next using mlx-lm version 0.30.6.
Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("nightmedia/Qwen3-Coder-Next-mxfp4-mlx")

prompt = "hello"

# Apply the model's chat template, if it has one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```