Qwen3-Coder-Next-mxfp4-mlx
Qwen3-Coder-Next outperforms the previous Next models with ease.
The mxfp4 is head and shoulders above the old Next q8, establishing itself as the highest-performing quant so far.
Brainwaves
| Model / Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| qx86n-hi | 0.518 | 0.710 | 0.882 | 0.626 | 0.416 | 0.745 | 0.601 |
| qx86n | 0.515 | 0.712 | 0.881 | 0.627 | 0.414 | 0.744 | 0.590 |
| mxfp8 | 0.514 | 0.709 | 0.884 | 0.639 | 0.420 | 0.748 | 0.611 |
| mxfp4 | 0.528 | 0.713 | 0.880 | 0.630 | 0.428 | 0.744 | 0.619 |
| qx64n-hi | 0.527 | 0.707 | 0.880 | 0.631 | 0.426 | 0.744 | 0.580 |
| qx64n | 0.511 | 0.703 | 0.881 | 0.631 | 0.420 | 0.746 | 0.598 |
| qx53n | 0.520 | 0.714 | 0.872 | 0.630 | 0.438 | 0.744 | 0.599 |
| Qwen3-Next-80B-A3B-Instruct q8 | 0.402 | 0.494 | 0.896 | 0.540 | 0.420 | 0.754 | 0.554 |
| Qwen3-Next-80B-A3B-Thinking q8 | 0.409 | 0.459 | 0.648 | 0.655 | 0.376 | 0.783 | 0.692 |
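For context, scores like these come from multiple-choice harness tasks that rank the answer options by the log-likelihood the model assigns to each one. Below is a minimal sketch of that scoring mechanism using mlx-lm; the toy question, the answer options, and the unnormalized per-option scoring are illustrative assumptions, not the actual harness code:

```python
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("nightmedia/Qwen3-Coder-Next-mxfp4-mlx")

def sequence_logprob(context: str, continuation: str) -> float:
    """Sum of log-probs the model assigns to `continuation` after `context`.
    Assumes the context tokens are a prefix of the full tokenization,
    which holds for typical prompts."""
    ctx_len = len(tokenizer.encode(context))
    full = tokenizer.encode(context + continuation)
    logits = model(mx.array([full[:-1]]))  # (1, L-1, vocab)
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    # Positions ctx_len-1 .. L-2 are the ones that predict the continuation.
    rows = mx.take(logprobs[0], mx.arange(ctx_len - 1, len(full) - 1), axis=0)
    targets = mx.array(full[ctx_len:])
    return mx.take_along_axis(rows, targets[:, None], axis=-1).sum().item()

# Hypothetical question, not taken from any of the benchmarks above
question = "Q: Which of these freezes at 0 °C?\nA:"
choices = [" water", " iron", " oxygen", " sand"]
scores = [sequence_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])  # counted as correct if it matches the key
```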
| Quant | Size | Perplexity |
|---|---|---|
| qx86n-hi | 82G | 4.484 ± 0.033 |
| qx86n | 73G | 4.487 ± 0.033 |
| mxfp8 | 82G | 4.537 ± 0.033 |
| mxfp4 | 42G | 4.676 ± 0.035 |
| qx64n-hi | 54G | 4.528 ± 0.033 |
| qx64n | 53G | 4.525 ± 0.033 |
| qx53n | 43G | 4.750 ± 0.036 |
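A perplexity figure like the ones above is the exponentiated mean token-level cross-entropy over a held-out text. Here is a minimal sketch of that measurement with mlx-lm; the holdout.txt corpus and the 512-token chunking are placeholder assumptions, not the setup behind the published numbers:

```python
import math
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("nightmedia/Qwen3-Coder-Next-mxfp4-mlx")

# Placeholder corpus; the table above was measured on its own evaluation text
tokens = tokenizer.encode(open("holdout.txt").read())

window = 512
nll, count = 0.0, 0
for i in range(0, len(tokens) - 1, window):
    chunk = tokens[i : i + window + 1]  # inputs plus next-token targets
    inputs = mx.array([chunk[:-1]])
    targets = mx.array([chunk[1:]])
    logits = model(inputs)  # (1, L, vocab)
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    # Accumulate negative log-likelihood of the true next tokens
    nll -= mx.take_along_axis(logprobs, targets[..., None], axis=-1).sum().item()
    count += targets.size

print(f"perplexity: {math.exp(nll / count):.3f}")
```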
The Deckard (qx) formula for Next was used unchanged from the previous Next series.
Group size 32 brings very little benefit in the high quants for Coder-Next: it did not add much to qx86n-hi (which is not being uploaded, due to space constraints).
At low quants, however, mxfp4 and qx64n-hi show the highest combined arc, openbookqa, hellaswag, and winogrande scores, even compared to larger quants.
The qx53n still holds up, and its slightly better openbookqa score than mxfp4 is sufficient to matter in some use cases.
The mxfp8 seems unbeatable in speed, and its metrics are excellent, with the highest boolq, hellaswag, and piqa scores.
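For readers who want to produce quants along these lines: mlx-lm's convert API supports mixed-precision recipes through a quant_predicate hook, which is the usual way qx-style formulas are expressed. The predicate below is an illustrative assumption only, not the actual Deckard (qx) recipe, which is not published here:

```python
from mlx_lm import convert

def qx_like(path, module, config):
    """Illustrative mixed-precision predicate: the hook signature
    (path, module, config) -> bool | dict is mlx-lm's; the layer
    choices and bit widths below are assumptions, not the qx formula."""
    if "self_attn" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 32}  # higher-precision "hi" layers
    return {"bits": 6, "group_size": 64}      # everything else

convert(
    "Qwen/Qwen3-Coder-Next",
    mlx_path="Qwen3-Coder-Next-qx-like",  # hypothetical output path
    quantize=True,
    quant_predicate=qx_like,
)
```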
Abliterated and REAP models
I benchmarked the mxfp4/mxfp8 variants, since these are the most stable quants, with very little loss from full precision.
| Model (mxfp8) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| Qwen3-Coder-Next | 0.514 | 0.709 | 0.884 | 0.639 | 0.420 | 0.748 | 0.611 |
| Huihui-Qwen3-Coder-Next-abliterated | 0.488 | 0.681 | 0.871 | 0.628 | 0.404 | 0.753 | 0.581 |
| lovedheart/Qwen3-Coder-Next-REAP-40B-A3B | 0.390 | 0.508 | 0.610 | 0.532 | 0.354 | 0.665 | 0.577 |
Perplexity

| Model | mxfp8 | mxfp4 |
|---|---|---|
| Huihui | 4.817 ± 0.036 | 4.946 ± 0.037 |
| REAP-40B | 11.127 ± 0.103 | 11.479 ± 0.107 |
| REAP-48B | 9.489 ± 0.085 | 9.676 ± 0.087 |
The REAP models seem much more cheerful than the original, but lose a lot of arc and boolq, which shows up as heavy hallucinations in the output.
Nightmedia models
Here are some Brainwaves at qx86-hi for the Nightmedia 30B-A3B Element models, to give an idea of how much better Next could be.
These tests have nothing to do with what the model knows, but with how well it thinks with what it knows.
| Model (qx86-hi) | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| Element4 | 0.514 | 0.617 | 0.846 | 0.769 | 0.442 | 0.801 | 0.731 |
| Element5 | 0.560 | 0.709 | 0.883 | 0.756 | 0.448 | 0.807 | 0.713 |
| Element6 | 0.568 | 0.737 | 0.880 | 0.760 | 0.450 | 0.803 | 0.714 |
| Element7 | 0.578 | 0.750 | 0.883 | 0.742 | 0.478 | 0.804 | 0.684 |
So cognitively, Next has risen to roughly Element5 level.
And then there is the curious case of Qwen3-4B-Engineer3x-qx86-hi-mlx:

| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| qx86-hi | 0.615 | 0.835 | 0.852 | 0.745 | 0.420 | 0.780 | 0.704 |
We have other models in this range :)
-G
This model, Qwen3-Coder-Next-mxfp4-mlx, was converted to MLX format from Qwen/Qwen3-Coder-Next using mlx-lm version 0.30.6.
Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("nightmedia/Qwen3-Coder-Next-mxfp4-mlx")

prompt = "hello"

# Apply the model's chat template, if it has one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```