Model Loading Error: 1084 parameters not in model (.biases/.scales)

#1
by pipipata - opened

Summary

I'm encountering a model loading error with Qwen3-Coder-Next-4bit that appears similar to the pixtral-12b-8bit quantization issue reported previously.

Error

ValueError: Received 1084 parameters not in model:
lm_head.biases,
lm_head.scales,
model.embed_tokens.biases,
model.embed_tokens.scales,
model.layers.0.linear_attn.in_proj_ba.biases,
model.layers.0.linear_attn.in_proj_ba.scales,
[... and 1078 more similar .biases/.scales parameters ...]

The server crashes at layer 9 during model loading. All 9 safetensor files are present and complete (~43GB total).
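The pattern in the error message is easy to confirm programmatically. Below is a minimal sketch (plain Python, no MLX required) that filters checkpoint key names for the quantization suffixes; the sample keys are illustrative, copied from the error above, not a full dump:

```python
# Minimal sketch: filter checkpoint key names for the quantization
# suffixes listed in the error. Sample keys are illustrative, taken
# from the error message above.
def quantization_keys(keys):
    """Return keys that carry per-group quantization artifacts."""
    return [k for k in keys if k.endswith((".biases", ".scales"))]

sample_keys = [
    "lm_head.weight",
    "lm_head.biases",
    "lm_head.scales",
    "model.embed_tokens.biases",
    "model.layers.0.linear_attn.in_proj_ba.scales",
]
print(quantization_keys(sample_keys))
# -> every key above except lm_head.weight
```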

Reproduction

python3 -m mlx_lm.server \
  --model /path/to/qwen3-coder-next-80b-a3b-moe-coder-2507-4bit-mlx-sft-256k-exp512 \
  --port 8201 \
  --host 127.0.0.1 \
  --max-tokens 2048

Server loads 8 layers successfully, then crashes with the ValueError above.

Environment

  • Model: mlx-community/Qwen3-Coder-Next-4bit (qwen3-coder-next-80b variant)
  • mlx-lm Version: 0.30.5 (from model README)
  • Model Size: ~43GB (512 experts × 48 layers MoE)
  • Platform: macOS, Apple Silicon
  • Model Date: Downloaded Feb 3-7, 2026 (very recent)

Analysis

This resembles the pixtral-12b-8bit issue, where quantization parameters (.biases and .scales) were included in the model files but not expected by the model architecture that MLX instantiates. The error lists 1084 such parameters spread across all layers.

I'm not certain if this is the exact same root cause, but the symptoms appear very similar:

  • Quantization-related parameters (.biases, .scales)
  • Parameters present in safetensors but not in model architecture
  • Recent model conversion (mlx-lm 0.30.5)
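For what it's worth, the ValueError itself appears to come from a strict key-matching step: the loader compares checkpoint keys against the parameter names the instantiated model actually declares. A hedged, self-contained sketch of that check (the key sets below are illustrative stand-ins, not real model dumps):

```python
# Sketch of the mismatch check behind the ValueError: keys present in
# the checkpoint but absent from the instantiated model are rejected.
# Key sets are illustrative stand-ins, not real model dumps.
def unexpected_keys(checkpoint_keys, model_param_names):
    """Checkpoint keys the model does not declare."""
    return sorted(set(checkpoint_keys) - set(model_param_names))

checkpoint = {"lm_head.weight", "lm_head.scales", "lm_head.biases"}
model_params = {"lm_head.weight"}  # un-quantized layer: no scales/biases
print(unexpected_keys(checkpoint, model_params))
# -> ['lm_head.biases', 'lm_head.scales']
```

If the model code builds full-precision layers, every quantized tensor in the checkpoint ends up in this "unexpected" set, which matches the 1084-parameter count across 48 layers.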

Question

Is this a known issue with the Qwen3-Coder-Next conversion? I couldn't find any other reports of this specific problem. If it's the same quantization bug that affected pixtral, is there a timeline for a fix?

I could be mistaken about the root cause - any guidance would be appreciated.
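One thing that might be worth checking while waiting for a maintainer: if I understand the loader correctly, mlx-lm conversions normally record their quantization settings under a top-level "quantization" key in config.json, and when that key is missing the loader builds an un-quantized model and then rejects every .scales/.biases tensor. A small sketch (the JSON fragments are illustrative, not from this model):

```python
import json

# Illustrative config fragments: a 4-bit mlx-lm conversion records its
# settings under a top-level "quantization" key in config.json.
with_quant = json.loads('{"quantization": {"group_size": 64, "bits": 4}}')
without_quant = json.loads('{"num_hidden_layers": 48}')

def quant_settings(config):
    """Quantization settings, or None if the conversion lost them."""
    return config.get("quantization")

print(quant_settings(with_quant))     # {'group_size': 64, 'bits': 4}
print(quant_settings(without_quant))  # None: un-quantized model is built
```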

Files Verified

All expected files are present:

  • ✅ config.json (512 experts, 48 layers)
  • ✅ model-00001-of-00009.safetensors through model-00009-of-00009.safetensors
  • ✅ tokenizer.json, tokenizer_config.json
  • ✅ README.md (shows mlx-lm 0.30.5 conversion)

Thank you for maintaining these MLX conversions!

Same here. Thanks for your help!
