Claude Code in a Box
Collection
Local models for replacing Claude Code with a Mac Studio. Easy to use with https://github.com/musistudio/claude-code-router • 7 items • Updated
• 1
Qwen3-Coder-Next optimized for MLX. Note: Uses MXFP4 for some module paths.
EDIT: v2 fixes some misassigned shared expert gates. Slower, but with 4x better perplexity.
EDIT: v3 bumps edge experts to Q8 for further perplexity improvement and minimal effect on speed.
# Start server at http://localhost:8080/v1/chat/completions
uvx --from mlx-lm mlx_lm.server --host 127.0.0.1 --port 8080 \
--model spicyneuron/Qwen3-Next-Coder-MLX-mixed-4.5-bit
Quantized using a custom script inspired by Unsloth/AesSedai/ubergarm style mixed-precision GGUFs. MLX quantization options differ than llama.cpp, but the principles are the same:
This one is comparable to
Unsloth's UD-Q4_K_XL
Unsloth's MOE-MXFP4
in size, but loads and runs noticeably faster thanks to MLX.
| Prompt Size | GGUF | MLX 4bit | MLX 4.5bit (v1) | MLX 4.4bit (v2) | MLX 4.9bit (v3) |
|---|---|---|---|---|---|
| 1000 | 1440.60 | 1917.29 | 1894.38 | 1871.55 | 1868.77 |
| 5000 | 1511.29 | 2113.98 | 2069.36 | 2079.87 | 2071.76 |
| 10000 | 1491.41 | 2073.89 | 2032.13 | 2039.11 | 2031.04 |
| 20000 | 1387.15 | 1888.56 | 1854.83 | 1860.35 | 1854.24 |
| Gen Size | GGUF | MLX 4bit | MLX 4.5b (v1) | MLX 4.4b (v2) | MLX 4.9b (v3) |
|---|---|---|---|---|---|
| 500 | 49.35 | 76.39 | 75.30 | 66.82 | 67.19 |
| 1000 | 49.12 | 74.67 | 73.16 | 65.86 | 64.82 |
| 2000 | 49.01 | 71.99 | 70.95 | 63.68 | 62.82 |
| 5000 | 48.64 | 67.72 | 66.67 | 61.04 | 60.99 |
| Model | Perplexity | Relative | Relative % |
|---|---|---|---|
| MLX 4bit | 4.118 ± 0.021 | — | — |
| MLX 4.5bit (v1) | 4.096 ± 0.021 | -0.022 | -0.53% |
| MLX 4.4bit (v2) | 4.024 ± 0.021 | -0.094 | -2.28% |
| MLX 4.9bit (v3) | 4.016 ± 0.021 | -0.102 | -2.48% |
# llama.cpp 8130
llama-bench -fa 1 --batch-size 2048 --ubatch-size 2048 --repetitions 5
# mlx_lm v0.30.7
mlx_lm.benchmark --num-trials 5
mlx_lm.perplexity --sequence-length 1000 --seed 222
4-bit
Base model
Qwen/Qwen3-Coder-Next