spicyneuron commited on
Commit
01a490c
·
verified ·
1 Parent(s): 462b773

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +8 -8
README.md CHANGED
@@ -14,6 +14,14 @@ tags:
14
 
15
  **EDIT:** [v3](https://huggingface.co/spicyneuron/Qwen3-Next-Coder-MLX-mixed-4.5-bit/tree/v3) bumps edge experts to Q8 for further perplexity improvement and minimal effect on speed.
16
 
 
 
 
 
 
 
 
 
17
  # Methodology
18
 
19
  Quantized using a custom script inspired by Unsloth/AesSedai/ubergarm style mixed-precision GGUFs.
@@ -26,14 +34,6 @@ This one is comparable to
26
  [Unsloth's MOE-MXFP4](https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/Qwen3-Coder-Next-MXFP4_MOE.gguf)
27
  in size, but loads and runs noticeably faster thanks to MLX.
28
 
29
- # Usage
30
-
31
- ```sh
32
- # Start server at http://localhost:8080/v1/chat/completions
33
- uvx --from mlx-lm mlx_lm.server --host 127.0.0.1 --port 8080 \
34
- --model spicyneuron/Qwen3-Next-Coder-MLX-mixed-4.5-bit
35
- ```
36
-
37
  # Benchmarks
38
 
39
  - unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL
 
14
 
15
  **EDIT:** [v3](https://huggingface.co/spicyneuron/Qwen3-Next-Coder-MLX-mixed-4.5-bit/tree/v3) bumps edge experts to Q8 for further perplexity improvement and minimal effect on speed.
16
 
17
+ # Usage
18
+
19
+ ```sh
20
+ # Start server at http://localhost:8080/v1/chat/completions
21
+ uvx --from mlx-lm mlx_lm.server --host 127.0.0.1 --port 8080 \
22
+ --model spicyneuron/Qwen3-Next-Coder-MLX-mixed-4.5-bit
23
+ ```
24
+
25
  # Methodology
26
 
27
  Quantized using a custom script inspired by Unsloth/AesSedai/ubergarm style mixed-precision GGUFs.
 
34
  [Unsloth's MOE-MXFP4](https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/Qwen3-Coder-Next-MXFP4_MOE.gguf)
35
  in size, but loads and runs noticeably faster thanks to MLX.
36
 
 
 
 
 
 
 
 
 
37
  # Benchmarks
38
 
39
  - unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL