spicyneuron committed
Commit ffde6d7 · verified · 1 Parent(s): f6423b6

Update README.md

Files changed (1)
  1. README.md +12 -0
README.md CHANGED
@@ -7,3 +7,15 @@ base_model: Qwen/Qwen3-Coder-Next
7
  tags:
8
  - mlx
9
  ---
10
+
11
+ [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next) optimized for MLX. Note: Uses MXFP4 for some module paths.
12
+
13
+ # Methodology
14
+
15
+ Quantized using a custom script inspired by Unsloth-style mixed-precision GGUFs. MLX quantization options differ
16
+ from llama.cpp's, but the principles are the same:
17
+ - Sensitive layers like MoE routing, attention, and output embeddings get higher precision
18
+ - More tolerant layers like MoE experts get lower precision
19
+
20
+ This one is comparable to [Unsloth's UD-Q4_K_XL](https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/Qwen3-Coder-Next-UD-Q4_K_XL.gguf)
21
+ in size, but loads and runs noticeably faster thanks to MLX.
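
The precision split described under Methodology can be sketched as a per-layer rule. This is a minimal illustration, not the custom script used for this model: the module names (`switch_mlp`, `gate`, `self_attn`, `lm_head`) and the bit widths are assumptions chosen for the example. The function signature loosely follows the shape of a per-layer quantization predicate such as the `quant_predicate` argument in mlx-lm's `convert`; treat that connection as an assumption and check it against your installed version.

```python
def choose_quantization(path, module=None, config=None):
    """Pick quantization settings from a dotted module path.

    All names and bit widths here are hypothetical, for illustration only.
    """
    parts = path.split(".")

    # MoE expert weights are the bulk of the model and tolerate lower precision.
    # Checked first so that expert projections named "gate_proj" are not
    # mistaken for the router.
    if "switch_mlp" in parts:
        return {"bits": 4, "group_size": 64}

    # Sensitive layers: MoE router, attention, and the output embedding.
    if parts[-1] == "gate" or "self_attn" in parts or parts[-1] == "lm_head":
        return {"bits": 8, "group_size": 64}

    # Everything else gets a middle setting.
    return {"bits": 6, "group_size": 64}


if __name__ == "__main__":
    for name in (
        "model.layers.0.mlp.gate",
        "model.layers.0.mlp.switch_mlp.gate_proj",
        "model.layers.0.self_attn.q_proj",
        "lm_head",
    ):
        print(name, choose_quantization(name))
```

Matching on whole path segments rather than substrings keeps the router (`mlp.gate`) distinct from expert projections whose names merely start with `gate`.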