Commit 1927a97 by TOTORONG (verified) · 1 parent: 8e1d242

Update README.md

Files changed (1)
  1. README.md +25 -0
README.md CHANGED
@@ -14,6 +14,31 @@ This model [TOTORONG/LongCat-Flash-3.5bits](https://huggingface.co/TOTORONG/Long
14
  converted to MLX format from [meituan-longcat/LongCat-Flash-Chat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat)
15
  using mlx-lm version **0.27.1**.
16
 
17
+
18
+ Quantization policy (by module)
19
+ | Module / Tensor name pattern | Bits | Notes |
20
+ | LayerNorms: `*layernorm*`, `input_layernorm`, `post_attention_layernorm` | fp16 (not quantized) | Kept full precision for stability; negligible size share. |
21
+ | Router: `mlp.router.classifier.*` | 8b | Conservative to preserve expert routing fidelity. |
22
+ | Embeddings: `embed_tokens.*` | 8b | Vocabulary quality & calibration. |
23
+ | LM head: `lm_head.*` | 8b | Output logits stability & calibration. |
24
+ | Self-Attention Q/K/V: `.self_attn.(q_a q_b kv_a(_with_mqa)? | 3b → 4b on selected layers | |
25
+ | Self-Attention O-proj: `.self_attn.o_proj.weight` | 4b → 6b on selected layers | Higher precision on early/late/periodic layers to reduce accumulation error. |
26
+ | Switch-MLP experts: `.mlp.switch_mlp.(up gate down)_proj.weight` | 3b | |
27
+ | Experts (per-block): `.mlps.<idx>.(up gate down)_proj.weight` | 2b → 3b on selected layers | |
28
+ | Everything else | low_bits fallback | Uses the converter’s low_bits default if not matched above. |
29
+ “Selected layers” (the precision bump mask)
30
+ A layer is considered early/late/periodic if its index `i` (from `model.layers.i`) satisfies any of:
31
+ - `i < num_layers // 8`, or
32
+ - `i >= 7 * num_layers // 8`, or
33
+ - `(i - num_layers // 8) % 3 == 2`
34
+ These layers receive:
35
+ - Q/K/V: 3b → 4b
36
+ - O-proj: 4b → 6b
37
+ - Experts (`.mlps.<idx>.*`): 2b → 3b
38
+ Switch-MLP remains 3b across all layers.
39
+ The mask keeps higher precision in the first layers (prompt sensitivity) and the last layers (output stability), and adds a periodic boost in between to limit worst-case error accumulation.
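
The layer-selection rule and the per-layer bit widths above can be sketched in plain Python. This is an illustrative sketch only, not the conversion script used for this model: the names `is_selected_layer`, `BITS`, and `bits_for` are hypothetical, and `num_layers = 16` is an arbitrary depth chosen for the demonstration, not the depth of LongCat-Flash.

```python
def is_selected_layer(i: int, num_layers: int) -> bool:
    """Return True if layer index i is early, late, or periodic per the rule above."""
    first = num_layers // 8
    return (
        i < first                      # early layers
        or i >= 7 * num_layers // 8    # late layers
        or (i - first) % 3 == 2        # periodic boost
    )


# (default bits, bits on selected layers), as listed in the README text.
# The dictionary keys are hypothetical labels, not real tensor names.
BITS = {
    "qkv": (3, 4),
    "o_proj": (4, 6),
    "experts": (2, 3),
    "switch_mlp": (3, 3),  # stays at 3b on every layer
}


def bits_for(kind: str, i: int, num_layers: int) -> int:
    """Pick the bit width for a module kind at layer index i."""
    low, high = BITS[kind]
    return high if is_selected_layer(i, num_layers) else low


if __name__ == "__main__":
    num_layers = 16  # hypothetical depth, for illustration only
    selected = [i for i in range(num_layers) if is_selected_layer(i, num_layers)]
    print(selected)  # [0, 1, 4, 7, 10, 13, 14, 15]
```

With this hypothetical 16-layer depth, half of the layers receive the higher bit widths; the actual fraction depends on the real `num_layers`.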
40
+
41
+
42
  ## Use with mlx
43
 
44
  ```bash