TOTORONG committed on
Commit fc51b27 · verified · 1 Parent(s): 00c8eb0

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -14,20 +14,20 @@ This model [TOTORONG/LongCat-Flash-3.5bits](https://huggingface.co/TOTORONG/Long
  converted to MLX format from [meituan-longcat/LongCat-Flash-Chat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat)
  using mlx-lm version **0.27.1**.

- Quantized model with 3.516 bits per weight to fit M3 Ultra 256GB
-
- Selected layers” (the precision bump mask)
- A layer is considered early/late/periodic if its index i (from model.layers.i) satisfies:
- i < num_layers // 8 or
- i >= 7 * num_layers // 8 or
- (i - num_layers // 8) % 3 == 2
-
- These layers receive:
- Q/K/V: 3b → 4b
- O-proj: 4b → 6b
- Experts (.mlps.<idx>.*): 2b → 3b
- Switch-MLP remains 3b across all layers.
- This mask preserves prompt-sensitivity (front) and output stability (tail), with a periodic boost to reduce worst-case error accumulation.


  ## Use with mlx
 
14
  converted to MLX format from [meituan-longcat/LongCat-Flash-Chat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat)
  using mlx-lm version **0.27.1**.

+ #Quantized model with 3.516 bits per weight to fit M3 Ultra 256GB
+
+ #“Selected layers” (the precision bump mask)
+ #A layer is considered early/late/periodic if its index i (from model.layers.i) satisfies:
+ #i < num_layers // 8 or
+ #i >= 7 * num_layers // 8 or
+ #(i - num_layers // 8) % 3 == 2
+
+ #These layers receive:
+ #Q/K/V: 3b → 4b
+ #O-proj: 4b → 6b
+ #Experts (.mlps.<idx>.*): 2b → 3b
+ #Switch-MLP remains 3b across all layers.
+ #This mask preserves prompt-sensitivity (front) and output stability (tail), with a periodic boost to reduce worst-case error accumulation.


  ## Use with mlx
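
The precision-bump mask described in the README diff above can be sketched in Python. This is a minimal illustration only: `is_bumped` mirrors the three index conditions stated in the diff, but the parameter-path names (`q_proj`, `o_proj`, `switch_mlp`) and `num_layers = 64` are assumptions chosen for demonstration, not values taken from the model config.

```python
def is_bumped(i: int, num_layers: int) -> bool:
    """True if layer i (from model.layers.i) gets the higher-precision bump."""
    early = i < num_layers // 8
    late = i >= 7 * num_layers // 8
    periodic = (i - num_layers // 8) % 3 == 2
    return early or late or periodic


def bits_for(path: str, i: int, num_layers: int) -> int:
    """Bits per weight for a parameter path (path names are hypothetical)."""
    bumped = is_bumped(i, num_layers)
    if ".mlps." in path:                 # experts: 2b -> 3b when bumped
        return 3 if bumped else 2
    if "switch_mlp" in path:             # switch-MLP stays 3b in every layer
        return 3
    if "o_proj" in path:                 # O-proj: 4b -> 6b when bumped
        return 6 if bumped else 4
    if any(p in path for p in ("q_proj", "k_proj", "v_proj")):
        return 4 if bumped else 3        # Q/K/V: 3b -> 4b when bumped
    return 4                             # fallback for anything else


num_layers = 64  # illustrative; not the real layer count
selected = [i for i in range(num_layers) if is_bumped(i, num_layers)]
print(selected[:12])  # → [0, 1, 2, 3, 4, 5, 6, 7, 10, 13, 16, 19]
```

With `num_layers = 64`, the mask selects the first 8 layers, the last 8, and every third layer in between starting at index 10, matching the front/tail/periodic structure the commit message describes.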