This model [TOTORONG/LongCat-Flash-3.5bits](https://huggingface.co/TOTORONG/LongCat-Flash-3.5bits) was converted to MLX format from [meituan-longcat/LongCat-Flash-Chat](https://huggingface.co/meituan-longcat/LongCat-Flash-Chat) using mlx-lm version **0.27.1**.
Quantized to 3.516 bits per weight to fit an M3 Ultra with 256 GB of memory.

## “Selected layers” (the precision-bump mask)

A layer is considered early/late/periodic if its index `i` (from `model.layers.i`) satisfies:

- `i < num_layers // 8`, or
- `i >= 7 * num_layers // 8`, or
- `(i - num_layers // 8) % 3 == 2`

These layers receive:

- Q/K/V: 3-bit → 4-bit
- O-proj: 4-bit → 6-bit
- Experts (`.mlps.<idx>.*`): 2-bit → 3-bit

Switch-MLP remains 3-bit across all layers.

This mask preserves prompt sensitivity (front) and output stability (tail), with a periodic boost to reduce worst-case error accumulation.
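The selection rule and per-component bit widths above can be sketched in plain Python. This is a minimal illustration of the mask as described, not the conversion script itself; the function names and the `num_layers` value are assumptions for the example.

```python
def is_selected(i: int, num_layers: int) -> bool:
    """True if layer i is early, late, or on the periodic boost schedule."""
    front = num_layers // 8
    return (
        i < front                      # early block: prompt-sensitive layers
        or i >= 7 * num_layers // 8    # late block: output-stability layers
        or (i - front) % 3 == 2        # periodic boost through the middle
    )


def bits_for(i: int, num_layers: int) -> dict:
    """Per-component bit widths for layer i (base vs. boosted precision)."""
    boosted = is_selected(i, num_layers)
    return {
        "qkv": 4 if boosted else 3,        # Q/K/V: 3b -> 4b on selected layers
        "o_proj": 6 if boosted else 4,     # O-proj: 4b -> 6b on selected layers
        "experts": 3 if boosted else 2,    # .mlps.<idx>.*: 2b -> 3b
        "switch_mlp": 3,                   # unchanged across all layers
    }


if __name__ == "__main__":
    n = 64  # hypothetical layer count for illustration
    print([i for i in range(n) if is_selected(i, n)])
```

For `num_layers = 64` the mask covers layers 0–7, every third layer starting at 10 (10, 13, 16, …), and layers 56–63.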
## Use with mlx