ISTA-DASLab/Kimi-K2.5-2Bit-GSQ

Image-Text-to-Text

Mixture of Experts

soroushtabesh committed 1 day ago

Commit 1cf1eec · verified · 1 parent: 7a53d08

Upload README.md with huggingface_hub

Files changed (1)

README.md +4 -4

@@ -21,9 +21,9 @@ tags:
 # Kimi-K2.5 — 2-bit GSQ
 2-bit quantization of [`moonshotai/Kimi-K2.5`](https://huggingface.co/moonshotai/Kimi-K2.5)
-(MoE, 384 experts, ~260 GB FP) produced with **GSQ**
-(Gumbel-Softmax Quantization). The model is compressed from ~4.5 bpp down to
-**~2.13 bpp** while preserving most of the base model's reasoning, coding,
 and long-context behaviour — and slightly *exceeds* the FP base on MATH 500
 and LiveCodeBench v6 under our evaluation pipeline.
@@ -35,7 +35,7 @@ and LiveCodeBench v6 under our evaluation pipeline.
 ## Quantization details
 - **Base model:** [`moonshotai/Kimi-K2.5`](https://huggingface.co/moonshotai/Kimi-K2.5)
-- **Bits / weight (effective):** ~2.13 bpp
 - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
 - **Group size:** 128
 - **Format:** `compressed-tensors` (auto-detected by vLLM)

 # Kimi-K2.5 — 2-bit GSQ
 2-bit quantization of [`moonshotai/Kimi-K2.5`](https://huggingface.co/moonshotai/Kimi-K2.5)
+(MoE, 384 experts, ≈260 GB FP) produced with **GSQ**
+(Gumbel-Softmax Quantization). The model is compressed from ≈4.5 bpp down to
+**≈2.13 bpp** while preserving most of the base model's reasoning, coding,
 and long-context behaviour — and slightly *exceeds* the FP base on MATH 500
 and LiveCodeBench v6 under our evaluation pipeline.
 ## Quantization details
 - **Base model:** [`moonshotai/Kimi-K2.5`](https://huggingface.co/moonshotai/Kimi-K2.5)
+- **Bits / weight (effective):** ≈2.13 bpp
 - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
 - **Group size:** 128
 - **Format:** `compressed-tensors` (auto-detected by vLLM)
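
The quantization details above can be illustrated with a minimal sketch of group-wise dequantization and the resulting storage cost. Two things here are assumptions and are not stated in the card: that each group of 128 weights stores one 16-bit scale, and that the 2-bit indices 0..3 map to the levels -2, -1, 0, +1 in that order. The function names are hypothetical. This shows only how a `{-2, -1, 0, +1} × scale` codebook is decoded, not how GSQ chooses the codes or how the `compressed-tensors` checkpoint packs them.

```python
GROUP_SIZE = 128            # group size from the card
LEVELS = (-2, -1, 0, 1)     # codebook levels from the card; index order is an assumption


def effective_bits_per_weight(code_bits=2, scale_bits=16, group_size=GROUP_SIZE):
    """Bits stored per weight: the code itself plus the amortized per-group scale.

    scale_bits=16 is an assumption (for example an FP16 scale per group).
    """
    return code_bits + scale_bits / group_size


def dequantize(codes, scales, group_size=GROUP_SIZE):
    """Map 2-bit codes (0..3) back to floats using one scale per group."""
    if len(codes) != len(scales) * group_size:
        raise ValueError("expected exactly one scale per group of weights")
    return [LEVELS[c] * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    print(effective_bits_per_weight())   # 2.125

    # Two groups of 128 codes, each group with its own scale.
    codes = [0, 1, 2, 3] * 64
    weights = dequantize(codes, scales=[0.5, 2.0])
    print(weights[:4])        # [-1.0, -0.5, 0.0, 0.5]
    print(weights[128:132])   # [-4.0, -2.0, 0.0, 2.0]
```

Under the 16-bit-scale assumption the cost is 2 + 16/128 = 2.125 bits per weight, which is in line with the ≈2.13 bpp figure quoted in the card; the card itself does not give this breakdown.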