soroushtabesh committed on
Commit 1cf1eec · verified · 1 Parent(s): 7a53d08

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -21,9 +21,9 @@ tags:
  # Kimi-K2.5 — 2-bit GSQ
 
  2-bit quantization of [`moonshotai/Kimi-K2.5`](https://huggingface.co/moonshotai/Kimi-K2.5)
- (MoE, 384 experts, ~260 GB FP) produced with **GSQ**
- (Gumbel-Softmax Quantization). The model is compressed from ~4.5 bpp down to
- **~2.13 bpp** while preserving most of the base model's reasoning, coding,
+ (MoE, 384 experts, 260 GB FP) produced with **GSQ**
+ (Gumbel-Softmax Quantization). The model is compressed from 4.5 bpp down to
+ **2.13 bpp** while preserving most of the base model's reasoning, coding,
  and long-context behaviour — and slightly *exceeds* the FP base on MATH 500
  and LiveCodeBench v6 under our evaluation pipeline.
 
@@ -35,7 +35,7 @@ and LiveCodeBench v6 under our evaluation pipeline.
  ## Quantization details
 
  - **Base model:** [`moonshotai/Kimi-K2.5`](https://huggingface.co/moonshotai/Kimi-K2.5)
- - **Bits / weight (effective):** ~2.13 bpp
+ - **Bits / weight (effective):** 2.13 bpp
  - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
  - **Group size:** 128
  - **Format:** `compressed-tensors` (auto-detected by vLLM)
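
Note on the quantization details: the sketch below illustrates how a grouped 2-bit codebook of the form `{-2, -1, 0, +1} × scale` with group size 128 could be dequantized, and how a per-group scale adds to the 2 bits spent on each weight. The 16-bit scale width, the index-to-level ordering, and the function names `effective_bpw` and `dequantize` are assumptions made for illustration; none of them are stated in the README, and this is not the GSQ implementation or the `compressed-tensors` storage layout.

```python
import numpy as np

# Codebook levels and group size as listed in the README.
CODEBOOK = np.array([-2, -1, 0, 1], dtype=np.float32)
GROUP_SIZE = 128


def effective_bpw(code_bits=2, scale_bits=16, group_size=GROUP_SIZE):
    """Bits per weight when every group stores one scale.

    scale_bits=16 is an assumption; the README does not state the scale dtype.
    """
    return code_bits + scale_bits / group_size


def dequantize(indices, scales, group_size=GROUP_SIZE):
    """Map 2-bit indices (0..3) to codebook levels and apply per-group scales.

    The index order (0 -> -2, ..., 3 -> +1) is an assumed convention.
    """
    levels = CODEBOOK[indices].reshape(-1, group_size)
    return (levels * scales[:, None]).reshape(-1)


if __name__ == "__main__":
    print(effective_bpw())  # 2 + 16/128 under the assumed scale width
    idx = np.array([0, 1, 2, 3] * 64)   # two groups of 128 indices
    scales = np.array([0.5, 2.0])       # one scale per group
    print(dequantize(idx, scales)[:4])
```

Under these assumptions the per-weight cost is 2 + 16/128 = 2.125 bits, which would be consistent with the 2.13 bpp figure in the README. The model-wide number may also reflect layers kept at higher precision, which this sketch does not model.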