ISTA-DASLab/Kimi-K2.6-2Bit-GSQ

Image-Text-to-Text

Mixture of Experts


soroushtabesh committed 1 day ago

Commit e1cc914 (verified) · 1 parent: 405a1c8

Add storage layout details

Files changed (1)

README.md +2 -2


@@ -36,7 +36,7 @@ group-wise scalar format that drops into existing INT inference kernels.
 - **Bits / weight (effective):** ≈2.13 bpp
 - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
 - **Group size:** 128
-- **Format:** [Humming](https://github.com/IST-DASLab/humming) (`quant_method: "humming"`, `b_dtype: "uint2"`)
 - **Pipeline:** GPTQ initialization → Gumbel-Softmax refinement (Lion optimizer)
 - **What's quantized:** routed-expert MLPs from layer 1 onward (`gate_proj`, `up_proj`, `down_proj`). Attention (`self_attn`), layernorms, embeddings, LM head, vision tower, MM projector, MoE routing `gate`, shared experts, and the first dense MLP layer (`layers.0.mlp.*`) are kept in BF16.
@@ -83,7 +83,7 @@ weight is `2 bits (packed) + 16 bits / 128 (group scale) ≈ 2.13 bpp`. The
 ```
 Loading this checkpoint requires a vLLM build with the
-[`humming`](https://github.com/IST-DASLab/humming) MoE kernel installed (see
 the [GSQ repo](https://github.com/IST-DASLab/GSQ) `scripts/setup_env.sh` for
 the exact install line).

 - **Bits / weight (effective):** ≈2.13 bpp
 - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
 - **Group size:** 128
+- **Format:** [Humming](https://github.com/inclusionAI/humming) (`quant_method: "humming"`, `b_dtype: "uint2"`)
 - **Pipeline:** GPTQ initialization → Gumbel-Softmax refinement (Lion optimizer)
 - **What's quantized:** routed-expert MLPs from layer 1 onward (`gate_proj`, `up_proj`, `down_proj`). Attention (`self_attn`), layernorms, embeddings, LM head, vision tower, MM projector, MoE routing `gate`, shared experts, and the first dense MLP layer (`layers.0.mlp.*`) are kept in BF16.
 ```
 Loading this checkpoint requires a vLLM build with the
+[`humming`](https://github.com/inclusionAI/humming) MoE kernel installed (see
 the [GSQ repo](https://github.com/IST-DASLab/GSQ) `scripts/setup_env.sh` for
 the exact install line).
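
The README context in this diff describes a 2-bit codebook `{-2, -1, 0, +1} × scale` with group size 128 and an effective cost of `2 bits (packed) + 16 bits / 128 (group scale) ≈ 2.13 bpp`. The sketch below works through that arithmetic and a plain NumPy dequantization. It is an illustration under stated assumptions, not the Humming kernel or its storage layout: the function names are hypothetical, the mapping from a stored `uint2` code `c` to the codebook value `c - 2` is assumed, and so is the packing order (four codes per byte, lowest bits first).

```python
import numpy as np

GROUP_SIZE = 128  # group size stated in the README
CODE_BITS = 2     # 2-bit codes
SCALE_BITS = 16   # one 16-bit scale per group


def effective_bits_per_weight(code_bits=CODE_BITS, scale_bits=SCALE_BITS,
                              group_size=GROUP_SIZE):
    """Packed code bits plus the per-group scale amortized over the group."""
    return code_bits + scale_bits / group_size


def unpack_uint2(packed):
    """Unpack bytes into 2-bit codes.

    ASSUMPTION: four codes per byte, lowest two bits first. The real
    on-disk layout of the checkpoint may differ.
    """
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[..., None] >> shifts) & 0b11
    return codes.reshape(*packed.shape[:-1], -1)


def dequantize(packed, scales, group_size=GROUP_SIZE):
    """Return float32 weights with shape (num_groups, group_size).

    ASSUMPTION: stored code c in {0, 1, 2, 3} maps to c - 2, which gives
    the codebook {-2, -1, 0, +1}. Scales are promoted to float32 here.
    """
    codes = unpack_uint2(packed).astype(np.int8) - 2
    groups = codes.reshape(-1, group_size).astype(np.float32)
    return groups * np.asarray(scales, dtype=np.float32)[:, None]


# One group of 128 weights occupies 32 packed bytes plus one scale.
packed = np.full(32, 0xE4, dtype=np.uint8)  # 0xE4 holds codes 0, 1, 2, 3
weights = dequantize(packed, scales=[0.5])
print(effective_bits_per_weight())  # 2.125
print(weights[0, :4])
```

Under these assumptions each group of 128 weights takes 32 bytes of codes plus a 2-byte scale, which is where the 2.125 bits per weight (rounded to 2.13 in the README) comes from.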