soroushtabesh committed on
Commit 81de335 · verified · 1 Parent(s): 98ce94f

Add storage layout details

  1. README.md +2 -2
README.md CHANGED
@@ -38,7 +38,7 @@ and LiveCodeBench v6 under our evaluation pipeline.
  - **Bits / weight (effective):** ≈2.13 bpp
  - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
  - **Group size:** 128
- - **Format:** [Humming](https://github.com/IST-DASLab/humming) (`quant_method: "humming"`, `b_dtype: "uint2"`)
+ - **Format:** [Humming](https://github.com/inclusionAI/humming) (`quant_method: "humming"`, `b_dtype: "uint2"`)
  - **Pipeline:** GPTQ initialization → Gumbel-Softmax refinement (Lion optimizer)
  - **What's quantized:** routed-expert MLPs from layer 1 onward (`gate_proj`, `up_proj`, `down_proj`). Attention (`self_attn`), layernorms, embeddings, LM head, vision tower, MM projector, MoE routing `gate`, shared experts, and the first dense MLP layer (`layers.0.mlp.*`) are kept in BF16.
 
@@ -85,7 +85,7 @@ weight is `2 bits (packed) + 16 bits / 128 (group scale) ≈ 2.13 bpp`. The
  ```
 
  Loading this checkpoint requires a vLLM build with the
- [`humming`](https://github.com/IST-DASLab/humming) MoE kernel installed (see
+ [`humming`](https://github.com/inclusionAI/humming) MoE kernel installed (see
  the [GSQ repo](https://github.com/IST-DASLab/GSQ) `scripts/setup_env.sh` for
  the exact install line).
 
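
Note on the `≈2.13 bpp` figure that appears in the README context: it follows from 2 bits per packed weight plus one 16-bit scale shared by each group of 128 weights. The sketch below reproduces that arithmetic and shows one way the `{-2, -1, 0, +1} × scale` codebook could be applied. The function names and the mapping from stored `uint2` codes to codebook levels (code 0 maps to -2, code 3 maps to +1) are assumptions made for illustration; they are not taken from the Humming format or its kernels.

```python
def effective_bits_per_weight(code_bits: int = 2,
                              scale_bits: int = 16,
                              group_size: int = 128) -> float:
    """Bits stored per weight: packed code plus the amortized group scale."""
    return code_bits + scale_bits / group_size


def dequantize_group(codes, scale):
    """Map 2-bit codes to real values using the README's codebook.

    ASSUMPTION: code c in {0, 1, 2, 3} maps to level c - 2, i.e. the
    levels {-2, -1, 0, +1}. The real on-disk encoding may differ.
    """
    levels = (-2, -1, 0, 1)
    return [levels[c] * scale for c in codes]


if __name__ == "__main__":
    print(effective_bits_per_weight())          # 2 + 16/128
    print(dequantize_group([0, 1, 2, 3], 0.5))  # one value per codebook level
```

With the defaults, the first call evaluates `2 + 16/128 = 2.125`, which the README rounds to 2.13.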