soroushtabesh committed on
Commit e1cc914 · verified · 1 Parent(s): 405a1c8

Add storage layout details

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -36,7 +36,7 @@ group-wise scalar format that drops into existing INT inference kernels.
36
  - **Bits / weight (effective):** ≈2.13 bpp
37
  - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
38
  - **Group size:** 128
39
- - **Format:** [Humming](https://github.com/IST-DASLab/humming) (`quant_method: "humming"`, `b_dtype: "uint2"`)
40
  - **Pipeline:** GPTQ initialization → Gumbel-Softmax refinement (Lion optimizer)
41
  - **What's quantized:** routed-expert MLPs from layer 1 onward (`gate_proj`, `up_proj`, `down_proj`). Attention (`self_attn`), layernorms, embeddings, LM head, vision tower, MM projector, MoE routing `gate`, shared experts, and the first dense MLP layer (`layers.0.mlp.*`) are kept in BF16.
42
 
@@ -83,7 +83,7 @@ weight is `2 bits (packed) + 16 bits / 128 (group scale) ≈ 2.13 bpp`. The
  ```
 
  Loading this checkpoint requires a vLLM build with the
- [`humming`](https://github.com/IST-DASLab/humming) MoE kernel installed (see
  the [GSQ repo](https://github.com/IST-DASLab/GSQ) `scripts/setup_env.sh` for
  the exact install line).
 
 
  - **Bits / weight (effective):** ≈2.13 bpp
  - **Codebook:** 2-bit symmetric scalar `{-2, -1, 0, +1} × scale`
  - **Group size:** 128
+ - **Format:** [Humming](https://github.com/inclusionAI/humming) (`quant_method: "humming"`, `b_dtype: "uint2"`)
  - **Pipeline:** GPTQ initialization → Gumbel-Softmax refinement (Lion optimizer)
  - **What's quantized:** routed-expert MLPs from layer 1 onward (`gate_proj`, `up_proj`, `down_proj`). Attention (`self_attn`), layernorms, embeddings, LM head, vision tower, MM projector, MoE routing `gate`, shared experts, and the first dense MLP layer (`layers.0.mlp.*`) are kept in BF16.
 
 
  ```
 
  Loading this checkpoint requires a vLLM build with the
+ [`humming`](https://github.com/inclusionAI/humming) MoE kernel installed (see
  the [GSQ repo](https://github.com/IST-DASLab/GSQ) `scripts/setup_env.sh` for
  the exact install line).
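
The storage arithmetic quoted in the second hunk header (`2 bits (packed) + 16 bits / 128 (group scale)`) and the `{-2, -1, 0, +1} × scale` codebook can be checked with a short Python sketch. This is not part of the commit and not the Humming kernel's code: the function names are hypothetical, and the mapping from a `uint2` code to a level (code 0 → -2, ..., code 3 → +1) is an assumption made only for illustration.

```python
GROUP_SIZE = 128   # group size stated in the README
CODE_BITS = 2      # one uint2 code per weight
SCALE_BITS = 16    # one 16-bit scale shared by each group

# Assumed code-to-level mapping; the README lists the levels but not the
# order in which the four uint2 codes map to them.
LEVELS = (-2, -1, 0, 1)


def effective_bits_per_weight(code_bits=CODE_BITS,
                              scale_bits=SCALE_BITS,
                              group_size=GROUP_SIZE):
    """Per-weight cost: the code itself plus its share of the group scale."""
    return code_bits + scale_bits / group_size


def dequantize_group(codes, scale):
    """Map one group of uint2 codes back to real-valued weights."""
    if len(codes) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} codes, got {len(codes)}")
    return [LEVELS[c] * scale for c in codes]


if __name__ == "__main__":
    print(effective_bits_per_weight())       # 2 + 16/128
    group = [0, 1, 2, 3] * (GROUP_SIZE // 4)
    print(dequantize_group(group, 0.5)[:4])  # first four reconstructed weights
```

The per-group scale adds 16/128 = 0.125 bits to each 2-bit code, which is where the README's ≈2.13 figure comes from.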