ISTA-DASLab/Kimi-K2.6-2Bit-GSQ

Image-Text-to-Text

Mixture of Experts


soroushtabesh committed 12 days ago

Commit 1802d56 (verified) · 1 parent: e1cc914

Add humming instructions

Files changed (1)

README.md +9 -4

README.md CHANGED

@@ -82,10 +82,9 @@ weight is `2 bits (packed) + 16 bits / 128 (group scale) ≈ 2.13 bpp`. The
 }
 ```
-Loading this checkpoint requires a vLLM build with the
-[`humming`](https://github.com/inclusionAI/humming) MoE kernel installed (see
-the [GSQ repo](https://github.com/IST-DASLab/GSQ) `scripts/setup_env.sh` for
-the exact install line).
+Loading this checkpoint requires vLLM plus the
+[`humming`](https://github.com/inclusionAI/humming) MoE kernels (`pip install
+humming-kernels`). See **Serving with vLLM** below.
 > Note: GSQ training first writes shards in `compressed-tensors`
 > `pack-quantized` format (where the 2-bit codebook is padded into a 4-bit
@@ -95,6 +94,12 @@ the exact install line).
 ## Serving with vLLM
+Install the Humming kernels (required for vLLM to load this checkpoint):
+```bash
+pip install humming-kernels
+```
 Hopper (sm_90) or Ampere (sm ≥ 80) GPUs required for serving.
 ```bash
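The first hunk's context line gives the storage cost per weight as `2 bits (packed) + 16 bits / 128 (group scale) ≈ 2.13 bpp`. Here is a minimal sketch of that arithmetic. It assumes each group of 128 weights shares exactly one 16-bit scale and that no other metadata (zero-points, codebook tables) is stored. `bits_per_parameter` and `compression_vs_16bit` are hypothetical helpers and are not part of the GSQ repo:

```python
def bits_per_parameter(code_bits: int = 2, scale_bits: int = 16, group_size: int = 128) -> float:
    """Average bits stored per weight: the packed code plus an amortized group scale.

    Hypothetical helper; assumes one scale per group and no other per-group metadata.
    """
    return code_bits + scale_bits / group_size


def compression_vs_16bit(bpp: float) -> float:
    """Size ratio against a 16-bit (e.g. BF16) baseline, ignoring any unquantized layers."""
    return 16 / bpp


if __name__ == "__main__":
    bpp = bits_per_parameter()
    print(f"{bpp:.3f} bpp")  # 2.125, which the README rounds to 2.13
    print(f"{compression_vs_16bit(bpp):.2f}x smaller than 16-bit")  # 7.53x
```

The group size controls the scale overhead. Halving it to 64 doubles that overhead, and `bits_per_parameter(group_size=64)` returns 2.25.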
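The serving section requires Hopper (sm_90) or Ampere (sm ≥ 80) GPUs. Below is a minimal pre-flight sketch that applies that lower bound to a `(major, minor)` compute-capability tuple. It assumes the literal "sm ≥ 80" threshold is the whole rule. Under that assumption, architectures the README does not name (for example Ada, sm_89) would pass, although the README does not say whether they work. The helper names are hypothetical. The optional device query uses PyTorch's `torch.cuda.get_device_capability` and is skipped when PyTorch or CUDA is unavailable:

```python
MIN_CAPABILITY = (8, 0)  # "sm >= 80" from the README, read as a lower bound (assumption)


def sm_name(capability: tuple[int, int]) -> str:
    """Format a (major, minor) compute capability as an sm_XY string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def meets_min_capability(capability: tuple[int, int], minimum: tuple[int, int] = MIN_CAPABILITY) -> bool:
    """True if capability >= minimum; tuple comparison checks major first, then minor."""
    return tuple(capability) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch; cannot check compute capability.")
    else:
        for idx in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(idx)
            status = "ok" if meets_min_capability(cap) else "below sm_80"
            print(f"GPU {idx} ({torch.cuda.get_device_name(idx)}): {sm_name(cap)} -> {status}")
```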
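The updated README installs the Humming kernels with `pip install humming-kernels`. This small sketch uses only the standard library's `importlib.metadata` to report whether the `vllm` and `humming-kernels` distributions are present before you launch the server. It only confirms that the packages are installed. It does not verify that vLLM actually uses the Humming MoE kernels at load time. `installed_version` and `missing_distributions` are hypothetical helpers:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# "humming-kernels" is the name used in the README's pip line;
# "vllm" is vLLM's PyPI distribution name.
REQUIRED_DISTRIBUTIONS = ("vllm", "humming-kernels")


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def missing_distributions(names=REQUIRED_DISTRIBUTIONS) -> list:
    """List the required distributions that are not installed in this environment."""
    return [name for name in names if installed_version(name) is None]


if __name__ == "__main__":
    for name in REQUIRED_DISTRIBUTIONS:
        print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
    missing = missing_distributions()
    if missing:
        print("Install with: pip install " + " ".join(missing))
```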