This is a text-only GGUF quantization of moonshotai/Kimi-K2.5. This means the vision tower and image input are not present in this GGUF and will not be available until support is added upstream in llama.cpp.
To produce this quant, I modified config.json: I removed the text_config key and hoisted its inner values to the top level of the JSON (de-indenting and deduplicating them), updated the arch to DS, and updated the model_type to "kimi_k2". I also removed the mm and vision_tower entries from model.safetensors.index.json.
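For illustration, here is a minimal sketch of those edits using jq; the key names and prefix patterns come from the description above, while the merge order and file handling are assumptions:

```bash
# Hoist the inner text_config values to the top level (text_config wins
# on duplicate keys), drop the wrapper, and set the text-only model_type.
# The architectures update ("DS") is omitted since the exact string
# isn't given above.
jq '. + .text_config | del(.text_config) | .model_type = "kimi_k2"' \
    config.json > config.tmp && mv config.tmp config.json

# Drop the vision weights from the safetensors index so the convert
# script never looks for them (prefix patterns are assumptions).
jq '.weight_map |= with_entries(select(.key | test("^(mm|vision_tower)") | not))' \
    model.safetensors.index.json > index.tmp \
    && mv index.tmp model.safetensors.index.json
```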
I used jukofyork's instructions here to modify llama.cpp and recompile it, then decompressed the natively INT4 HF safetensors files into a BF16 GGUF.
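The conversion step looked roughly like this; a minimal sketch assuming the patched convert script, with illustrative paths:

```bash
# Decompress the natively INT4 checkpoint into a BF16 GGUF with the
# patched convert script (input path and output name are illustrative).
python convert_hf_to_gguf.py /models/Kimi-K2.5 \
    --outtype bf16 \
    --outfile Kimi-K2.5-BF16.gguf
```

From that BF16 GGUF I then produced the quant: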
```bash
./build/bin/llama-quantize \
    --tensor-type attn_kv_a_mqa=q8_0 \
    --tensor-type attn_k_b=q8_0 \
    --tensor-type attn_v_b=q8_0 \
    --tensor-type _exps=q4_0 \
    Kimi-K2.5-BF16.gguf Kimi-K2.5-Q4_X.gguf Q8_0
```
In the command above, the trailing Q8_0 is the default type for every tensor; the --tensor-type flags override it, keeping the listed attention tensors at Q8_0 and quantizing the expert tensors (matching _exps) to Q4_0. This Q4_X quant is the "full quality" equivalent, since the conditional experts are natively INT4-quantized in the original model and the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant; its size was identical and its PPL was barely higher. Since their performance was about the same, I've only uploaded the Q4_X variant.
| Quant | Size | Mixture | PPL | Uploaded? |
|---|---|---|---|---|
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 +/- 0.00699 | ✅ |
| Q4_K | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_K | 1.8256 +/- 0.00700 | ❌ |
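For reference, PPL figures like these are normally produced with llama.cpp's perplexity tool; a minimal sketch, where the evaluation file (the conventional wiki.test.raw) is an assumption:

```bash
# Measure perplexity of the quant over an evaluation text file.
# wiki.test.raw is the conventional choice; the dataset actually used
# for the table above is an assumption.
./build/bin/llama-perplexity -m Kimi-K2.5-Q4_X.gguf -f wiki.test.raw
```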
The PPL figures are in the table above, and I have loaded and tested the quant to check that it is sane and working properly. There are no special restrictions or PRs needed to run this quant: if you can already run Kimi-K2 variants, this should just work for you.
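As a starting point, a stock llama.cpp launch is enough; a minimal sketch with illustrative flags, to be tuned for your hardware:

```bash
# Serve the quant with stock llama.cpp; the context size and GPU
# offload count are illustrative, adjust for your machine.
./build/bin/llama-server -m Kimi-K2.5-Q4_X.gguf -c 32768 -ngl 99
```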