This is a text-only GGUF quantization of moonshotai/Kimi-K2.5. This means that the vision tower and image input are not present in this GGUF and will not be available until support is added upstream in llama.cpp.

To produce this quant, I modified config.json to remove the text_config key, de-indenting and deduplicating its inner values so they sit at the top level of the JSON, changed the architecture to DeepSeek, and updated the model_type to "kimi_k2". I also removed the mm and vision_tower entries from model.safetensors.index.json.
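The config.json surgery above can be sketched roughly as follows. This is a hypothetical helper, not the exact script used, and the placeholder key values are illustrative only:

```python
import json

def make_text_only(config: dict) -> dict:
    """Hoist text_config to the top level and drop vision remnants,
    so the converter sees a plain text-only model config."""
    text_cfg = config.pop("text_config", {})
    merged = {**config, **text_cfg}       # de-indent/deduplicate inner values
    merged["model_type"] = "kimi_k2"      # as described above
    merged.pop("vision_config", None)     # drop vision keys if present
    return merged

cfg = {
    "model_type": "kimi_vl",                        # placeholder
    "vision_config": {"hidden_size": 1024},         # placeholder
    "text_config": {"hidden_size": 7168, "num_hidden_layers": 61},
}
print(json.dumps(make_text_only(cfg), indent=2))
```

A similar pass over model.safetensors.index.json would filter out any weight_map entries whose names start with the mm or vision_tower prefixes.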

I used jukofyork's instructions here to modify llama.cpp and recompiled, converted the HF safetensors files into a BF16 GGUF, then produced the quant:

```shell
./build/bin/llama-quantize \
    --tensor-type attn_kv_a_mqa=q8_0 \
    --tensor-type attn_k_b=q8_0 \
    --tensor-type attn_v_b=q8_0 \
    --tensor-type _exps=q4_0 \
    Kimi-K2.5-BF16.gguf Kimi-K2.5-Q4_X.gguf Q8_0
```
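The --tensor-type flags above override the base Q8_0 type for tensors whose names match the given pattern. As a rough illustration of the intent (this toy matcher uses plain substring containment, an assumption that approximates llama.cpp's pattern matching):

```python
DEFAULT = "Q8_0"
OVERRIDES = [  # mirrors the flags above; first match wins
    ("attn_kv_a_mqa", "q8_0"),
    ("attn_k_b", "q8_0"),
    ("attn_v_b", "q8_0"),
    ("_exps", "q4_0"),
]

def quant_type(tensor_name: str) -> str:
    """Return the quant type a tensor would receive under the overrides."""
    for pattern, qtype in OVERRIDES:
        if pattern in tensor_name:
            return qtype
    return DEFAULT

for name in ("blk.5.attn_kv_a_mqa.weight",
             "blk.5.ffn_down_exps.weight",
             "blk.5.ffn_gate_exps.weight",
             "output.weight"):
    print(f"{name:32s} -> {quant_type(name)}")
```

The net effect: the MLA attention tensors stay at q8_0, all routed-expert tensors (ffn_*_exps) drop to q4_0, and everything else inherits the Q8_0 base type.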

This Q4_X quant is the "full quality" equivalent, since the conditional experts are natively INT4, quantized directly from the original model, and the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant; the model size was identical and the PPL was barely higher. Since their performance was about the same, I've only uploaded the Q4_X variant.

| Quant | Size | Mixture | PPL | Uploaded? |
| --- | --- | --- | --- | --- |
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 +/- 0.00699 | ✅ |
| Q4_K | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_K | 1.8256 +/- 0.00700 | ❌ |
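A quick sanity check on the table above: the Q4_K vs Q4_X perplexity gap is far smaller than the reported error bars, so the two mixtures are statistically indistinguishable.

```python
import math

# Values from the table above
ppl_q4x, err_q4x = 1.8248, 0.00699
ppl_q4k, err_q4k = 1.8256, 0.00700

delta = abs(ppl_q4k - ppl_q4x)
combined_err = math.hypot(err_q4x, err_q4k)  # errors add in quadrature

# The gap (0.0008) is well under one combined standard error (~0.0099)
print(f"delta={delta:.4f}, combined error={combined_err:.5f}")
```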

I'll update this repo later with the quant PPL, but I have loaded and tested the quant to check that it looks sane and is working properly. There are no special restrictions or PRs needed to run this quant; if you can run Kimi-K2 variants already, then this should just work for you.

GGUF · Model size: 1T params · Architecture: deepseek2
