Will there be a 2.8bit quant?

#1
by darthsider - opened

spicyneuron/Kimi-K2.5-MLX-2.8bit fits well on M3 Ultra 512 GB with full context. So it’s the perfect size. Would you be doing 2.8bit for Kimi-K2.6 as well?

Stay tuned! Kimi trials take a bit longer since I can only fit 3 versions on a 4 TB external SSD (the dequantized model alone is 2 TB+).

So far, experiments in the 2.9–3.3 bpw range have shown significantly higher KL divergence. Still searching for the best tradeoff.
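For anyone curious how a KL-divergence comparison like this works: you run the same prompts through the reference model and the quantized model, then measure how far the quantized model's per-token output distribution drifts from the reference. A minimal sketch, assuming raw logits are available from both models (this is a generic illustration, not spicyneuron's actual eval pipeline; `kl_divergence` and the per-token averaging are my assumptions):

```python
import numpy as np

def kl_divergence(ref_logits, quant_logits):
    """Mean per-token KL(ref || quant), computed from raw logits.

    Both inputs have shape (num_tokens, vocab_size).
    """
    def log_softmax(x):
        # Subtract the max for numerical stability before exponentiating.
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    log_p = log_softmax(np.asarray(ref_logits, dtype=np.float64))
    log_q = log_softmax(np.asarray(quant_logits, dtype=np.float64))
    p = np.exp(log_p)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), averaged over tokens.
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

# Identical logits give zero divergence; any mismatch gives a positive value.
print(kl_divergence([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]]))  # → 0.0
```

A lower mean KL means the quantized model's token distributions track the original more closely, which is why it catches degradation that perplexity alone can miss.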

darthsider changed discussion status to closed

much appreciated

Uploading a 430 GB version now: https://huggingface.co/spicyneuron/Kimi-K2.6-MLX-3.3bit

This was a tricky one. At 2.9 bits, K2.6 is still ~400 GB. Perplexity barely moves, but KL divergence triples and other evals noticeably degrade.

It's entirely possible my Kimi K2.5 quants had similar degradation, but my earlier workflows didn't capture it. In any case, let me know how it runs!
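The "perplexity barely moves" observation makes sense: perplexity only scores the probability assigned to the one correct next token, while KL divergence compares the entire output distribution, so a quant can keep the top token's probability roughly intact while reshuffling the rest of the vocabulary. A minimal sketch of computing perplexity from logits, assuming a `(num_tokens, vocab_size)` logit array and integer target IDs (illustrative only, not the eval harness used in this thread):

```python
import numpy as np

def perplexity(logits, targets):
    """Perplexity over a token sequence, from raw next-token logits."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    # Stable log-softmax: shift by the row max before exponentiating.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Average negative log-likelihood of the target tokens only.
    nll = -log_probs[np.arange(len(targets)), targets].mean()
    return float(np.exp(nll))

# Uniform logits over a 5-token vocab give perplexity exactly 5.
print(perplexity(np.zeros((4, 5)), [0, 1, 2, 3]))  # → 5.0
```

Because only the target-token column enters the average, everything the model believes about the *other* vocab entries is invisible to this metric, which is exactly the blind spot KL divergence covers.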

spicyneuron changed discussion status to open

I’m gonna check it out. Thanks.
