Publish KL-Divergence for Every Quant Level, Across All Models
Hi Unsloth team, you're setting the standard with your quantized model releases. One of the biggest unresolved challenges in choosing a quantized LLM is quantifying the quality loss. Right now, users mostly rely on Reddit tests, informal discussion, or benchmark noise to guess how much “intelligence” they're trading for speed.
A systematic KL-divergence report for each quant level (versus the unquantized base) would directly measure how much the token probability distribution shifts under quantization, which is far more informative than perplexity or task scores alone. KL divergence is widely used in recent quantization research to detect subtle distribution shifts that accuracy and perplexity hide, including behavioral flips that occur even when aggregate accuracy looks unchanged.
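For concreteness, here is a minimal sketch of the measurement I have in mind, assuming both checkpoints load through Hugging Face `transformers`. The model IDs and evaluation text are placeholders; a real report would average over a sizable corpus:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID  = "org/full-precision-model"       # placeholder: unquantized reference
QUANT_ID = "org/quantized-model-variant"    # placeholder: quantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base  = AutoModelForCausalLM.from_pretrained(BASE_ID).eval()
quant = AutoModelForCausalLM.from_pretrained(QUANT_ID).eval()

text = "The quick brown fox jumps over the lazy dog."  # stand-in for a real eval set
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Log-probabilities over the vocabulary at every token position.
    logp_base  = F.log_softmax(base(**inputs).logits.float(),  dim=-1)
    logp_quant = F.log_softmax(quant(**inputs).logits.float(), dim=-1)

# KL(P_base || Q_quant) per token position, then averaged over the sequence.
kl_per_token = F.kl_div(
    logp_quant, logp_base, log_target=True, reduction="none"
).sum(-1)
print(f"mean KL divergence: {kl_per_token.mean().item():.6f} nats/token")
```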
Publishing KL-divergence per quant level would:
- Align with current evaluation practice in quantization research (https://arxiv.org/abs/2407.09141) as well as your own documentation (https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs#why-kl-divergence).
- Enable principled speed ↔ quality trade-offs instead of guesswork.
This would be a technical differentiator for Unsloth and a step toward an industry norm.
Thanks,
Sid