Publish KL-Divergence for Every Quant Level, Across All Models
Hi Unsloth team, you're setting the standard with your quantized model releases. One of the biggest unresolved challenges in choosing a quantized LLM is quantifying the quality loss. Right now, users mostly rely on Reddit tests, informal discussion, or benchmark noise to guess how much “intelligence” they're trading for speed.
A systematic KL-divergence report for each quant level (versus the unquantized base) would directly measure how much the token probability distribution shifts under quantization, which is far more informative than perplexity or task scores alone. KL divergence is widely used in recent quantization research to detect subtle distribution shifts that accuracy and perplexity hide, including behavioral flips that occur even when aggregate accuracy looks unchanged.
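For concreteness, here is a minimal sketch of the measurement I have in mind, assuming both checkpoints load through Hugging Face `transformers`. The model IDs and evaluation text are placeholders; a real report would average over a sizable corpus:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID  = "org/full-precision-model"       # placeholder: unquantized reference
QUANT_ID = "org/quantized-model-variant"    # placeholder: quantized checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base  = AutoModelForCausalLM.from_pretrained(BASE_ID).eval()
quant = AutoModelForCausalLM.from_pretrained(QUANT_ID).eval()

text = "The quick brown fox jumps over the lazy dog."  # stand-in for a real eval set
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Log-probabilities over the vocabulary at every token position.
    logp_base  = F.log_softmax(base(**inputs).logits.float(),  dim=-1)
    logp_quant = F.log_softmax(quant(**inputs).logits.float(), dim=-1)

# KL(P_base || Q_quant) per token position, then averaged over the sequence.
kl_per_token = F.kl_div(
    logp_quant, logp_base, log_target=True, reduction="none"
).sum(-1)
print(f"mean KL divergence: {kl_per_token.mean().item():.6f} nats/token")
```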
Publishing KL-divergence per quant level would:
- Align with current evaluation practice in quantization research (https://arxiv.org/abs/2407.09141) as well as your own documentation (https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs#why-kl-divergence).
- Enable principled speed ↔ quality trade-offs instead of guesswork.
This would be a technical differentiator for Unsloth and a step toward an industry norm.
Thanks,
Sid