@s3nh on Hugging Face: "Existing methods — GPTQ, AWQ, llama.cpp's k-quants

posted an update 27 days ago

Existing methods (GPTQ, AWQ, llama.cpp's k-quants) minimize empirical loss heuristically. None of them comes with a proof of optimality in any information-theoretic sense. ICRB-Q builds a quantization scheme that is provably optimal via the Cramér-Rao lower bound (CRB): no unbiased estimator of the weights θ can have a covariance below [F(θ)]⁻¹, where F is the Fisher information matrix.
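
For reference, this is the standard matrix form of the bound under the usual regularity conditions. The likelihood notation p(x | θ) is assumed here for illustration and does not come from the post:

$$
\operatorname{Cov}_\theta(\hat\theta) \succeq [F(\theta)]^{-1},
\qquad
F(\theta) = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\!\left[\nabla_\theta \log p(x \mid \theta)\, \nabla_\theta \log p(x \mid \theta)^{\top}\right]
$$

Here ⪰ is the Loewner order, meaning the difference of the two sides is positive semidefinite. For a single scalar weight it reduces to Var(θ̂) ≥ 1/F(θ).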

s3nh

27 days ago

Standard quantization places levels on a uniform grid. ICRB-Q places them on geodesics of the Fisher-Rao statistical manifold — the Riemannian manifold (M, g_F) where the metric tensor is the Fisher information. This means:

- High-Fisher-curvature regions (where small weight changes cause large output changes) get exponentially denser levels.
- Low-curvature, "flat" regions (e.g. many heads in early transformer layers) get coarse 2-bit or 3-bit quantization automatically.
- The codebook construction reduces to solving: place 2^b points in parameter space to minimize expected geodesic distance from any weight to its nearest level.
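
A minimal sketch of that placement problem in one dimension, not ICRB-Q's actual algorithm. It assumes the Fisher information can be written as a scalar function of the weight value (`fisher_fn`, a hypothetical input), so the geodesic distance between a and b is |∫ₐᵇ √F(w) dw|. Under that assumption the problem becomes 1-D k-medians in the arc-length coordinate φ(w) = ∫ √F. The function name and all parameters are illustrative.

```python
import numpy as np

def fisher_rao_codebook_1d(weights, fisher_fn, bits, n_grid=4096, n_iter=50):
    """Place 2**bits levels by 1-D k-medians in Fisher-Rao arc-length coordinates."""
    w = np.asarray(weights, dtype=np.float64)
    # Grid over the weight range; fisher_fn is a hypothetical scalar Fisher profile.
    grid = np.linspace(w.min(), w.max(), n_grid)
    speed = np.sqrt(np.maximum(fisher_fn(grid), 1e-12))  # sqrt(F) is the line element
    # phi(w) = integral of sqrt(F), via the cumulative trapezoid rule (strictly increasing).
    steps = 0.5 * (speed[1:] + speed[:-1]) * np.diff(grid)
    phi = np.concatenate([[0.0], np.cumsum(steps)])
    u = np.interp(w, grid, phi)  # weights expressed in arc-length coordinates

    k = 2 ** bits
    # Initialise centres at evenly spaced quantiles of u.
    centres = np.quantile(u, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # In these coordinates the geodesic distance is simply |u - c|.
        assign = np.argmin(np.abs(u[:, None] - centres[None, :]), axis=1)
        new = centres.copy()
        for j in range(k):
            members = u[assign == j]
            if members.size:
                new[j] = np.median(members)  # the median minimises mean absolute distance
        if np.allclose(new, centres):
            break
        centres = new
    # Map the levels back to weight space by inverting phi.
    return np.interp(np.sort(centres), phi, grid)

# Usage with synthetic weights and a made-up Fisher profile that peaks near zero.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=10_000)
levels = fisher_rao_codebook_1d(w, lambda x: 1.0 + 50.0 * np.exp(-(x / 0.2) ** 2), bits=3)
```

With a constant `fisher_fn`, φ is affine in w and the procedure reduces to ordinary k-medians on the raw weights. A non-constant profile stretches the high-Fisher regions before clustering, which pulls more levels into them.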

This strictly generalizes both AWQ's per-channel scaling (a zeroth-order approximation to this manifold geometry) and GPTQ's second-order correction (a local linearization of it).
