Quantization was performed using exllamav3 v0.0.28 (commit ea87af6).

| Quant | Size (GB) | Actual bpw | PPL | KL-div (q→o) | KL-div (o→q) | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
|----------|------|-------|--------|--------|--------|-------|-------|-------|-------|-------|
| 4.0bpw   | 3.94 | 4.00  | 29.100 | 0.0150 | 0.0150 | 93.1% | 80.3% | 64.8% | 49.4% | 35.8% |
| 5.0bpw   | 4.51 | 5.00  | 28.854 | 0.0042 | 0.0042 | 96.2% | 88.6% | 78.6% | 67.2% | 55.8% |
| 6.0bpw   | 4.92 | 6.00  | 28.666 | 0.0013 | 0.0013 | 97.9% | 93.7% | 87.6% | 80.1% | 71.8% |
| 7.0bpw   | 5.34 | 7.00  | 28.610 | 0.0004 | 0.0004 | 98.7% | 96.0% | 92.2% | 87.2% | 81.4% |
| original | 9.66 | 16.00 | 28.596 | -      | -      | -     | -     | -     | -     | -     |
| 8.0bpw   | 5.75 | 8.00  | 28.621 | 0.0002 | 0.0002 | 99.1% | 97.2% | 94.4% | 90.8% | 86.4% |

## Metrics

- PPL (perplexity) — how well the model predicts the next token; lower is better. The original model's PPL is the baseline.
- KL-div (Kullback–Leibler divergence) — how much the quant's output probability distribution differs from the original's; lower is better. It is shown in both directions (quant→orig and orig→quant); asymmetry indicates where the quant over- or under-estimates probabilities.
- Top-K agreement — the probability that the quant's top-K predicted tokens match the original's top-K; higher is better. Top-1 is the most important (does the quant pick the same best token?); higher K values show agreement across less likely candidates.
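The metrics above can all be computed from per-position logits of the original and quantized models on the same evaluation text. A minimal NumPy sketch (the function names and the set-match definition of top-K agreement are illustrative assumptions; exllamav3's own eval script may define them slightly differently):

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax over the vocabulary axis
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def perplexity(logits, targets):
    # PPL = exp(mean negative log-likelihood of the target tokens)
    # logits: (positions, vocab), targets: (positions,)
    probs = softmax(logits)
    nll = -np.log(probs[np.arange(len(targets)), targets])
    return float(np.exp(nll.mean()))

def kl_divergence(p_logits, q_logits):
    # KL(P || Q), averaged over token positions; swap arguments
    # to get the other direction (q->o vs o->q in the table)
    p = softmax(p_logits)
    q = softmax(q_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

def topk_agreement(orig_logits, quant_logits, k):
    # fraction of positions where both models produce the same
    # set of top-k tokens (order ignored)
    o = np.argsort(-orig_logits, axis=-1)[:, :k]
    q = np.argsort(-quant_logits, axis=-1)[:, :k]
    return float(np.mean([set(a) == set(b) for a, b in zip(o, q)]))
```

A quant identical to the original would score KL-div 0.0 and 100% agreement at every K, which is why the table trends toward those values as bpw increases.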

## Example

Example of 4.0bpw output, generated with the Gradio script from the original model's repository.


