Quantization was performed using exllamav3 v0.0.28 (commit ea87af6).
| Quant | Size (GB) | Actual bpw | PPL | KL-div (q→o) | KL-div (o→q) | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4.0bpw | 3.94 | 4.00 | 29.100 | 0.0150 | 0.0150 | 93.1% | 80.3% | 64.8% | 49.4% | 35.8% |
| 5.0bpw | 4.51 | 5.00 | 28.854 | 0.0042 | 0.0042 | 96.2% | 88.6% | 78.6% | 67.2% | 55.8% |
| 6.0bpw | 4.92 | 6.00 | 28.666 | 0.0013 | 0.0013 | 97.9% | 93.7% | 87.6% | 80.1% | 71.8% |
| 7.0bpw | 5.34 | 7.00 | 28.610 | 0.0004 | 0.0004 | 98.7% | 96.0% | 92.2% | 87.2% | 81.4% |
| 8.0bpw | 5.75 | 8.00 | 28.621 | 0.0002 | 0.0002 | 99.1% | 97.2% | 94.4% | 90.8% | 86.4% |
| original | 9.66 | 16.00 | 28.596 | — | — | — | — | — | — | — |
## Metrics
- PPL (Perplexity) — how well the model predicts the next token. Lower is better. The original model's PPL is the baseline.
- KL-div (Kullback-Leibler divergence) — measures how the quant's probability distribution differs from the original. Lower is better. Shown in both directions (quant→orig, orig→quant); asymmetry indicates where the quant over/under-estimates probabilities.
- Top-K agreement — fraction of positions where the quant's top-K predicted tokens match the original's top-K. Higher is better. Top-1 is the most important (does the quant pick the same best token?); higher K values measure agreement across progressively less likely candidates.
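The three metrics above can be sketched in a few lines of NumPy. This is a minimal illustration of the definitions, not the evaluation code that produced the table: `p_orig` / `p_quant` are assumed to be per-position probability distributions over the vocabulary, and the top-K check here assumes an ordered match, which is one plausible reading of "agreement".

```python
import numpy as np

def perplexity(p_model, target_ids):
    """PPL = exp(mean negative log-likelihood of the true next tokens).
    p_model: (positions, vocab) probabilities; target_ids: true token ids."""
    nll = -np.log(p_model[np.arange(len(target_ids)), target_ids])
    return float(np.exp(nll.mean()))

def kl_div(p, q, eps=1e-10):
    """Mean KL(p || q) over positions. Direction matters, hence the
    separate q->o and o->q columns in the table."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

def top_k_agreement(p_orig, p_quant, k):
    """Fraction of positions where the quant's k highest-probability
    tokens equal the original's, in the same order."""
    top_o = np.argsort(-p_orig, axis=-1)[:, :k]
    top_q = np.argsort(-p_quant, axis=-1)[:, :k]
    return float(np.mean(np.all(top_o == top_q, axis=-1)))
```

Sanity checks follow directly from the definitions: identical distributions give a KL divergence of 0 and top-K agreement of 1.0, and a uniform distribution over a vocabulary of size V gives PPL = V.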
## Example
4.0bpw performance example, using the gradio script from the original model's repo.
