NeuroSenko committed · verified
Commit 9272d3d (parent: ec0e458)

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

```diff
@@ -14,7 +14,7 @@ Quantization was performed using [exllamav3 v0.0.29](https://github.com/turboder
 
 The original model is distributed in **FP8** (`float8_e4m3fn`), not FP16/BF16 — this is why the 8.0bpw quant is nearly identical in size to the original.
 
-PPL and KL divergence metrics are non-computable for this model due to inf/NaN values originating from layer 61 expert weights (see Quantization Notes below). This is not specific to EXL3 — the same issue [affects 21-38% of GGUF quantizations](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) across multiple providers. Top-K agreement against the original is provided instead.
+PPL and KL divergence metrics are non-computable for this model due to inf/NaN values produced by layer 61 experts during the forward pass. This is not specific to EXL3 — the same issue [affects 21-38% of GGUF quantizations](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) across multiple providers. Top-K agreement against the original is provided instead.
 
 | Quant | Size (GB) | Actual bpw | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
 |---|---|---|---|---|---|---|---|
```
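The Top-K agreement metric mentioned in the changed line can be sketched as below. This is a minimal illustration, not exllamav3's implementation: the `topk_agreement` helper and the exact definition used here (fraction of token positions where the quantized model's top-1 prediction falls inside the original model's top-k set) are assumptions.

```python
import numpy as np

def topk_agreement(ref_logits: np.ndarray, quant_logits: np.ndarray, k: int) -> float:
    """Fraction of positions where the quantized model's top-1 token
    appears in the reference model's top-k set.

    ref_logits, quant_logits: arrays of shape (positions, vocab_size).
    One common definition of top-k agreement; the metric actually
    reported by the quantization tooling may differ in detail.
    """
    # Indices of the reference model's k highest-scoring tokens per position.
    ref_topk = np.argsort(ref_logits, axis=-1)[:, -k:]
    # The quantized model's single best token per position.
    quant_top1 = np.argmax(quant_logits, axis=-1)
    # A position "agrees" if that token is anywhere in the reference top-k.
    hits = (ref_topk == quant_top1[:, None]).any(axis=-1)
    return float(hits.mean())

# Synthetic demo: a "quantized" model whose logits are the reference
# logits plus small noise should agree at most positions.
rng = np.random.default_rng(0)
ref = rng.normal(size=(128, 1000))
quant = ref + rng.normal(scale=0.1, size=ref.shape)
print(topk_agreement(ref, quant, 5))
```

Unlike PPL or KL divergence, this metric only compares argmax/top-k index sets, so it stays well-defined even when some logits are inf/NaN-contaminated elsewhere in the vocabulary distribution.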