NeuroSenko committed · verified
Commit 9272d3d (parent: ec0e458)

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

```diff
@@ -14,7 +14,7 @@ Quantization was performed using [exllamav3 v0.0.29](https://github.com/turboder
 
 The original model is distributed in **FP8** (`float8_e4m3fn`), not FP16/BF16 — this is why the 8.0bpw quant is nearly identical in size to the original.
 
-PPL and KL divergence metrics are non-computable for this model due to inf/NaN values originating from layer 61 expert weights (see Quantization Notes below). This is not specific to EXL3 — the same issue [affects 21-38% of GGUF quantizations](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) across multiple providers. Top-K agreement against the original is provided instead.
+PPL and KL divergence metrics are non-computable for this model due to inf/NaN values produced by layer 61 experts during the forward pass. This is not specific to EXL3 — the same issue [affects 21-38% of GGUF quantizations](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) across multiple providers. Top-K agreement against the original is provided instead.
 
 | Quant | Size (GB) | Actual bpw | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
 |---|---|---|---|---|---|---|---|
```
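The Top-K agreement metric mentioned in the changed line can be sketched as below. This is a minimal illustration, not exllamav3's implementation: the `topk_agreement` helper and the exact definition used here (fraction of token positions where the quantized model's top-1 prediction falls inside the original model's top-k set) are assumptions.

```python
import numpy as np

def topk_agreement(ref_logits: np.ndarray, quant_logits: np.ndarray, k: int) -> float:
    """Fraction of positions where the quantized model's top-1 token
    appears in the reference model's top-k set.

    ref_logits, quant_logits: arrays of shape (positions, vocab_size).
    One common definition of top-k agreement; the metric actually
    reported by the quantization tooling may differ in detail.
    """
    # Indices of the reference model's k highest-scoring tokens per position.
    ref_topk = np.argsort(ref_logits, axis=-1)[:, -k:]
    # The quantized model's single best token per position.
    quant_top1 = np.argmax(quant_logits, axis=-1)
    # A position "agrees" if that token is anywhere in the reference top-k.
    hits = (ref_topk == quant_top1[:, None]).any(axis=-1)
    return float(hits.mean())

# Synthetic demo: a "quantized" model whose logits are the reference
# logits plus small noise should agree at most positions.
rng = np.random.default_rng(0)
ref = rng.normal(size=(128, 1000))
quant = ref + rng.normal(scale=0.1, size=ref.shape)
print(topk_agreement(ref, quant, 5))
```

Unlike PPL or KL divergence, this metric only compares argmax/top-k index sets, so it stays well-defined even when some logits are inf/NaN-contaminated elsewhere in the vocabulary distribution.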