NeuroSenko committed
Commit 4168bf0 · verified · 1 Parent(s): a6e91ea

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -10,7 +10,11 @@ tags:
 - exl3
 ---
 
-Quantization was performed using [exllama3 v0.0.29](https://github.com/turboderp-org/exllamav3) (commit `cb1a436`).
+Quantization was performed using [exllamav3 v0.0.29](https://github.com/turboderp-org/exllamav3) (commit `cb1a436`).
+
+The original model is distributed in **FP8** (`float8_e4m3fn`), not FP16/BF16 — this is why the 8.0bpw quant is nearly identical in size to the original.
+
+PPL and KL divergence metrics are non-computable for this model due to inf/NaN values originating from layer 61 expert weights (see Quantization Notes below). This is not specific to EXL3 — the same issue [affects 21-38% of GGUF quantizations](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) across multiple providers. Top-K agreement against the original is provided instead.
 
 | Quant | Size (GB) | Actual bpw | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
 |---|---|---|---|---|---|---|---|
@@ -23,16 +27,14 @@ Quantization was performed using [exllama3 v0.0.29](https://github.com/turboderp
 | [8.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/8.0bpw) | 214.13 | 8.00 | 95.2% | 84.0% | 69.5% | 54.4% | 40.7% |
 | original | 214.36 | 8.00 | — | — | — | — | — |
 
-\* Original model produces inf/NaN in layer 61, making PPL and KL divergence non-computable.
-
 <details>
-<summary>Quantization Notes</summary>
+<summary>Quantization Notes — inf/NaN in original model weights</summary>
 
 ### Inf/NaN values in calibration data
 
 Some experts in the model produce `inf` values during calibration (e.g. experts 61 and 74 in the last layer had inf values in their down-projection calibration state). The `lm_head` layer also exhibited NaN values in its calibration state (445K NaN out of 1.5B elements).
 
 This causes Cholesky decomposition to fail during quantization, as the Hessian matrix is no longer positive-definite. ExLlamaV3 does not handle this case gracefully — quantization crashes after exhausting retry attempts. A local patch was applied to fall back to uncalibrated quantization for the affected tensors. Given that only a handful of experts out of 256 in the last layer are affected, the impact on output quality is expected to be minimal.
-
-This appears to be a property of the model weights themselves, not a bug in the quantizer.
+
+Note that inf/NaN values are present in the **original model** during inference as well — both the quantized and original models produce NaN perplexity. This appears to be caused by numerically unstable expert weights that produce overflow during forward pass, not by the quantizer itself. The same layer (`blk.61.ffn_down_exps`) [has been identified](https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/) as causing NaN perplexity across GGUF quantizations by multiple providers.
 </details>
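The FP8 claim above is straightforward to verify without loading any weights: each safetensors shard stores every tensor's dtype in a small JSON header at the start of the file. A minimal sketch, assuming only the standard safetensors file layout (the shard filename is a placeholder):

```python
# Sketch: count tensor dtypes in a safetensors shard by reading only the JSON
# header at the start of the file (8-byte little-endian length, then the header),
# so none of the ~214 GB of weights has to be loaded. The filename is a placeholder.
import json
import struct
from collections import Counter

def shard_dtypes(path: str) -> Counter:
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header
    return Counter(v["dtype"] for k, v in header.items() if k != "__metadata__")

print(shard_dtypes("model-00001-of-000XX.safetensors"))
# An FP8 checkpoint reports a float8 dtype (e.g. "F8_E4M3") for the weight
# tensors, i.e. 8 bits per weight, which is why the 8.0bpw quant ends up
# almost exactly the same size as the original.
```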
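Because PPL and KL divergence are unavailable, the table reports top-K agreement instead. The README does not spell out the exact definition, so the sketch below implements one plausible reading, where a position counts as agreeing if both models produce the identical ordered top-K token list; the eval script behind the table may differ in detail:

```python
# Sketch of a top-K agreement metric between two models' logits at the same
# token positions. This implements one plausible reading of the table above
# (the ordered top-K token lists must match exactly); the eval script that
# produced those numbers may define the metric slightly differently.
import torch

def topk_agreement(logits_ref: torch.Tensor, logits_quant: torch.Tensor, k: int) -> float:
    """logits_*: [positions, vocab]; fraction of positions where both models
    rank the same k tokens in the same order."""
    top_ref = logits_ref.topk(k, dim=-1).indices
    top_quant = logits_quant.topk(k, dim=-1).indices
    return (top_ref == top_quant).all(dim=-1).float().mean().item()

# Toy usage; in practice the logits come from the original and quantized models
# evaluated on the same text.
ref = torch.randn(512, 32_000)
quant = ref + 0.05 * torch.randn_like(ref)
for k in range(1, 6):
    print(f"Top-{k}: {topk_agreement(ref, quant, k):.1%}")
```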
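The calibration problem described in the notes amounts to non-finite values appearing in per-tensor statistics. A minimal audit of captured calibration states (or raw weight tensors) looks like this; the tensor names and shapes are illustrative toys, not the real MiniMax-M2 tensors:

```python
# Sketch: audit a dict of tensors (captured calibration states, or weights) for
# non-finite values. Names and shapes below are illustrative toys; in practice
# the dict would be filled by forward hooks during the calibration pass.
import torch

def report_nonfinite(states: dict[str, torch.Tensor]) -> None:
    for name, t in states.items():
        n_inf = torch.isinf(t).sum().item()
        n_nan = torch.isnan(t).sum().item()
        if n_inf or n_nan:
            print(f"{name}: {n_inf} inf, {n_nan} NaN out of {t.numel()} elements")

states = {
    "blocks.61.experts.61.down_proj": torch.randn(256, 512),
    "lm_head": torch.randn(512, 1024),
}
states["blocks.61.experts.61.down_proj"][0, 0] = float("inf")  # injected for the demo
report_nonfinite(states)
```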
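The local patch itself is only described in prose. Its intent can be sketched as follows, assuming a GPTQ-style flow where each tensor is quantized against a Hessian built from calibration activations: if the Hessian is non-finite, or still fails Cholesky factorization after damping, fall back to an identity Hessian, which is equivalent to quantizing that tensor without calibration. This is an illustration of the idea, not the actual exllamav3 code:

```python
# Sketch of the fallback described above (not the actual patch): if a tensor's
# Hessian contains non-finite values, or still fails Cholesky after damping,
# substitute an identity Hessian, i.e. quantize that tensor uncalibrated.
import torch

def usable_hessian(h: torch.Tensor, damping: float = 0.01) -> torch.Tensor:
    n = h.shape[0]
    eye = torch.eye(n, dtype=h.dtype, device=h.device)
    if not torch.isfinite(h).all():
        return eye                                    # uncalibrated fallback
    h = h + damping * h.diagonal().mean() * eye       # standard diagonal damping
    _, info = torch.linalg.cholesky_ex(h)             # info != 0: not positive-definite
    return h if info.item() == 0 else eye

broken = torch.full((8, 8), float("inf"))   # stand-in for a broken calibration state
print(usable_hessian(broken))               # identity matrix: quantized without calibration
```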
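Finally, the claim that the original model already overflows during inference can be checked by instrumenting a forward pass and recording the first module whose output goes non-finite. A generic sketch using PyTorch forward hooks (the toy model stands in for the real checkpoint):

```python
# Sketch: find the first module whose output goes non-finite during a forward
# pass, using PyTorch forward hooks. The toy model at the bottom stands in for
# the real checkpoint; the same instrumentation works on any nn.Module tree.
from typing import Optional

import torch
import torch.nn as nn

def find_first_nonfinite(model: nn.Module, *inputs) -> Optional[str]:
    offenders = []

    def make_hook(name):
        def hook(module, args, output):
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and not torch.isfinite(out).all():
                offenders.append(name)
        return hook

    handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules() if n]
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for h in handles:
            h.remove()
    return offenders[0] if offenders else None

# Toy demonstration: the second layer overflows, so it is reported first.
toy = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))
with torch.no_grad():
    toy[1].weight.fill_(float("inf"))
print(find_first_nonfinite(toy, torch.randn(1, 16)))   # -> "1"
```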