Update README.md
README.md CHANGED
@@ -28,9 +28,9 @@ PPL and KL divergence metrics are non-computable for this model due to inf/NaN v
 | original | 214.36 | 8.00 | β | β | β | β | β |
 
 <details>
-<summary>Quantization Notes
+<summary>Quantization Notes</summary>
 
-### Inf/NaN values
+### Inf/NaN values during calibration
 
 Some experts in the model produce `inf` values during calibration (e.g. experts 61 and 74 in the last layer had inf values in their down-projection calibration state). The `lm_head` layer also exhibited NaN values in its calibration state (445K NaN out of 1.5B elements).
 
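The note above reports counts of non-finite values in calibration state. The sketch below shows one way to produce counts of that kind. It assumes the calibration state has already been flattened into a plain Python sequence of floats. Real calibration states are usually framework tensors, which you would check with that framework's own operations. The helper names `count_nonfinite` and `nonfinite_fraction` are hypothetical and not part of any quantization toolkit. As a worked ratio, roughly 445K NaN out of roughly 1.5B `lm_head` elements is about 0.03% of the entries.

```python
import math
from typing import Dict, Iterable


def count_nonfinite(values: Iterable[float]) -> Dict[str, int]:
    """Count inf and NaN entries in a flat sequence of floats.

    Hypothetical helper for illustration only; it assumes the calibration
    state was already flattened into Python floats.
    """
    n_inf = n_nan = total = 0
    for v in values:
        total += 1
        if math.isnan(v):
            n_nan += 1
        elif math.isinf(v):  # catches both +inf and -inf
            n_inf += 1
    return {"inf": n_inf, "nan": n_nan, "total": total}


def nonfinite_fraction(counts: Dict[str, int]) -> float:
    """Fraction of entries that are inf or NaN (0.0 for an empty input)."""
    if counts["total"] == 0:
        return 0.0
    return (counts["inf"] + counts["nan"]) / counts["total"]


if __name__ == "__main__":
    sample = [1.0, float("inf"), float("nan"), -2.5, float("-inf")]
    counts = count_nonfinite(sample)
    print(counts)  # {'inf': 2, 'nan': 1, 'total': 5}
    print(nonfinite_fraction(counts))  # 0.6
    # Ratio quoted in the note: ~445K NaN out of ~1.5B lm_head elements.
    print(f"{445_000 / 1_500_000_000:.4%}")  # 0.0297%
```

`inf` and `NaN` are counted separately because they usually have different causes: `inf` often signals overflow, while `NaN` signals an invalid operation. A quantization pipeline may want to treat the two cases differently.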