Update README.md
README.md
CHANGED
@@ -53,13 +53,13 @@ The table above provides a high-level summary of observed accuracy deltas across
 
 ### Generation Stability (EOS Behavior)
 
-| Model |
+| Model | EOS Non-Emission Rate |
 |------|--------------------|
 | Torch baseline | 6% |
 | Previous INT4 GPU ONNX model | 52% |
 | Updated QAT INT4 GPU ONNX model | **11%** |
 
-The updated model reduces
+The updated model reduces EOS non-emission by approximately 5× compared to the previous INT4 GPU ONNX release, as observed across a large set of randomly generated prompts, resulting in more reliable sequence termination and generation behavior closer to the Torch baseline.
 
 ## Hardware Supported
 The ONNX models are tested on: