Update comparison table: add BF16 variant, set BF16 as baseline
README.md
CHANGED
@@ -175,7 +175,7 @@ The table below compares locally validated TensorRT-LLM variants built for the s
 
 | Variant | Checkpoint | Engine | `short_chat_42_64` | `balanced_128_128` | `long_generation_42_192` | Quick-check overall | Quick-check change vs BF16 | Practical reading |
 |---|---:|---:|---:|---:|---:|---:|---|---|
-| `BF16` | `3.4 GB` | `3.4 GB` | `105.27 tok/s` | `105.37 tok/s` | `105.57 tok/s` | `0.725` | `baseline` |
+| `BF16` | `3.4 GB` | `3.4 GB` | `105.27 tok/s` | `105.37 tok/s` | `105.57 tok/s` | `0.725` | `baseline` | Native precision, best numerical stability |
 | `FP16` | `3.4 GB` | `3.4 GB` | `105.48 tok/s` | `105.49 tok/s` | `105.70 tok/s` | `0.75` | `+2.5 pts on this quick-check` | Most conservative variant |
 | `FP8` | `2.1 GB` | `2.2 GB` | `166.72 tok/s` | `144.37 tok/s` | `151.36 tok/s` | `0.75` | `+2.5 pts on this quick-check` | Best balance in these local tests |
 | `NVFP4` | `1.6 GB` | `1.2 GB` | `199.58 tok/s` | `200.09 tok/s` | `200.28 tok/s` | `0.60` | `-12.5 pts on this quick-check` | Fastest and smallest, but with visible quality drop |
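
The "Quick-check change vs BF16" column appears to be each variant's quick-check score minus the BF16 score, expressed in percentage points. The sketch below recomputes it under that assumption. The scores are copied from the table, and the names `QUICK_CHECK` and `points_vs_baseline` are hypothetical, not part of the repository.

```python
# Quick-check overall scores, copied from the comparison table.
QUICK_CHECK = {
    "BF16": 0.725,
    "FP16": 0.75,
    "FP8": 0.75,
    "NVFP4": 0.60,
}


def points_vs_baseline(scores, baseline="BF16"):
    """Return each variant's score change vs the baseline, in percentage points.

    Assumes scores are fractions in [0, 1], so a difference of 0.025 is 2.5 pts.
    """
    base = scores[baseline]
    return {name: round((score - base) * 100, 1) for name, score in scores.items()}


deltas = points_vs_baseline(QUICK_CHECK)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} pts")
```

Rounding to one decimal place hides floating-point noise in the subtraction. If the baseline row changes again, only the `baseline` argument needs to be updated.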