Update README.md
README.md CHANGED

@@ -115,11 +115,12 @@ The model was evaluated on standard base language model benchmarks:
 - PIQA
 - XNLI
 
-
-Performance differences across TokSuite models on these benchmarks arise **solely from tokenizer choice**.
 <p align="left">
 <img src="./model-performance-comparison.png" alt="TokSuite Logo" width="700"/>
 </p>
+
+These evaluations verify that the model exhibits reasonable base language modeling behavior at its scale and training budget.
+
 ### TokSuite Robustness Benchmark
 
 TokSuite–BLOOM is evaluated on the **TokSuite robustness benchmark**, which measures sensitivity to real-world text perturbations, including: