Malikeh1375 committed · verified · Commit 92cd6c9 · 1 Parent(s): ca4f565

Update README.md

Files changed (1):
  1. README.md +33 -1
README.md CHANGED
@@ -122,7 +122,39 @@ TokSuite–GPT-2 is evaluated on the **TokSuite robustness benchmark**, which me
 - OCR and spacing artifacts,
 - LaTeX and STEM-style formatting.
 
-Performance is reported as **relative accuracy drop** with respect to canonical inputs.
+
+**Tokenization robustness under multilingual text perturbations**
+Values represent the **relative performance drop**, computed as `(Acc_clean − Acc_perturbed) / Acc_clean`, where **lower values indicate greater robustness**.
+
+Perturbation types include:
+- **Input:** non-native keyboard input and romanization
+- **Diacr.:** optional diacritics
+- **Orth.:** orthographic errors
+- **Morph.:** morphological variations, including derivations, inflections, and contractions
+- **Noise:** homoglyph substitutions, OCR artifacts, typos, and spacing errors
+- **LaTeX:** LaTeX-style mathematical formatting
+- **STEM:** scientific diagrams and notational conventions
+- **Unic.:** Unicode styling characters
+
+**NEN** denotes non-English inputs and **EN** denotes English inputs. The **Avg** column reports the average relative performance drop across all perturbation categories.
+
+| Model         | Input | Diacr. | Orth. | Morph. | Noise | LaTeX | STEM | Unic. | Avg ↓ |
+|---------------|-------|--------|-------|--------|-------|-------|------|-------|-------|
+| TokenMonster  | **0.23** | **0.33** | 0.08 | **0.01** | **-0.07** | **0.10** | 0.18 | 0.21 | **0.17** |
+| XGLM          | 0.34 | 0.49 | 0.10 | 0.11 | 0.07 | 0.12 | 0.22 | 0.29 | 0.22 |
+| BLOOM         | 0.30 | 0.34 | 0.13 | 0.07 | 0.11 | 0.18 | 0.18 | 0.24 | 0.22 |
+| ByT5          | 0.30 | 0.44 | **0.04** | 0.06 | 0.04 | 0.14 | **0.18** | 0.17 | 0.22 |
+| Comma         | 0.28 | 0.43 | 0.05 | 0.07 | **0.00** | 0.11 | 0.20 | 0.23 | 0.22 |
+| mBERT         | 0.33 | 0.44 | 0.11 | 0.11 | 0.06 | 0.18 | 0.22 | **0.14** | 0.24 |
+| GPT-4o        | 0.30 | 0.51 | 0.08 | 0.05 | 0.05 | 0.16 | 0.19 | 0.24 | 0.24 |
+| GPT-2         | 0.34 | 0.46 | 0.07 | 0.10 | 0.06 | 0.14 | 0.21 | 0.24 | 0.25 |
+| Phi-3         | 0.33 | 0.46 | 0.16 | 0.09 | 0.08 | 0.17 | 0.21 | 0.24 | 0.25 |
+| Gemma-2       | 0.32 | 0.42 | 0.14 | **0.15** | 0.03 | 0.16 | 0.25 | 0.22 | 0.26 |
+| Qwen-3        | **0.36** | 0.42 | 0.14 | 0.11 | 0.06 | 0.16 | 0.23 | 0.26 | 0.26 |
+| Llama-3.2     | 0.33 | **0.55** | 0.11 | 0.10 | 0.08 | 0.15 | 0.24 | 0.17 | 0.26 |
+| Aya           | 0.31 | 0.46 | 0.14 | 0.10 | 0.03 | **0.19** | **0.25** | 0.21 | 0.26 |
+| Tekken        | 0.33 | 0.47 | **0.18** | 0.03 | **0.31** | 0.10 | 0.21 | **0.27** | **0.27** |
 
 ---
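The relative-drop metric added in this diff can be sketched in a few lines of Python. This is an illustrative sketch, not code from the repository; the function names `relative_drop` and `average_drop` are assumptions chosen for clarity.

```python
def relative_drop(acc_clean: float, acc_perturbed: float) -> float:
    # Relative performance drop: (Acc_clean - Acc_perturbed) / Acc_clean.
    # Lower values mean greater robustness; a negative value means the
    # model scored higher on perturbed inputs than on clean ones.
    return (acc_clean - acc_perturbed) / acc_clean


def average_drop(drops: list[float]) -> float:
    # The Avg column: mean relative drop across perturbation categories.
    return sum(drops) / len(drops)


# Clean accuracy 0.80 vs. perturbed accuracy 0.60 -> 25% relative drop.
print(round(relative_drop(0.80, 0.60), 2))  # → 0.25
```

Note that because the drop is normalized by clean accuracy, a model with low clean accuracy can show a large relative drop from a small absolute change.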