Update README.md
README.md
CHANGED
@@ -86,10 +86,10 @@ Comparison with state-of-the-art tokenizers on 60,000 samples (30k Arabic + 30k
 
 | Tokenizer | Vocab | AR Fert | EN Fert | Avg Fert | AR C/T | EN C/T | Parity |
 |-----------|-------|---------|---------|----------|--------|--------|--------|
-| **SARFTokenizer** | 64,641 | **1.72** | 1.57 | **1.64** | 3.45 | 2.99 | 1.156 |
+| **SARFTokenizer** | 64,641 | **1.72** | 1.57 | **1.64** | 3.45 | 2.99 | **1.156** |
 | ALLaM-7B | 64,000 | 1.82 | 1.48 | 1.65 | 3.08 | 2.65 | 1.163 |
 | Gemma-3-4B | 262,145 | 2.78 | 1.33 | 2.05 | 2.42 | 3.00 | 0.805 |
-| Falcon-H1-7B | 130,049 | 2.65 | 1.55 | 2.10 | 2.55 | 2.75 |
+| Falcon-H1-7B | 130,049 | 2.65 | 1.55 | 2.10 | 2.55 | 2.75 | 0.926 |
 | Fanar-1-9B | 128,256 | 2.85 | 1.36 | 2.11 | 2.27 | 2.93 | 0.775 |
 | Hala-9B | 128,256 | 2.85 | 1.36 | 2.11 | 2.27 | 2.93 | 0.775 |
 | GPT-4o | 200,019 | 2.81 | 1.44 | 2.12 | 2.45 | 3.37 | 0.726 |
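
Reviewer note: the restored Falcon-H1-7B parity (0.926) can be sanity-checked against the other columns. The sketch below assumes, purely from the numbers in the table and not from anything stated in the README, that Avg Fert is the mean of AR Fert and EN Fert and that Parity is AR C/T divided by EN C/T. The names `ROWS`, `derived`, and `max_deviation` are hypothetical helpers for this check only.

```python
# Rows copied from the updated table:
# (name, AR fert, EN fert, avg fert, AR C/T, EN C/T, parity)
ROWS = [
    ("SARFTokenizer", 1.72, 1.57, 1.64, 3.45, 2.99, 1.156),
    ("ALLaM-7B",      1.82, 1.48, 1.65, 3.08, 2.65, 1.163),
    ("Gemma-3-4B",    2.78, 1.33, 2.05, 2.42, 3.00, 0.805),
    ("Falcon-H1-7B",  2.65, 1.55, 2.10, 2.55, 2.75, 0.926),
    ("Fanar-1-9B",    2.85, 1.36, 2.11, 2.27, 2.93, 0.775),
    ("Hala-9B",       2.85, 1.36, 2.11, 2.27, 2.93, 0.775),
    ("GPT-4o",        2.81, 1.44, 2.12, 2.45, 3.37, 0.726),
]


def derived(ar_fert, en_fert, ar_cpt, en_cpt):
    """Recompute the derived columns under the ASSUMED definitions."""
    avg_fert = (ar_fert + en_fert) / 2  # assumption: plain mean of the two
    parity = ar_cpt / en_cpt            # assumption: AR chars/token over EN
    return avg_fert, parity


def max_deviation(rows=ROWS):
    """Largest gap between the reported and recomputed derived values."""
    worst_avg = 0.0
    worst_parity = 0.0
    for _name, ar_f, en_f, avg_f, ar_c, en_c, par in rows:
        avg_calc, par_calc = derived(ar_f, en_f, ar_c, en_c)
        worst_avg = max(worst_avg, abs(avg_calc - avg_f))
        worst_parity = max(worst_parity, abs(par_calc - par))
    return worst_avg, worst_parity


if __name__ == "__main__":
    avg_gap, parity_gap = max_deviation()
    print(f"max Avg Fert gap: {avg_gap:.4f}, max Parity gap: {parity_gap:.4f}")
```

Small gaps are expected because the table inputs are themselves rounded to two decimals; a large gap on any row would suggest a transcription error or a different definition of the column.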