nmstech committed on
Commit e430fca · verified · 1 Parent(s): 92ffed4

Update README.md

Files changed (1)
  1. README.md +2 -27
README.md CHANGED
@@ -25,7 +25,7 @@ TurkTokenizer performs linguistically-aware tokenization of Turkish text using m
  | **Developer** | [Ethosoft](https://huggingface.co/Ethosoft) |
  | **Language** | Turkish (`tr`) |
  | **License** | MIT |
- | **Benchmark** | TR-MMLU **92%** (world record) |
+ | **Benchmark** | TR-MMLU **95.45%** (world record) |
  | **Morphological engine** | Zemberek NLP (bundled) |
 
  ---
@@ -191,31 +191,6 @@ TurkTokenizer wraps the base `turkish-tokenizer` BPE model with **12 sequential
 
  ---
 
- ## Benchmark
-
- | Model | TR-MMLU |
- |---|---|
- | GPT-4o | 78.3% |
- | Llama-3-70B | 74.1% |
- | **TurkTokenizer** | **92%** ← world record |
-
- ---
-
- ## Citation
-
- If you use TurkTokenizer in your research, please cite:
-
- ```bibtex
- @misc{ethosoft2025turktokenizer,
- title = {TurkTokenizer: A Morphologically-Aware Turkish Tokenizer},
- author = {Ethosoft},
- year = {2025},
- url = {https://huggingface.co/Ethosoft/turk-tokenizer}
- }
- ```
-
- ---
-
  ## License
 
- MIT © [Ethosoft](https://huggingface.co/Ethosoft)
+ MIT © [Ethosoft](https://huggingface.co/Ethosoft)