Ethosoft/NedoTurkishTokenizer

nmstech committed on Mar 18

Commit 2064cba · verified · 1 Parent(s): e6da48e

Update README.md

Files changed (1)

README.md +2 -2

@@ -14,7 +14,7 @@ pipeline_tag: token-classification
 # NedoTurkishTokenizer
-**Turkish morphological tokenizer — TR-MMLU world record 95.45%**
+**Turkish morphological tokenizer — TR-MMLU world record 92.64%**
 NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text using morphological rules. Unlike BPE-based tokenizers, it produces meaningful morphological units (roots and suffixes) aligned with Turkish grammar, powered by [Zemberek NLP](https://github.com/ahmetaa/zemberek-nlp).
@@ -25,7 +25,7 @@ NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text
 | **Developer** | [Ethosoft](https://huggingface.co/Ethosoft) |
 | **Language** | Turkish (`tr`) |
 | **License** | MIT |
-| **Benchmark** | TR-MMLU **95.45%** (world record) |
+| **Benchmark** | TR-MMLU **92.64%** (world record) |
 | **Morphological engine** | Zemberek NLP (bundled) |
 ---