Instructions for using aliarda/turkish_tokenizer with libraries, inference providers, notebooks, and local apps.

- Libraries
  - Transformers
- Notebooks
  - Google Colab
  - Kaggle

How to use aliarda/turkish_tokenizer with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("aliarda/turkish_tokenizer", dtype="auto")
```
Update README.md
README.md CHANGED

```diff
@@ -11,7 +11,7 @@ tags:
 
 # Model Card for Turkish Byte Pair Encoding Tokenizer
 
-This model provides a tokenizer specifically designed for the Turkish language. It includes nearly
+This model provides a tokenizer specifically designed for the Turkish language. It includes nearly 25,000 Turkish word roots, all Turkish suffixes in both lowercase and uppercase forms, and extends with approximately 14,000 additional tokens using Byte Pair Encoding (BPE). The tokenizer is intended to improve the tokenization quality for NLP tasks involving Turkish text.
 
 ## Model Details
 
```
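The BPE extension mentioned in the README can be illustrated with a toy merge step. This is a minimal sketch of the generic BPE algorithm on an invented mini-corpus; it is not this tokenizer's actual training procedure, data, or vocabulary:

```python
# Toy sketch of a single Byte Pair Encoding (BPE) merge, the mechanism used
# to extend a vocabulary beyond word roots and suffixes. The words and
# frequencies below are invented for illustration only.
from collections import Counter

def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair across the corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # merge the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Character-level start: Turkish words with made-up corpus frequencies.
words = {
    tuple("evler"): 5,    # "houses"
    tuple("evlerde"): 3,  # "in the houses"
    tuple("kale"): 4,     # "castle"
}
pair = most_frequent_pair(words)  # ("l", "e") occurs 12 times, the maximum
words = merge_pair(words, pair)   # "le" is now a single vocabulary symbol
```

Repeating this count-and-merge loop until a target vocabulary size is reached (here, roughly 14,000 extra tokens) is how a BPE extension of the kind the README describes is built.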