Instructions for using aliarda/turkish_tokenizer with libraries, inference providers, notebooks, and local apps.

- Libraries
  - Transformers
- Notebooks
  - Google Colab
  - Kaggle

How to use aliarda/turkish_tokenizer with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("aliarda/turkish_tokenizer", dtype="auto")
```
Update README.md
README.md CHANGED

```diff
@@ -11,7 +11,7 @@ tags:
 
 # Model Card for Turkish Byte Pair Encoding Tokenizer
 
-This model provides a tokenizer specifically designed for the Turkish language. It includes nearly
+This model provides a tokenizer specifically designed for the Turkish language. It includes nearly 25,000 Turkish word roots, all Turkish suffixes in both lowercase and uppercase forms, and extends with approximately 14,000 additional tokens using Byte Pair Encoding (BPE). The tokenizer is intended to improve the tokenization quality for NLP tasks involving Turkish text.
 
 ## Model Details
 
```
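The BPE extension mentioned in the README can be illustrated with a toy merge step. This is a minimal sketch of the generic BPE algorithm on an invented mini-corpus; it is not this tokenizer's actual training procedure, data, or vocabulary:

```python
# Toy sketch of a single Byte Pair Encoding (BPE) merge, the mechanism used
# to extend a vocabulary beyond word roots and suffixes. The words and
# frequencies below are invented for illustration only.
from collections import Counter

def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair across the corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # merge the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Character-level start: Turkish words with made-up corpus frequencies.
words = {
    tuple("evler"): 5,    # "houses"
    tuple("evlerde"): 3,  # "in the houses"
    tuple("kale"): 4,     # "castle"
}
pair = most_frequent_pair(words)  # ("l", "e") occurs 12 times, the maximum
words = merge_pair(words, pair)   # "le" is now a single vocabulary symbol
```

Repeating this count-and-merge loop until a target vocabulary size is reached (here, roughly 14,000 extra tokens) is how a BPE extension of the kind the README describes is built.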