SMQ Kazakh Tokenizer

A SentencePiece Unigram tokenizer trained on Kazakh-language data.

Usage

from transformers import AutoTokenizer

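# Load the tokenizer files from the Hugging Face Hub (cached locally after the first call).
tokenizer = AutoTokenizer.from_pretrained("salyamq/smq-tokenizer-kz")
# Split a Kazakh greeting into subword pieces and print them.
print(tokenizer.tokenize("Сәлеметсіз бе!"))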
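
Beyond listing subword pieces, a common next step is mapping text to token IDs and decoding the IDs back to a string. The following is a minimal sketch of that round trip, not part of the original card. The helper names `encode_roundtrip` and `load_smq_tokenizer` are hypothetical. The sketch uses only the standard `tokenize`, `encode`, and `decode` methods of `transformers` tokenizers, and it assumes the repository loads through `AutoTokenizer` as shown above.

```python
from typing import Any, Dict, List


def encode_roundtrip(tokenizer: Any, text: str) -> Dict[str, Any]:
    """Tokenize `text`, map it to IDs, and decode the IDs back to a string."""
    pieces: List[str] = tokenizer.tokenize(text)
    # add_special_tokens=False asks the tokenizer not to add BOS/EOS-style IDs.
    ids: List[int] = tokenizer.encode(text, add_special_tokens=False)
    decoded: str = tokenizer.decode(ids)
    return {"pieces": pieces, "ids": ids, "decoded": decoded}


def load_smq_tokenizer() -> Any:
    """Hypothetical helper: load the tokenizer from the Hub (needs network access)."""
    from transformers import AutoTokenizer  # imported lazily so the helper above stays dependency-free

    return AutoTokenizer.from_pretrained("salyamq/smq-tokenizer-kz")


# Example call (commented out because it downloads files):
# result = encode_roundtrip(load_smq_tokenizer(), "Сәлеметсіз бе!")
# print(result["pieces"], result["ids"], result["decoded"])
```

Whether the decoded string matches the input exactly depends on the tokenizer's normalization settings, which the card does not describe.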

Details

  • Algorithm: Unigram
  • Language: Kazakh (kk)
  • Vocab size: 32000
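
The stated vocabulary size can be checked against a loaded tokenizer. The sketch below is an illustration under assumptions: `vocab_report` and `EXPECTED_VOCAB_SIZE` are hypothetical names, and the card does not say whether 32000 counts special or added tokens, so the check accepts either the base size (`vocab_size`) or the total size (`len(tokenizer)`).

```python
from typing import Any, Dict

EXPECTED_VOCAB_SIZE = 32000  # value stated in the Details list


def vocab_report(tokenizer: Any, expected: int = EXPECTED_VOCAB_SIZE) -> Dict[str, Any]:
    """Compare a tokenizer's reported vocabulary sizes with the expected value."""
    base = tokenizer.vocab_size  # base vocabulary, without tokens added after training
    total = len(tokenizer)       # base vocabulary plus any added tokens
    return {
        "base": base,
        "total": total,
        "matches_expected": expected in (base, total),
    }


# Example call (commented out because it downloads files):
# from transformers import AutoTokenizer
# print(vocab_report(AutoTokenizer.from_pretrained("salyamq/smq-tokenizer-kz")))
```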

Author

salyamq
