---
datasets:
- saillab/alpaca-mongolian-cleaned
language:
- mn
---
## Usage
```python
from transformers import AutoTokenizer

text = "сайн уу"

# Load the Mongolian tokenizer from the Hugging Face Hub
mn_tokenizer = AutoTokenizer.from_pretrained("goryden/mn_tokenizer")

tokens = mn_tokenizer.tokenize(text)    # subword token strings
encoded = mn_tokenizer.encode(text)     # token IDs
decoded = mn_tokenizer.decode(encoded)  # IDs converted back to text

print("Original:", text)
print("Mongolian tokenizer tokens:", tokens)
print("Decoded:", decoded)
```