---
datasets:
- saillab/alpaca-mongolian-cleaned
language:
- mn
---
## Usage
```python
from transformers import AutoTokenizer

text = "сайн уу"

# Load the Mongolian tokenizer from the Hugging Face Hub.
mn_tokenizer = AutoTokenizer.from_pretrained("goryden/mn_tokenizer")

tokens = mn_tokenizer.tokenize(text)    # token strings
encoded = mn_tokenizer.encode(text)     # token IDs
decoded = mn_tokenizer.decode(encoded)  # IDs converted back to text

print("Original:", text)
print("Mongolian tokenizer tokens:", tokens)
print("Decoded:", decoded)
```