Update README.md
README.md
CHANGED
|
@@ -18,7 +18,7 @@ tags:
|
|
| 18 |
- syllable-aware
|
| 19 |
- datarrx
|
| 20 |
---
|
| 21 |
-
# DatarrX / myX-Tokenizer
|
| 22 |
|
| 23 |
**myX-Tokenizer** is a high-performance, syllable-aware **Unigram Tokenizer** specifically engineered for the Burmese language. Developed by [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis) under [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX), this model is designed to bridge the gap in Myanmar Natural Language Processing (NLP) by providing efficient and linguistically meaningful text segmentation.
|
| 24 |
|
|
@@ -82,6 +82,28 @@ print(f"Tokens: {tokens}")
|
|
| 82 |
- Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
|
| 83 |
- Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
|
| 84 |
|
| 85 |
We are committed to advancing the Burmese NLP ecosystem. For feedback or collaboration, please use the Hugging Face Discussion tab.
|
| 86 |
|
| 87 |
---
|
|
@@ -145,4 +167,26 @@ print(f"Pieces: {sp.encode_as_pieces(text)}")
|
|
| 145 |
- Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
|
| 146 |
- Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
|
| 147 |
|
| 148 |
-
For suggestions or questions about this model, you can reach us through the Hugging Face Discussion tab. We are continually working to advance Burmese NLP.
|
|
|
| 18 |
- syllable-aware
|
| 19 |
- datarrx
|
| 20 |
---
|
| 21 |
+
# DatarrX / myX-Tokenizer
|
| 22 |
|
| 23 |
**myX-Tokenizer** is a high-performance, syllable-aware **Unigram Tokenizer** specifically engineered for the Burmese language. Developed by [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis) under [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX), this model is designed to bridge the gap in Myanmar Natural Language Processing (NLP) by providing efficient and linguistically meaningful text segmentation.
|
| 24 |
|
|
|
|
| 82 |
- Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
|
| 83 |
- Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
|
| 84 |
|
| 85 |
+
## Citation
|
| 86 |
+
|
| 87 |
+
If you use this tokenizer in your research or project, please cite it as follows:
|
| 88 |
+
|
| 89 |
+
### APA 7th Edition
|
| 90 |
+
```APA
|
| 91 |
+
Khant Sint Heinn. (2026). *myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English (Version 1.0)* [Computer software]. Hugging Face. https://huggingface.co/DatarrX/myX-Tokenizer
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
### BibTeX
|
| 95 |
+
```BibTeX
|
| 96 |
+
@software{khantsintheinn2026myxtokenizer,
|
| 97 |
+
author = {Khant Sint Heinn},
|
| 98 |
+
title = {myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English},
|
| 99 |
+
version = {1.0},
|
| 100 |
+
year = {2026},
|
| 101 |
+
publisher = {Hugging Face},
|
| 102 |
+
url = {https://huggingface.co/DatarrX/myX-Tokenizer},
|
| 103 |
+
note = {Developed under DatarrX (Myanmar Open Source NGO)}
|
| 104 |
+
}
|
| 105 |
+
```
|
| 106 |
+
|
| 107 |
We are committed to advancing the Burmese NLP ecosystem. For feedback or collaboration, please use the Hugging Face Discussion tab.
|
| 108 |
|
| 109 |
---
|
|
|
|
| 167 |
- Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
|
| 168 |
- Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
|
| 169 |
|
| 170 |
+
For suggestions or questions about this model, you can reach us through the Hugging Face Discussion tab. We are continually working to advance Burmese NLP.
|
| 171 |
+
|
| 172 |
+
## Citation
|
| 173 |
+
|
| 174 |
+
If you use this model in your research, we kindly ask that you cite it as follows:
|
| 175 |
+
|
| 176 |
+
### APA 7th Edition
|
| 177 |
+
```APA
|
| 178 |
+
Khant Sint Heinn. (2026). *myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English (Version 1.0)* [Computer software]. Hugging Face. https://huggingface.co/DatarrX/myX-Tokenizer
|
| 179 |
+
```
|
| 180 |
+
|
| 181 |
+
### BibTeX
|
| 182 |
+
```BibTeX
|
| 183 |
+
@software{khantsintheinn2026myxtokenizer,
|
| 184 |
+
author = {Khant Sint Heinn},
|
| 185 |
+
title = {myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English},
|
| 186 |
+
version = {1.0},
|
| 187 |
+
year = {2026},
|
| 188 |
+
publisher = {Hugging Face},
|
| 189 |
+
url = {https://huggingface.co/DatarrX/myX-Tokenizer},
|
| 190 |
+
note = {Developed under DatarrX (Myanmar Open Source NGO)}
|
| 191 |
+
}
|
| 192 |
+
```
|