kalixlouiis committed
Commit eec8fe6 · verified · 1 Parent(s): aecfae1

Update README.md

Files changed (1)
  1. README.md +46 -2
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
  - syllable-aware
  - datarrx
  ---
- # DatarrX / myX-Tokenizer ⚔️
+ # DatarrX / myX-Tokenizer ⚔️
 
  **myX-Tokenizer** is a high-performance, syllable-aware **Unigram Tokenizer** specifically engineered for the Burmese language. Developed by [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis) under [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX), this model is designed to bridge the gap in Myanmar Natural Language Processing (NLP) by providing efficient and linguistically meaningful text segmentation.
 
@@ -82,6 +82,28 @@ print(f"Tokens: {tokens}")
  - Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
  - Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
 
+ ## Citation
+
+ If you use this tokenizer in your research or project, please cite it as follows:
+
+ ### APA 7th Edition
+ ```APA
+ Khant Sint Heinn. (2026). *myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English (Version 1.0)* [Computer software]. Hugging Face. https://huggingface.co/DatarrX/myX-Tokenizer
+ ```
+
+ ### BibTeX
+ ```BibTeX
+ @software{khantsintheinn2026myxtokenizer,
+ author = {Khant Sint Heinn},
+ title = {myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English},
+ version = {1.0},
+ year = {2026},
+ publisher = {Hugging Face},
+ url = {https://huggingface.co/DatarrX/myX-Tokenizer},
+ note = {Developed under DatarrX (Myanmar Open Source NGO)}
+ }
+ ```
+
  We are committed to advancing the Burmese NLP ecosystem. For feedback or collaboration, please use the Hugging Face Discussion tab.
 
  ---
@@ -145,4 +167,26 @@ print(f"Pieces: {sp.encode_as_pieces(text)}")
  - Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
  - Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
 
- For suggestions or questions about this model, please reach out through Hugging Face Discussion. We are continuously working to advance Burmese NLP.
+ For suggestions or questions about this model, please reach out through Hugging Face Discussion. We are continuously working to advance Burmese NLP.
+
+ ## Citation
+
+ If you use this model in your research, we kindly ask that you cite it as follows.
+
+ ### APA 7th Edition
+ ```APA
+ Khant Sint Heinn. (2026). *myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English (Version 1.0)* [Computer software]. Hugging Face. https://huggingface.co/DatarrX/myX-Tokenizer
+ ```
+
+ ### BibTeX
+ ```BibTeX
+ @software{khantsintheinn2026myxtokenizer,
+ author = {Khant Sint Heinn},
+ title = {myX-Tokenizer: A Syllable-aware Bilingual Unigram Tokenizer for Burmese and English},
+ version = {1.0},
+ year = {2026},
+ publisher = {Hugging Face},
+ url = {https://huggingface.co/DatarrX/myX-Tokenizer},
+ note = {Developed under DatarrX (Myanmar Open Source NGO)}
+ }
+ ```