kalixlouiis committed on
Commit feceef1 · verified · 1 parent: a84c824

Update README.md

Files changed (1)
  1. README.md +37 -1
README.md CHANGED
@@ -31,6 +31,24 @@ Trained on the [kalixlouiis/raw-data](https://huggingface.co/datasets/kalixlouii
 * **Limited English Support:** This model is strictly a Burmese script specialist. It has significant limitations in processing English text, which may result in excessive subword splitting for Latin characters.
 * **Script Sensitivity:** Optimized for modern Burmese script; performance may vary with older orthography or heavy use of specialized Pali/Sanskrit loanwords.
 
+## Citation
+
+If you use this tokenizer in your research or project, please cite it as follows:
+
+### APA 7th Edition
+Khant Sint Heinn. (2026). *myX-Tokenizer-Unigram: Probabilistic Burmese Script Tokenizer (Version 1.0)* [Computer software]. Hugging Face. https://huggingface.co/DatarrX/myX-Tokenizer-Unigram
+
+### BibTeX
+@software{khantsintheinn2026unigram,
+author = {Khant Sint Heinn},
+title = {myX-Tokenizer-Unigram: Probabilistic Burmese Script Tokenizer},
+version = {1.0},
+year = {2026},
+publisher = {Hugging Face},
+url = {https://huggingface.co/DatarrX/myX-Tokenizer-Unigram},
+note = {Burmese-only training corpus}
+}
+
 ---
 
 # DatarrX - myX-Tokenizer-Unigram (မြန်မာဘာသာ)
@@ -75,4 +93,22 @@ print(sp.encode_as_pieces(text))
 
 # ✍️ Project Authors
 - Developer: [**Khant Sint Heinn (Kalix Louis)**](https://huggingface.co/kalixlouiis)
-- Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
+- Organization: [**DatarrX (Myanmar Open Source NGO)**](https://huggingface.co/DatarrX)
+
+## Citation
+
+If you use this model in your research, we kindly ask that you cite it as follows.
+
+### APA 7th Edition
+Khant Sint Heinn. (2026). *myX-Tokenizer-Unigram: Probabilistic Burmese Script Tokenizer (Version 1.0)* [Computer software]. Hugging Face. https://huggingface.co/DatarrX/myX-Tokenizer-Unigram
+
+### BibTeX
+@software{khantsintheinn2026unigram,
+author = {Khant Sint Heinn},
+title = {myX-Tokenizer-Unigram: Probabilistic Burmese Script Tokenizer},
+version = {1.0},
+year = {2026},
+publisher = {Hugging Face},
+url = {https://huggingface.co/DatarrX/myX-Tokenizer-Unigram},
+note = {Burmese-only training corpus}
+}