Commit
·
5c8fbe1
1
Parent(s):
9373373
Upload README.md
README.md
CHANGED
|
@@ -1,3 +1,40 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
language:
|
| 3 |
+
- zh
|
| 4 |
+
- bo
|
| 5 |
+
- kk
|
| 6 |
+
- ko
|
| 7 |
+
- mn
|
| 8 |
+
- ug
|
| 9 |
+
- yue
|
| 10 |
+
license: "apache-2.0"
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
## CINO: Pre-trained Language Models for Chinese Minority Languages (中国少数民族预训练模型)
|
| 14 |
+
|
| 15 |
+
Multilingual pre-trained language models (PLMs), such as mBERT and XLM-R, provide multilingual and cross-lingual abilities for language understanding.
|
| 16 |
+
We have seen rapid progress in building multilingual PLMs in recent years.
|
| 17 |
+
However, few contributions address PLMs for Chinese minority languages, which hinders researchers from building powerful NLP systems for them.
|
| 18 |
+
|
| 19 |
+
To address the absence of Chinese minority PLMs, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority-language corpora, such as:
|
| 20 |
+
- Chinese,中文(zh)
|
| 21 |
+
- Tibetan,藏语(bo)
|
| 22 |
+
- Mongolian (Uyghur form),蒙语(mn)
|
| 23 |
+
- Uyghur,维吾尔语(ug)
|
| 24 |
+
- Kazakh (Arabic form),哈萨克语(kk)
|
| 25 |
+
- Korean,朝鲜语(ko)
|
| 26 |
+
- Zhuang,壮语
|
| 27 |
+
- Cantonese,粤语(yue)
|
| 28 |
+
|
| 29 |
+
Please see our GitHub repository for more details (in Chinese): https://github.com/ymcui/Chinese-Minority-PLM
|
| 30 |
+
|
| 31 |
+
You may also be interested in:
|
| 32 |
+
|
| 33 |
+
Chinese MacBERT: https://github.com/ymcui/MacBERT
|
| 34 |
+
Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm
|
| 35 |
+
Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
|
| 36 |
+
Chinese XLNet: https://github.com/ymcui/Chinese-XLNet
|
| 37 |
+
Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer
|
| 38 |
+
|
| 39 |
+
More resources by HFL: https://github.com/ymcui/HFL-Anthology
|
| 40 |
+
|