YixuanWeng committed on
Commit c1ce8d9 · 1 Parent(s): 9b086ad

Update README.md

  1. README.md +21 -0
README.md CHANGED
@@ -1,3 +1,24 @@
  ---
  license: mit
  ---
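+
+ Pre-trained language models for Chinese minority languages remain scarce. Although CINO, a Chinese minority-language model, has strong understanding ability, research targeting generation and translation is still lacking.
+ CMPT (Chinese Minority Pre-Trained Language Model) is an ultra-deep generative model built on BART and pre-trained with DeepNorm. Its largest configuration has 128+128 layers. It underwent constrained pre-training on more than 10G of Chinese, English, Uyghur, Tibetan, and Mongolian corpora, and it shows strong understanding and generation performance.
+
+ **GitHub Link:** https://github.com/WENGSYX/CMPT
+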
+ ## Usage
+
+
+
+
+ ```python
+ >>> from modeling_cmpt import CMPTForCir
+ >>> from transformers import AutoTokenizer
+ >>> tokenizer = AutoTokenizer.from_pretrained('./CMTP')  # local checkpoint directory
+ >>> model = CMPTForCir.from_pretrained('./CMTP')
+ >>> input_ids = tokenizer.encode("Hello world, 你好 世界", return_tensors='pt')  # token ids as a PyTorch tensor
+ >>> pred_ids = model.generate(input_ids, num_beams=4, max_length=20)  # beam search, at most 20 tokens
+ >>> print(tokenizer.convert_ids_to_tokens(pred_ids[0]))  # tokens of the first generated sequence
+
+ ```
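
The README says CMPT is trained with DeepNorm and reaches 128+128 layers. As a rough illustration of what that implies, the sketch below computes the DeepNorm constants given in the DeepNet paper (Wang et al., 2022) for an encoder-decoder model, where each residual connection becomes `LayerNorm(alpha * x + sublayer(x))` and `beta` scales the initialization of selected weights. Two things here are assumptions rather than facts from the README: that "128+128" means 128 encoder layers plus 128 decoder layers, and that CMPT uses exactly the paper's constants. The function name `deepnorm_constants` is hypothetical.

```python
# Sketch of the DeepNorm constants from the DeepNet paper (Wang et al., 2022)
# for an encoder-decoder Transformer. Whether CMPT uses exactly these values
# is an assumption; the README only states that DeepNorm was used.

def deepnorm_constants(num_encoder_layers: int, num_decoder_layers: int) -> dict:
    """Return the residual scale (alpha) and init gain (beta) for each stack."""
    n, m = num_encoder_layers, num_decoder_layers
    return {
        "encoder_alpha": 0.81 * (n ** 4 * m) ** (1 / 16),
        "encoder_beta": 0.87 * (n ** 4 * m) ** (-1 / 16),
        "decoder_alpha": (3 * m) ** (1 / 4),
        "decoder_beta": (12 * m) ** (-1 / 4),
    }

# Assumed reading of "128+128 layers": 128 encoder + 128 decoder layers.
constants = deepnorm_constants(128, 128)
for name, value in constants.items():
    print(f"{name}: {value:.4f}")
```

Deeper stacks get a larger `alpha` (the residual branch is weighted more heavily) and a smaller `beta` (sublayer weights start closer to zero), which is how DeepNorm keeps updates bounded as depth grows.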