---
license: mit
---

Pre-trained models for Chinese minority languages remain scarce. Although the domestic minority-language model CINO shows strong understanding ability, research targeting generation and translation is still lacking.

CMPT (Chinese Minority Pre-Trained Language Model) is an ultra-deep generative model built on BART with DeepNorm added for pre-training stability. Its largest configuration has 128 encoder + 128 decoder layers. It was pre-trained under constrained settings on more than 10 GB of Chinese, English, Uyghur, Tibetan, and Mongolian corpora, and achieves strong understanding and generation performance.

**Github Link:** https://github.com/WENGSYX/CMPT
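The DeepNorm trick that makes a 128+128-layer stack trainable replaces the usual post-LN residual `LN(x + G(x))` with `LN(α·x + G(x))`, where the constant α (and a matching initialization scale β) depends on the encoder/decoder depths. A minimal NumPy sketch of the residual rule and the encoder-decoder constants from the DeepNet paper follows; the function names here are illustrative, not part of the CMPT codebase:

```python
import numpy as np

def deepnorm_constants(N, M):
    """DeepNorm scaling constants for an encoder-decoder Transformer
    (Wang et al., "DeepNet"). N = encoder layers, M = decoder layers.
    Returns (enc_alpha, enc_beta, dec_alpha, dec_beta)."""
    enc_alpha = 0.81 * (N ** 4 * M) ** (1 / 16)
    enc_beta = 0.87 * (N ** 4 * M) ** (-1 / 16)  # scales sublayer weight init
    dec_alpha = (3 * M) ** (1 / 4)
    dec_beta = (12 * M) ** (-1 / 4)
    return enc_alpha, enc_beta, dec_alpha, dec_beta

def layer_norm(x, eps=1e-5):
    """Plain LayerNorm over the last dimension (no learned scale/bias)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def deepnorm_residual(x, sublayer_out, alpha):
    """DeepNorm residual connection: x_{l+1} = LN(alpha * x_l + G(x_l))."""
    return layer_norm(alpha * x + sublayer_out)
```

For a 128+128 configuration, `deepnorm_constants(128, 128)` gives a decoder α of `(3·128)^{1/4} ≈ 4.43`, i.e. the residual branch is up-weighted relative to each sublayer's output, which bounds the model update and keeps very deep post-LN training stable.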

## Usage

```python
>>> from modeling_cmpt import BartForConditionalGeneration
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('./CMPT')
>>> model = BartForConditionalGeneration.from_pretrained('./CMPT')
>>> input_ids = tokenizer.encode("Hello world, 你好 世界", return_tensors='pt')
>>> pred_ids = model.generate(input_ids, num_beams=4, max_length=20)
>>> print(tokenizer.convert_ids_to_tokens(pred_ids[0]))
```