Tags: Translation, Transformers, Safetensors, qwen3, text-generation, text-generation-inference
Commit ea20f44 (verified) by luoyingfeng · 1 parent: d19c4ce

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -48,7 +48,7 @@ language:
  - km
  - ky
  - lo
- - mn
+ - mvf
  - mr
  - ms
  - my
@@ -73,11 +73,11 @@ library_name: transformers
  ---
 
  ## LMT
- - Paper: [Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs](https://arxiv.org/abs/2511.07003)
+ - Paper: [NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs](https://arxiv.org/abs/2511.07003)
  - Github: [LMT](https://github.com/NiuTrans/LMT)
 
- **LMT-60** is a suite of **Chinese-English-centric** MMT models trained on **90B tokens** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
- We release both the CPT and SFT versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B). All checkpoints are available:
+ **LMT-60** is a suite of **Chinese-English-centric** Multilingual Machine Translation (MMT) models trained on **90B tokens** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
+ We release both the CPT and GRPO versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B). All checkpoints are available:
  | Models | Model Link |
  |:------------|:------------|
  | LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
@@ -101,7 +101,7 @@ model_name = "NiuTrans/LMT-60-8B"
  tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
  model = AutoModelForCausalLM.from_pretrained(model_name)
 
- prompt = """Translate the following text from English into Chinese.
+ prompt = """Translate the following text from English into Chinese:
  English: The concept came from China where plum blossoms were the flower of choice.
  Chinese:"""
  messages = [{"role": "user", "content": prompt}]
@@ -125,15 +125,15 @@ print("response:", outputs)
  | Resource Tier | Languages |
  | :---- | :---- |
  | High-resource Languages (13) | Arabic(ar), English(en), Spanish(es), German(de), French(fr), Italian(it), Japanese(ja), Dutch(nl), Polish(pl), Portuguese(pt), Russian(ru), Turkish(tr), Chinese(zh) |
- | Medium-resource Languages (18) | Bulgarian(bg), Bengali(bn), Czech(cs), Danish(da), Modern Greek(el), Persian(fa), Finnish(fi), Hindi(hi), Hungarian(hu), Indonesian(id), Korean(ko), Norwegian(nb), Romanian(ro), Slovak(sk), Swedish(sv), Thai(th), Ukrainian(uk), Vietnamese(vi) |
- | Low-resouce Languages (29) | Amharic(am), Azerbaijani(az), Tibetan(bo), Modern Hebrew(he), Croatian(hr), Armenian(hy), Icelandic(is), Javanese(jv), Georgian(ka), Kazakh(kk), Central Khmer(km), Kirghiz(ky), Lao(lo), Chinese Mongolian(mn_cn), Marathi(mr), Malay(ms), Burmese(my), Nepali(ne), Pashto(ps), Sinhala(si), Swahili(sw), Tamil(ta), Telugu(te), Tajik(tg), Tagalog(tl), Uighur(ug), Urdu(ur), Uzbek(uz), Yue Chinese(yue) |
+ | Medium-resource Languages (18) | Bulgarian(bg), Bengali(bn), Czech(cs), Danish(da), Modern Greek(el), Persian(fa), Finnish(fi), Hindi(hi), Hungarian(hu), Indonesian(id), Korean(ko), Norwegian Bokmål(nb), Romanian(ro), Slovak(sk), Swedish(sv), Thai(th), Ukrainian(uk), Vietnamese(vi) |
+ | Low-resouce Languages (29) | Amharic(am), Azerbaijani(az), Tibetan(bo), Modern Hebrew(he), Croatian(hr), Armenian(hy), Icelandic(is), Javanese(jv), Georgian(ka), Kazakh(kk), Central Khmer(km), Kirghiz(ky), Lao(lo), Inner Mongolian(mvf), Marathi(mr), Malay(ms), Burmese(my), Nepali(ne), Pashto(ps), Sinhala(si), Swahili(sw), Tamil(ta), Telugu(te), Tajik(tg), Tagalog(tl), Uighur(ug), Urdu(ur), Uzbek(uz), Yue Chinese(yue) |
 
  ## Citation
 
  If you find our paper useful for your research, please kindly cite our paper:
  ```bash
  @misc{luoyf2025lmt,
- title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
+ title={NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
  author={Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu},
  year={2025},
  eprint={2511.07003},
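The only code change in this commit is to the README's example prompt, whose instruction line now ends with a colon ("into Chinese:") rather than a period. The sketch below rebuilds that exact English-to-Chinese prompt in plain Python. As an assumption not stated in the README, it also generalizes the template to other directions by substituting language names. `build_prompt` is a hypothetical helper, not part of the LMT repository or of `transformers`.

```python
def build_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build a prompt in the README's three-line format (hypothetical helper).

    Assumption: the English->Chinese template shown in the README carries
    over to any direction by swapping in the language names.
    """
    return (
        # Instruction line; after this commit it ends with a colon.
        f"Translate the following text from {src_lang} into {tgt_lang}:\n"
        # Source text, labeled with the source-language name.
        f"{src_lang}: {text}\n"
        # Open target label that the model is expected to complete.
        f"{tgt_lang}:"
    )


# Reproduce the README example and wrap it as a single user turn,
# the same `messages` shape the README passes to the chat template.
prompt = build_prompt(
    "English",
    "Chinese",
    "The concept came from China where plum blossoms were the flower of choice.",
)
messages = [{"role": "user", "content": prompt}]
print(prompt)
```

Because `messages` has the same structure as in the README, it could be handed to the tokenizer's chat template in the same way. Whether full language names (rather than codes such as `mvf`) are the intended labels for other directions is an assumption worth checking against the LMT repository.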