NiuTrans/LMT-60-1.7B-Base

@@ -48,7 +48,7 @@ language:
 - km
 - ky
 - lo
-- mn
 - mr
 - ms
 - my
@@ -73,11 +73,11 @@ library_name: transformers
 ---
 ## LMT
-- Paper: [Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs](https://arxiv.org/abs/2511.07003)
 - Github: [LMT](https://github.com/NiuTrans/LMT)
-**LMT-60** is a suite of **Chinese-English-centric** MMT models trained on **90B** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
- We release both the CPT and SFT versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B). All checkpoints are available:
 | Models | Model Link |
 |:------------|:------------|
 | LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
@@ -101,7 +101,7 @@ model_name = "NiuTrans/LMT-60-8B"
 tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
 model = AutoModelForCausalLM.from_pretrained(model_name)
-prompt = """Translate the following text from English into Chinese.
 English: The concept came from China where plum blossoms were the flower of choice.
 Chinese:"""
 messages = [{"role": "user", "content": prompt}]
@@ -125,15 +125,15 @@ print("response:", outputs)
 | Resource Tier | Languages |
 | :---- | :---- |
 | High-resource Languages (13) | Arabic(ar), English(en), Spanish(es), German(de), French(fr), Italian(it), Japanese(ja), Dutch(nl), Polish(pl), Portuguese(pt), Russian(ru), Turkish(tr), Chinese(zh) |
-| Medium-resource Languages (18) | Bulgarian(bg), Bengali(bn), Czech(cs), Danish(da), Modern Greek(el), Persian(fa), Finnish(fi), Hindi(hi), Hungarian(hu), Indonesian(id), Korean(ko), Norwegian(nb), Romanian(ro), Slovak(sk), Swedish(sv), Thai(th), Ukrainian(uk), Vietnamese(vi) |
-| Low-resource Languages (29) | Amharic(am), Azerbaijani(az), Tibetan(bo), Modern Hebrew(he), Croatian(hr), Armenian(hy), Icelandic(is), Javanese(jv), Georgian(ka), Kazakh(kk), Central Khmer(km), Kirghiz(ky), Lao(lo), Chinese Mongolian(mn_cn), Marathi(mr), Malay(ms), Burmese(my), Nepali(ne), Pashto(ps), Sinhala(si), Swahili(sw), Tamil(ta), Telugu(te), Tajik(tg), Tagalog(tl), Uighur(ug), Urdu(ur), Uzbek(uz), Yue Chinese(yue) |
 ## Citation
 If you find our paper useful for your research, please kindly cite our paper:
 ```bash
 @misc{luoyf2025lmt,
-      title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
       author={Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu},
       year={2025},
       eprint={2511.07003},

 - km
 - ky
 - lo
+- mvf
 - mr
 - ms
 - my
 ---
 ## LMT
+- Paper: [NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs](https://arxiv.org/abs/2511.07003)
 - Github: [LMT](https://github.com/NiuTrans/LMT)
+**LMT-60** is a suite of **Chinese-English-centric** Multilingual Machine Translation (MMT) models trained on **90B** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
+ We release both the CPT and GRPO versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B). All checkpoints are available:
 | Models | Model Link |
 |:------------|:------------|
 | LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
 tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
 model = AutoModelForCausalLM.from_pretrained(model_name)
+prompt = """Translate the following text from English into Chinese:
 English: The concept came from China where plum blossoms were the flower of choice.
 Chinese:"""
 messages = [{"role": "user", "content": prompt}]
 | Resource Tier | Languages |
 | :---- | :---- |
 | High-resource Languages (13) | Arabic(ar), English(en), Spanish(es), German(de), French(fr), Italian(it), Japanese(ja), Dutch(nl), Polish(pl), Portuguese(pt), Russian(ru), Turkish(tr), Chinese(zh) |
+| Medium-resource Languages (18) | Bulgarian(bg), Bengali(bn), Czech(cs), Danish(da), Modern Greek(el), Persian(fa), Finnish(fi), Hindi(hi), Hungarian(hu), Indonesian(id), Korean(ko), Norwegian Bokmål(nb), Romanian(ro), Slovak(sk), Swedish(sv), Thai(th), Ukrainian(uk), Vietnamese(vi) |
+| Low-resource Languages (29) | Amharic(am), Azerbaijani(az), Tibetan(bo), Modern Hebrew(he), Croatian(hr), Armenian(hy), Icelandic(is), Javanese(jv), Georgian(ka), Kazakh(kk), Central Khmer(km), Kirghiz(ky), Lao(lo), Inner Mongolian(mvf), Marathi(mr), Malay(ms), Burmese(my), Nepali(ne), Pashto(ps), Sinhala(si), Swahili(sw), Tamil(ta), Telugu(te), Tajik(tg), Tagalog(tl), Uighur(ug), Urdu(ur), Uzbek(uz), Yue Chinese(yue) |
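
The language names in the table above can be slotted into the prompt template from the usage snippet to target other translation directions. The following is a minimal sketch rather than part of the released code: the helper name `build_prompt` and the small `LANG_NAMES` mapping are hypothetical, and the template simply mirrors the English-to-Chinese example in this README. Whether other wordings work equally well is not stated in the source.

```python
# Hypothetical helper (not from the LMT repository): fills the README's
# prompt template for an arbitrary language pair.

# Illustrative subset of the language table; extend as needed.
LANG_NAMES = {
    "en": "English",
    "zh": "Chinese",
    "de": "German",
    "ja": "Japanese",
    "th": "Thai",
    "sw": "Swahili",
}


def build_prompt(text: str, src: str, tgt: str) -> str:
    """Return a translation prompt in the same shape as the README example."""
    src_name = LANG_NAMES[src]
    tgt_name = LANG_NAMES[tgt]
    return (
        f"Translate the following text from {src_name} into {tgt_name}:\n"
        f"{src_name}: {text}\n"
        f"{tgt_name}:"
    )


# Build the chat message list that the usage snippet passes to the tokenizer.
prompt = build_prompt(
    "The concept came from China where plum blossoms were the flower of choice.",
    "en",
    "zh",
)
messages = [{"role": "user", "content": prompt}]
```

Since the models are described as Chinese-English-centric, it is probably safest to keep English or Chinese on one side of each pair.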
 ## Citation
 If you find our paper useful for your research, please kindly cite our paper:
 ```bash
 @misc{luoyf2025lmt,
+      title={NiuTrans.LMT: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
       author={Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu},
       year={2025},
       eprint={2511.07003},