NiuTrans/LMT-60-0.6B-Base

@@ -1,4 +1,6 @@
 ---
 language:
 - en
 - zh
@@ -60,10 +62,9 @@ language:
 - ur
 - uz
 - yue
-base_model:
-- Qwen/Qwen3-0.6B-Base
 license: apache-2.0
-pipeline_tag: translation
 ---
 ## LMT
@@ -71,7 +72,12 @@ pipeline_tag: translation
 - Github: [LMT](https://github.com/NiuTrans/LMT)
 **LMT-60** is a suite of **Chinese-English-centric** MMT models trained on **90B tokens** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
- We release both the CPT and SFT versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B). All checkpoints are available:
 | Models | Model Link |
 |:------------|:------------|
 | LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
@@ -95,7 +101,9 @@ model_name = "NiuTrans/LMT-60-8B"
 tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
 model = AutoModelForCausalLM.from_pretrained(model_name)
-prompt = "Translate the following text from English into Chinese.\nEnglish: The concept came from China where plum blossoms were the flower of choice.\nChinese: "
 messages = [{"role": "user", "content": prompt}]
 text = tokenizer.apply_chat_template(
     messages,
@@ -105,7 +113,7 @@ text = tokenizer.apply_chat_template(
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 generated_ids = model.generate(**model_inputs, max_new_tokens=512, num_beams=5, do_sample=False)
-output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
 outputs = tokenizer.decode(output_ids, skip_special_tokens=True)
@@ -125,12 +133,12 @@ print("response:", outputs)
 If you find our paper useful for your research, please kindly cite our paper:
 ```bash
 @misc{luoyf2025lmt,
-      title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
       author={Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu},
       year={2025},
       eprint={2511.07003},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2511.07003},
 }
 ```

 ---
+base_model:
+- Qwen/Qwen3-0.6B-Base
 language:
 - en
 - zh
 - ur
 - uz
 - yue
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 ---
 ## LMT
 - Github: [LMT](https://github.com/NiuTrans/LMT)
 **LMT-60** is a suite of **Chinese-English-centric** MMT models trained on **90B tokens** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
+We release both the CPT and SFT versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B).
+## Abstract
+Large language models have significantly advanced Multilingual Machine Translation (MMT), yet the broad language coverage, consistent translation quality, and English-centric bias remain open challenges. To address these challenges, we introduce **LMT**, a suite of **L**arge-scale **M**ultilingual **T**ranslation models centered on both Chinese and English, covering 60 languages and 234 translation directions. During development, we identify a previously overlooked phenomenon of **directional degeneration**, where symmetric multi-way fine-tuning data overemphasize reverse directions (X $\to$ En/Zh), leading to excessive many-to-one mappings and degraded translation quality. We propose **Strategic Downsampling**, a simple yet effective method to mitigate this degeneration. In addition, we design **Parallel Multilingual Prompting (PMP)**, which leverages typologically related auxiliary languages to enhance cross-lingual transfer. Through rigorous data curation and refined adaptation strategies, LMT achieves SOTA performance among models of comparable language coverage, with our 4B model (LMT-60-4B) surpassing the much larger Aya-101-13B and NLLB-54B models by a substantial margin. We release LMT in four sizes (0.6B/1.7B/4B/8B) to catalyze future research and provide strong baselines for inclusive, scalable, and high-quality MMT.
+All checkpoints are available:
 | Models | Model Link |
 |:------------|:------------|
 | LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
 tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
 model = AutoModelForCausalLM.from_pretrained(model_name)
+prompt = "Translate the following text from English into Chinese.\nEnglish: The concept came from China where plum blossoms were the flower of choice.\nChinese: "
 messages = [{"role": "user", "content": prompt}]
 text = tokenizer.apply_chat_template(
     messages,
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 generated_ids = model.generate(**model_inputs, max_new_tokens=512, num_beams=5, do_sample=False)
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
 outputs = tokenizer.decode(output_ids, skip_special_tokens=True)
 If you find our paper useful for your research, please kindly cite our paper:
 ```bash
 @misc{luoyf2025lmt,
+      title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
       author={Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu},
       year={2025},
       eprint={2511.07003},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2511.07003},
 }
 ```
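The usage snippet in this card hard-codes a single English-to-Chinese prompt. As a minimal sketch, the same three-line template can be filled in programmatically. The helper name `build_translation_prompt` is hypothetical and not part of the LMT repository, and the assumption that other language pairs use the same template with their English language names is not confirmed by the card.

```python
def build_translation_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Fill the prompt template shown in the usage snippet.

    Hypothetical helper: the card only shows the English -> Chinese case,
    so reusing the template for other pairs is an assumption.
    """
    return (
        f"Translate the following text from {src_lang} into {tgt_lang}.\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}: "
    )


# Reproduce the exact prompt used in the card's example.
prompt = build_translation_prompt(
    "English",
    "Chinese",
    "The concept came from China where plum blossoms were the flower of choice.",
)

# The chat-template call in the usage snippet expects this message structure.
messages = [{"role": "user", "content": prompt}]
```

Using escaped `\n` sequences inside the f-strings keeps each literal on one line, so the prompt contains exactly two line breaks and ends with the target-language label followed by a space.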