Update metadata (library_name, pipeline_tag, language) and add paper abstract
#1 opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,4 +1,6 @@
 ---
+base_model:
+- Qwen/Qwen3-0.6B-Base
 language:
 - en
 - zh
@@ -60,10 +62,9 @@ language:
 - ur
 - uz
 - yue
-base_model:
-- Qwen/Qwen3-0.6B-Base
 license: apache-2.0
-pipeline_tag:
+pipeline_tag: text-generation
+library_name: transformers
 ---

 ## LMT
@@ -71,7 +72,12 @@ pipeline_tag: translation
 - Github: [LMT](https://github.com/NiuTrans/LMT)

 **LMT-60** is a suite of **Chinese-English-centric** MMT models trained on **90B tokens** mixed monolingual and bilingual tokens, covering **60 languages across 234 translation directions** and achieving **SOTA performance** among models with similar language coverage.
-
+We release both the CPT and SFT versions of LMT-60 in four sizes (0.6B/1.7B/4B/8B).
+
+## Abstract
+Large language models have significantly advanced Multilingual Machine Translation (MMT), yet the broad language coverage, consistent translation quality, and English-centric bias remain open challenges. To address these challenges, we introduce **LMT**, a suite of **L**arge-scale **M**ultilingual **T**ranslation models centered on both Chinese and English, covering 60 languages and 234 translation directions. During development, we identify a previously overlooked phenomenon of **directional degeneration**, where symmetric multi-way fine-tuning data overemphasize reverse directions (X $\to$ En/Zh), leading to excessive many-to-one mappings and degraded translation quality. We propose **Strategic Downsampling**, a simple yet effective method to mitigate this degeneration. In addition, we design **Parallel Multilingual Prompting (PMP)**, which leverages typologically related auxiliary languages to enhance cross-lingual transfer. Through rigorous data curation and refined adaptation strategies, LMT achieves SOTA performance among models of comparable language coverage, with our 4B model (LMT-60-4B) surpassing the much larger Aya-101-13B and NLLB-54B models by a substantial margin. We release LMT in four sizes (0.6B/1.7B/4B/8B) to catalyze future research and provide strong baselines for inclusive, scalable, and high-quality MMT.
+
+All checkpoints are available:
 | Models | Model Link |
 |:------------|:------------|
 | LMT-60-0.6B-Base | [NiuTrans/LMT-60-0.6B-Base](https://huggingface.co/NiuTrans/LMT-60-0.6B-Base) |
@@ -95,7 +101,9 @@ model_name = "NiuTrans/LMT-60-8B"
 tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
 model = AutoModelForCausalLM.from_pretrained(model_name)

-prompt = "Translate the following text from English into Chinese
+prompt = "Translate the following text from English into Chinese.
+English: The concept came from China where plum blossoms were the flower of choice.
+Chinese: "
 messages = [{"role": "user", "content": prompt}]
 text = tokenizer.apply_chat_template(
 messages,
@@ -105,7 +113,7 @@ text = tokenizer.apply_chat_template(
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

 generated_ids = model.generate(**model_inputs, max_new_tokens=512, num_beams=5, do_sample=False)
-output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

 outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)

@@ -125,12 +133,12 @@ print("response:", outputs)
 If you find our paper useful for your research, please kindly cite our paper:
 ```bash
 @misc{luoyf2025lmt,
-title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
+title={Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs},
 author={Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu},
 year={2025},
 eprint={2511.07003},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
-url={https://arxiv.org/abs/2511.07003},
+url={https://arxiv.org/abs/2511.07003},
 }
 ```
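Review note: the usage snippet touched by this diff spreads the prompt over three lines inside an ordinary double-quoted string, which Python would reject if copied verbatim. A minimal sketch of a copy-pasteable variant follows, under some assumptions. The helper names `build_prompt` and `translate` are hypothetical and not part of the README. The newline-separated prompt layout is inferred from the three added lines. The `tokenize=False` and `add_generation_prompt=True` arguments are assumed, because the diff elides the remaining arguments of `apply_chat_template`. The sketch also decodes with `tokenizer.decode`, since `batch_decode` expects a batch of sequences rather than one flat list of token ids.

```python
def build_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Assemble the three-line prompt shown in the README example.

    Assumption: the three lines are separated by single newlines.
    """
    return (
        f"Translate the following text from {src_lang} into {tgt_lang}.\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}: "
    )


def translate(text: str, src_lang: str = "English", tgt_lang: str = "Chinese",
              model_name: str = "NiuTrans/LMT-60-8B") -> str:
    """Hypothetical wrapper around the README's generation steps."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(model_name)

    messages = [{"role": "user", "content": build_prompt(src_lang, tgt_lang, text)}]
    chat_text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # assumed: these arguments are elided in the diff
        add_generation_prompt=True,  # assumed
    )
    model_inputs = tokenizer([chat_text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(
        **model_inputs, max_new_tokens=512, num_beams=5, do_sample=False
    )
    # Drop the prompt tokens so only the generated continuation is decoded.
    prompt_len = model_inputs.input_ids.shape[1]
    new_tokens = generated_ids[0][prompt_len:]
    # decode() turns one token sequence into one string.
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `translate("The concept came from China where plum blossoms were the flower of choice.")` would download the 8B checkpoint; `build_prompt` on its own has no dependencies and can be checked offline.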