# 💡Model Description

Official model repository for our **ACL 2026 Main Conference** paper "*Language on Demand, Knowledge at Core*: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality".

## ✨XBridge-base

[`XBridge-base`](https://huggingface.co/ICTNLP/XBridge-base) is trained with stage 1 (cross-model alignment) using trilingual translation data, composing [`LLaMA3-8B`](https://huggingface.co/meta-llama/Meta-Llama-3-8B) with [`NLLB-200-1.3B`](https://huggingface.co/facebook/nllb-200-1.3B). Training is conducted on 10 languages:

> Bn, De, En, Es, Fr, Ja, Ru, Sw, Th, Zh

Despite being trained on a limited set of languages, we observe in our analysis that **stage 1 learns a language-agnostic cross-model alignment**, which generalizes well beyond the seen languages.
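
Since XBridge composes its encoder and decoder from NLLB-200, the two-letter tags above correspond to NLLB's FLORES-200 language codes. The mapping below follows NLLB's published code list; the helper function is purely illustrative and not part of this repository:

```python
# Two-letter tags used in this card -> NLLB-200 (FLORES-200) language codes.
# The codes follow NLLB's published list; the helper is illustrative only.
NLLB_CODES = {
    "Bn": "ben_Beng", "De": "deu_Latn", "En": "eng_Latn",
    "Es": "spa_Latn", "Fr": "fra_Latn", "Ja": "jpn_Jpan",
    "Ru": "rus_Cyrl", "Sw": "swh_Latn", "Th": "tha_Thai",
    "Zh": "zho_Hans",
}

def to_nllb_code(tag: str) -> str:
    """Return the NLLB-200 code for a two-letter tag used in this card."""
    return NLLB_CODES[tag]

print(to_nllb_code("Sw"))  # swh_Latn -- NLLB codes Swahili as the macrolanguage swh
```

Note that NLLB codes disambiguate script as well as language (e.g. `zho_Hans` for Simplified Chinese), which is why the two-letter tags alone are not sufficient when calling the underlying translation model.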

## ✨XBridge-SFT

[`XBridge-SFT`](https://huggingface.co/ICTNLP/XBridge-SFT) further extends `XBridge-base` by training stage 2 (encoder-side adaptation) and stage 3 (decoder-side adaptation) for instruction-following tasks. Notably, we directly scale to 50 languages in these stages. This design is motivated by our finding of cross-model generalization. We train on the multilingual instruction-following dataset [`Bactrian-X`](https://huggingface.co/datasets/MBZUAI/Bactrian-X), and expand to the following additional languages:

> Af, Ar, Az, Cs, El, Et, Fa, Fi, Gl, Gu, He, Hi, Hr, Id, It, Ka, Kk, Km, Lt, Lv, Mk, Ml, Mn, Mr, My, Ne, Nl, Pl, Ps, Pt, Ro, Sl, Sv, Ta, Te, Tr, Uk, Ur, Vi, Xh

Empirically, we find that this direct scaling strategy achieves strong performance, demonstrating the robustness and generalization ability of the stage 1 alignment.

See our [paper](https://arxiv.org/abs/2603.17512) for more details, and try our Gradio demo in the [GitHub repository](https://github.com/ictnlp/XBridge)!

# 📚Citation

If you find this model or our work useful, please cite:

```tex
@misc{bu2026languagedemandknowledgecore,
      title={Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality},
      author={Mengyu Bu and Yang Feng},
      year={2026},
      eprint={2603.17512},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.17512},
}
```

# 📮Contact

For questions, please contact: `bumengyu23z@ict.ac.cn`