BayLing-MLingual: One Model, 50 Languages, 2500 Cross-lingual Pairs

Mengyu Bu, Yang Feng

BayLing-MLingual is a multilingual question-answering model that supports 50 languages and 2500 cross-lingual pairs. Built on top of XBridge, BayLing-MLingual uses a compositional Encoder-LLM-Decoder architecture that separates language understanding, knowledge and reasoning, and language generation. This design enables strong multilingual performance across both high-resource and low-resource languages while preserving the reasoning capabilities of the base LLM.

🚀Key Features

  • 50 languages and 2500 cross-lingual pairs: A single model supports 50 languages across diverse language families. Input and output languages can be selected independently.
  • Strong multilingual performance: BayLing-MLingual preserves the reasoning and knowledge capabilities of the underlying LLM while extending multilingual understanding and generation.
  • Low-resource & unseen language transfer: BayLing-MLingual performs strongly on high-resource, low-resource, and previously unseen languages without retraining the LLM.
  • Efficient deployment: Only lightweight multilingual modules are added on top of the LLM.

💬 Example Interactions

Japanese → Swahili

Question

地球は丸いですか?

Answer

Ndiyo. Dunia ni mviringo.

Arabic → Chinese

Question

أين تقع عاصمة الصين؟

Answer

中国的首都是北京。

Bengali → German

Question

সূর্য কেন উজ্জ্বল?

Answer

Die Sonne leuchtet aufgrund der Kernfusion im Sonnenkern.

🌐Supported Languages

Code Language
en English
zh Chinese
ja Japanese
de German
fr French
es Spanish
ru Russian
sw Swahili
bn Bengali
th Thai
af Afrikaans
ar Arabic
az Azerbaijani
cs Czech
el Greek
et Estonian
fa Persian
fi Finnish
gl Galician
gu Gujarati
he Hebrew
hi Hindi
hr Croatian
id Indonesian
it Italian
ka Georgian
kk Kazakh
km Khmer
lt Lithuanian
lv Latvian
mk Macedonian
ml Malayalam
mn Mongolian
mr Marathi
my Burmese
ne Nepali
nl Dutch
pl Polish
ps Pashto
pt Portuguese
ro Romanian
sl Slovenian
sv Swedish
ta Tamil
te Telugu
tr Turkish
uk Ukrainian
ur Urdu
vi Vietnamese
xh Xhosa
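
The relationship between the 50 languages and the 2500 cross-lingual pairs can be checked with a short sketch. It assumes that a "pair" is an ordered (input, output) combination and that same-language directions are counted, which is the reading under which 50 × 50 = 2500; the card does not state this explicitly. The codes are copied from the table above, and `direction_pairs` is a hypothetical helper name, not part of any released API.

```python
from itertools import product

# Language codes copied from the Supported Languages table.
LANGUAGE_CODES = """
en zh ja de fr es ru sw bn th
af ar az cs el et fa fi gl gu
he hi hr id it ka kk km lt lv
mk ml mn mr my ne nl pl ps pt
ro sl sv ta te tr uk ur vi xh
""".split()


def direction_pairs(codes):
    """Return every ordered (input, output) combination.

    Assumption: same-language pairs such as ("en", "en") are included.
    """
    return list(product(codes, repeat=2))


pairs = direction_pairs(LANGUAGE_CODES)
print(len(LANGUAGE_CODES), len(pairs))  # 50 2500
print(("ja", "sw") in pairs)            # True (the Japanese → Swahili example)
```

Because input and output languages are chosen independently, the pair count grows quadratically with the number of supported languages.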

📄Model Details

Item Value
Base LLM LLaMA3-8B
Framework XBridge
Architecture Encoder-LLM-Decoder
Languages 50
Cross-lingual Pairs 2500
Multilingual Encoder NLLB Encoder
Multilingual Decoder NLLB Decoder
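
The Encoder-LLM-Decoder composition listed above can be pictured as three stages called in sequence. The sketch below is only a conceptual illustration with toy stand-in functions: the class name `ComposedPipeline`, its fields, the vector representation passed between stages, and the stub behaviour are all hypothetical and do not reflect the XBridge code or how the NLLB and LLaMA3 components are actually connected.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class ComposedPipeline:
    """Hypothetical three-stage composition: understand, reason, generate."""

    encode: Callable[[str, str], Vector]   # (text, input language) -> representation
    reason: Callable[[Vector], Vector]     # stand-in for the LLM stage
    decode: Callable[[Vector, str], str]   # (representation, output language) -> text

    def answer(self, question: str, src_lang: str, tgt_lang: str) -> str:
        hidden = self.encode(question, src_lang)   # language understanding
        state = self.reason(hidden)                # knowledge and reasoning
        return self.decode(state, tgt_lang)        # language generation


# Toy stand-ins so the sketch runs; real components would be neural models.
def toy_encode(text: str, src_lang: str) -> Vector:
    return [float(len(text))]


def toy_reason(hidden: Vector) -> Vector:
    return [x + 1.0 for x in hidden]


def toy_decode(state: Vector, tgt_lang: str) -> str:
    return f"<{tgt_lang}> {state[0]:.0f}"


pipeline = ComposedPipeline(toy_encode, toy_reason, toy_decode)
print(pipeline.answer("abc", "ja", "sw"))  # <sw> 4
```

In this picture the input language only affects the first stage and the output language only affects the last, which is one way to read the claim that the two can be selected independently.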

🔬Technical Report

BayLing-MLingual is built upon XBridge. For architecture details, training methodology, and experimental analysis, see the XBridge repository and the ACL 2026 paper.

⚖️LICENSE

Our code is released under the Apache-2.0 License. Our model is intended for academic research purposes only and may NOT be used for commercial purposes.

You are free to use, modify, and distribute this model in academic settings, provided that the following conditions are met:

  • Non-commercial use: The model may not be used for any commercial purposes.
  • Citation: If you use this model in your research, please cite the original work.

❗Commercial Use Restriction

For any commercial use inquiries or to obtain a commercial license, please contact fengyang@ict.ac.cn.

📚Citation

If you have any questions, please feel free to submit an issue or contact bumengyu23z@ict.ac.cn.

If you find this repository useful, please star this repository and cite our paper:

@misc{bu2026languagedemandknowledgecore,
      title={Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality}, 
      author={Mengyu Bu and Yang Feng},
      year={2026},
      eprint={2603.17512},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.17512}, 
}