Kokborok-MT
Neural machine translation model for Kokborok (Tripuri), a Tibeto-Burman language spoken primarily in Tripura, India. Fine-tuned from facebook/nllb-200-distilled-600M with a custom Kokborok language token (trp_Latn).
Model Details
- Base Model: facebook/nllb-200-distilled-600M
- Language Pair: English ↔ Kokborok (Roman script)
- Developed by: MWire Labs
- License: CC-BY-4.0
Performance
| System | Direction | BLEU | chrF | ROUGE-L | METEOR | TER | Cos Sim | COMET |
|---|---|---|---|---|---|---|---|---|
| Zero-Shot NLLB | EN→TRP | 0.50 | 11.89 | 0.0261 | 0.0132 | 139.51 | 0.1939 | 0.2697 |
| Zero-Shot NLLB | TRP→EN | 0.63 | 17.07 | 0.0675 | 0.0526 | 130.30 | 0.1872 | 0.2880 |
| MWirelabs/kokborok-mt | EN→TRP | 17.30 | 47.67 | 0.4332 | 0.3483 | 74.26 | 0.7596 | 0.7064 |
| MWirelabs/kokborok-mt | TRP→EN | 38.56 | 53.92 | 0.5919 | 0.5602 | 50.15 | 0.8171 | 0.6926 |
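Fine-tuning raises BLEU from 0.50 to 17.30 for EN→TRP and from 0.63 to 38.56 for TRP→EN, a gain of more than 30x over the zero-shot NLLB baseline in both directions. Full evaluation details and ablation studies will appear in the accompanying paper (forthcoming).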
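The size of the gain depends on the metric and the translation direction. As a small worked example, the sketch below divides the fine-tuned scores by the zero-shot scores, using only the BLEU and chrF values from the table above. The names `SCORES`, `FACTORS`, and `improvement_factor` are illustrative and not part of the model or its evaluation code.

```python
# Scores copied from the table above: (zero-shot NLLB, fine-tuned model).
SCORES = {
    ("EN->TRP", "BLEU"): (0.50, 17.30),
    ("TRP->EN", "BLEU"): (0.63, 38.56),
    ("EN->TRP", "chrF"): (11.89, 47.67),
    ("TRP->EN", "chrF"): (17.07, 53.92),
}


def improvement_factor(baseline: float, finetuned: float) -> float:
    """Return how many times larger the fine-tuned score is than the baseline."""
    return finetuned / baseline


# Relative improvement for each (direction, metric) pair.
FACTORS = {key: improvement_factor(*pair) for key, pair in SCORES.items()}

for (direction, metric), factor in FACTORS.items():
    print(f"{direction} {metric}: {factor:.1f}x")
```

On these numbers the BLEU ratios are roughly 35x (EN→TRP) and 61x (TRP→EN), while the chrF ratios are closer to 3x to 4x, since the zero-shot chrF baseline is not as close to zero.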
Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/kokborok-mt")
model = AutoModelForSeq2SeqLM.from_pretrained("MWirelabs/kokborok-mt")
# English → Kokborok
tokenizer.src_lang = "eng_Latn"
tokenizer.tgt_lang = "trp_Latn"
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("trp_Latn"))
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Kokborok → English
tokenizer.src_lang = "trp_Latn"
tokenizer.tgt_lang = "eng_Latn"
inputs = tokenizer("Chwng khwbai tamo?", return_tensors="pt")
outputs = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"))
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
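For translating several sentences at once, the two snippets above can be folded into one helper. This is a minimal sketch rather than an official API of the model: the function name `translate` and the default `max_new_tokens=128` are assumptions chosen for illustration, and the helper expects the `tokenizer` and `model` objects loaded as shown above.

```python
from typing import List


def translate(texts: List[str], src_lang: str, tgt_lang: str,
              tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of sentences with an NLLB-style tokenizer and model.

    Hypothetical helper: `max_new_tokens=128` is an illustrative default,
    not a value taken from the model card.
    """
    # The tokenizer adds the source language token based on src_lang.
    tokenizer.src_lang = src_lang
    # Pad so that sentences of different lengths fit in one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    # Force the decoder to start with the target language token.
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the model loaded, `translate(["Hello, how are you?"], "eng_Latn", "trp_Latn", tokenizer, model)` returns a list containing one Kokborok string; swapping the two language codes translates in the other direction.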
Citation
@misc{mwirelabs_kokborok_mt_2026,
title={Kokborok-MT: Neural Machine Translation for Kokborok (Tripuri)},
author={MWire Labs},
year={2026},
howpublished={\url{https://huggingface.co/MWirelabs/kokborok-mt}},
}
About MWire Labs
MWire Labs builds foundational language technology for Northeast Indian indigenous languages.