
# Kokborok-MT

Neural machine translation model for Kokborok (Tripuri), a Tibeto-Burman language spoken primarily in Tripura, India. Fine-tuned from facebook/nllb-200-distilled-600M with a custom Kokborok language token (trp_Latn).

## Model Details

- **Base Model:** facebook/nllb-200-distilled-600M
- **Language Pair:** English ↔ Kokborok (Roman script)
- **Developed by:** MWire Labs
- **License:** CC-BY-4.0

## Performance

| System | Direction | BLEU | chrF | ROUGE-L | METEOR | TER | Cos. Sim | COMET |
|---|---|---|---|---|---|---|---|---|
| Zero-shot NLLB | EN→TRP | 0.50 | 11.89 | 0.0261 | 0.0132 | 139.51 | 0.1939 | 0.2697 |
| Zero-shot NLLB | TRP→EN | 0.63 | 17.07 | 0.0675 | 0.0526 | 130.30 | 0.1872 | 0.2880 |
| MWirelabs/kokborok-mt | EN→TRP | 17.30 | 47.67 | 0.4332 | 0.3483 | 74.26 | 0.7596 | 0.7064 |
| MWirelabs/kokborok-mt | TRP→EN | 38.56 | 53.92 | 0.5919 | 0.5602 | 50.15 | 0.8171 | 0.6926 |

Fine-tuning yields more than a 30× BLEU improvement over the zero-shot NLLB baseline (17.30 vs. 0.50 for EN→TRP; 38.56 vs. 0.63 for TRP→EN). Full evaluation details and ablation studies will appear in the accompanying paper (forthcoming).

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MWirelabs/kokborok-mt")
model = AutoModelForSeq2SeqLM.from_pretrained("MWirelabs/kokborok-mt")

# English → Kokborok
tokenizer.src_lang = "eng_Latn"
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# The target language is enforced at generation time via forced_bos_token_id.
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("trp_Latn"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Kokborok → English
tokenizer.src_lang = "trp_Latn"
inputs = tokenizer("Chwng khwbai tamo?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
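When translating many sentences, batching the tokenizer and `generate` calls is faster than translating one sentence at a time. The sketch below reuses a tokenizer and model loaded as above; the `batched`/`translate_batch` helper names, batch size, and generation settings (`num_beams`, `max_length`) are illustrative choices, not part of the released model's API.

```python
from typing import List


def batched(items: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of sentences into fixed-size batches (last may be shorter)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def translate_batch(
    sentences: List[str],
    tokenizer,
    model,
    src_lang: str = "eng_Latn",
    tgt_lang: str = "trp_Latn",
    batch_size: int = 8,
) -> List[str]:
    """Translate a list of sentences, batch by batch.

    `tokenizer` and `model` are the objects loaded in the snippet above.
    """
    tokenizer.src_lang = src_lang
    results = []
    for batch in batched(sentences, batch_size):
        # Pad so sentences of different lengths fit in one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            num_beams=4,
            max_length=128,
        )
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Swap `src_lang`/`tgt_lang` (and the matching `forced_bos_token_id` token) for the TRP→EN direction.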

## Citation

```bibtex
@misc{mwirelabs_kokborok_mt_2026,
  title={Kokborok-MT: Neural Machine Translation for Kokborok (Tripuri)},
  author={MWire Labs},
  year={2026},
  howpublished={\url{https://huggingface.co/MWirelabs/kokborok-mt}},
}
```

## About MWire Labs

MWire Labs builds foundational language technology for Northeast Indian indigenous languages.
