KokborokBERT

KokborokBERT is the first publicly released masked language model for the Kokborok language. It is built via domain-adaptive fine-tuning of XLM-RoBERTa-base on a curated Kokborok corpus.

Training Performance

The model was trained for 13 epochs on a single NVIDIA A40 GPU.

Metric        Baseline (XLM-R)   KokborokBERT
Masked Loss   5.9831             1.7752
Perplexity    396.69             5.90
Improvement   -                  67.2x perplexity reduction
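The perplexity figures follow directly from the masked losses: perplexity is the exponential of the cross-entropy loss, and the 67.2x figure is the ratio of the two perplexities. A quick sanity check using the numbers from the table:

```python
import math

# Perplexity = exp(cross-entropy loss); losses taken from the table above.
baseline_loss, finetuned_loss = 5.9831, 1.7752

baseline_ppl = math.exp(baseline_loss)    # ~396.7
finetuned_ppl = math.exp(finetuned_loss)  # ~5.9
reduction = baseline_ppl / finetuned_ppl  # ~67.2

print(f"{baseline_ppl:.2f} -> {finetuned_ppl:.2f} ({reduction:.1f}x reduction)")
```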

Usage

You can use this model directly with a pipeline for masked language modeling:

from transformers import pipeline

mask_filler = pipeline("fill-mask", model="MWirelabs/kokborokbert")
test_text = "O kothar-no nwng jeni-hai-pha-no <mask> khlai-man-nai."
results = mask_filler(test_text)

for res in results:
    print(f"Score: {res['score']:.4f} | Prediction: {res['token_str']}")
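Under the hood, fill-mask scores each candidate by taking the model's logits at the <mask> position and applying a softmax over the vocabulary. A minimal sketch of that ranking step, using toy logits and a hypothetical four-token vocabulary (no model download required):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for a hypothetical 4-token vocabulary at the <mask> position.
vocab = ["kaham", "naithok", "kwrwi", "chokhleima"]
logits = [3.1, 2.4, 0.5, -1.0]

probs = softmax(logits)
ranked = sorted(zip(vocab, probs), key=lambda p: p[1], reverse=True)
for token, score in ranked:
    print(f"Score: {score:.4f} | Prediction: {token}")
```

The pipeline's `score` field is exactly this softmax probability for each predicted token.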

License

This model is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Limitations and Biases

  • Domain Specificity: The model was trained on a corpus of roughly 391k tokens. Performance may degrade on dialects or specialized domains (e.g. medical, legal) that are not well represented in the training data.
  • Base Model Inheritances: As a fine-tuned version of xlm-roberta-base, this model may inherit biases present in the original multilingual pre-training data.
  • Task Limitation: This is an encoder-only Masked Language Model. It is designed for tasks like NER, classification, and similarity, but is not intended for text generation (NLG).
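For the similarity use case mentioned above, encoder-only models are typically used by mean-pooling the token embeddings into a sentence vector and comparing vectors by cosine similarity. A minimal sketch of that pooling-and-comparison step with toy 3-dimensional vectors (real XLM-R hidden states are 768-dimensional):

```python
import math

def mean_pool(token_embeddings):
    # Average the per-token vectors into one fixed-size sentence vector.
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "token embeddings" for two short sentences.
sent_a = [[1.0, 0.0, 1.0], [0.8, 0.2, 0.9]]
sent_b = [[0.9, 0.1, 1.1], [1.0, 0.0, 0.8]]

sim = cosine(mean_pool(sent_a), mean_pool(sent_b))
print(f"similarity: {sim:.4f}")
```

In practice the token embeddings would come from the model's last hidden state (masking out padding tokens before pooling), but the pooling and scoring logic is the same.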

Citation

If you use this model in your research, please cite it as follows:

@misc{kokborokbert2026,
  author       = {MWire Labs},
  title        = {KokborokBERT: Domain-Adaptive Fine-Tuning of XLM-RoBERTa for Kokborok},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MWirelabs/kokborokbert}}
}