Ngiemboon→French Fine-Tuned NLLB-200

Model Overview

This repository hosts a fine-tuned version of facebook/nllb-200-distilled-600M adapted for translation from Ngiemboon (nnh) to French (fra). The model was trained on the mimba/text2text dataset using the Hugging Face Seq2SeqTrainer.

Training Details

  • Base model: facebook/nllb-200-distilled-600M
  • Dataset: mimba/text2text (Ngiemboon→French pairs)
  • Tokenizer: NLLB-200 tokenizer with added special token __ngiemboon__
  • Max length: 128 tokens (95th percentile of source sentences)
  • Batch size: 8
  • Epochs: 3
  • Evaluation metric: SacreBLEU
  • Framework: Hugging Face Transformers
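The hyperparameters listed above map onto a Seq2SeqTrainer configuration roughly as sketched below. Only the batch size, epoch count, and max length come from this card; the output directory and any other arguments are illustrative assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the training configuration implied by the list above.
# output_dir is an assumption, not stated in the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb200-ngiemboon2fr",
    per_device_train_batch_size=8,   # "Batch size: 8"
    num_train_epochs=3,              # "Epochs: 3"
    predict_with_generate=True,      # so SacreBLEU can score generated text
    generation_max_length=128,       # "Max length: 128 tokens"
)
```

These arguments would then be passed to a Seq2SeqTrainer together with the model, tokenizer, and the tokenized mimba/text2text splits.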

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "mimba/nllb200-ngiemboon2fr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Source text is prefixed with the added __ngiemboon__ language token.
text = "__ngiemboon__ Ngiemboon phrase de test"  # replace with your Ngiemboon text
inputs = tokenizer(text, return_tensors="pt")

# Force French (fra_Latn) as the decoding target, as NLLB models expect.
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Evaluation

  • BLEU score computed with SacreBLEU on the validation set.
  • Qualitative inspection shows fluent French output for typical Ngiemboon sentences.

Limitations

  • Inputs longer than 128 tokens are truncated, so very long sentences lose content.
  • Performance depends on dataset size and domain coverage.
  • Designed primarily for Ngiemboon→French; reverse direction not fine-tuned.
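Given the 128-token truncation limit above, one workaround is to split long inputs on sentence boundaries before translating. The sketch below uses a whitespace token count as a rough stand-in for the real subword count, which the NLLB tokenizer would give via `len(tokenizer(text)["input_ids"])`; the function name and budget handling are illustrative.

```python
MAX_TOKENS = 128  # the model's truncation limit

def split_for_translation(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack sentences into chunks under the token budget.

    Whitespace tokens approximate subword counts here; in practice the
    NLLB tokenizer's own counts should be used.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(". ".join(current) + ".")
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks
```

Each chunk can then be translated independently and the outputs concatenated.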

BibTeX entry and citation info

If you use this model, please cite:

@misc{mimba2026ngiemboon,
  author = {Mimba},
  title = {Ngiemboon→French Fine-Tuned NLLB-200},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mimba/nllb200-ngiemboon2fr}}
}