# NLLB-Tarjumi: Fine-tuned Translation for African Languages

A fine-tune of NLLB-200-distilled-600M on parallel corpus data for Kenyan languages.
## Training Results
| Language | Base BLEU | Fine-tuned BLEU | Improvement |
|---|---|---|---|
| Kikuyu (ki) | 17.00 | 22.24 | +31% |
| Luo (luo) | 11.29 | 18.64 | +65% |
| Swahili (sw) | 16.45 | 19.39 | +18% |
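The Improvement column is the relative BLEU gain over the base model, rounded to the nearest whole percent. A quick check of the arithmetic:

```python
# Relative BLEU improvement for each language pair: (tuned - base) / base.
results = {
    "Kikuyu (ki)": (17.00, 22.24),
    "Luo (luo)": (11.29, 18.64),
    "Swahili (sw)": (16.45, 19.39),
}

for lang, (base, tuned) in results.items():
    gain = round((tuned - base) / base * 100)
    print(f"{lang}: +{gain}%")  # prints +31%, +65%, +18% respectively
```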
## Training Details
- Base model: facebook/nllb-200-distilled-600M
- Method: LoRA (rank-16, alpha-32)
- Corpus: 124K parallel pairs (Bible + JW300)
- Hardware: NVIDIA H100 80GB
- Training time: 13 minutes
- Cost: ~$0.75
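The LoRA setup listed above (rank 16, alpha 32) can be sketched with the `peft` library. Only the rank and alpha come from this card; the target modules, dropout, and other arguments below are illustrative assumptions, not the exact training configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

# rank-16 / alpha-32 as stated above; target_modules and dropout are assumptions
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                    # assumed value
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="SEQ_2_SEQ_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```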
## Supported Languages
16 African languages: Kikuyu, Swahili, Luo, Luhya, Kalenjin, Kamba, Meru, Maasai, Somali, Amharic, Yoruba, Zulu, Xhosa, Shona, Kinyarwanda, Luganda.
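At inference time each language is addressed by its FLORES-200 code, as in the usage example below. A partial mapping for the languages above — this is my own lookup table, not part of the released model, and several of the sixteen languages (e.g. Luhya, Kalenjin, Meru, Maasai) have no standard NLLB-200 code, so verify any code against the tokenizer's vocabulary before use:

```python
# Partial FLORES-200 codes for the supported languages. Luhya, Kalenjin,
# Meru, and Maasai are omitted: they are not standard NLLB-200 entries.
FLORES_CODES = {
    "Kikuyu": "kik_Latn",
    "Swahili": "swh_Latn",
    "Luo": "luo_Latn",
    "Kamba": "kam_Latn",
    "Somali": "som_Latn",
    "Amharic": "amh_Ethi",
    "Yoruba": "yor_Latn",
    "Zulu": "zul_Latn",
    "Xhosa": "xho_Latn",
    "Shona": "sna_Latn",
    "Kinyarwanda": "kin_Latn",
    "Luganda": "lug_Latn",
}
```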
## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("cif-ai/nllb-tarjumi")
tokenizer = AutoTokenizer.from_pretrained("cif-ai/nllb-tarjumi")

# The source language is set on the tokenizer; the target language is
# forced via the BOS token at generation time.
tokenizer.src_lang = "eng_Latn"
inputs = tokenizer("Hello, how are you?", return_tensors="pt")

# "kik_Latn" is the FLORES-200 code for Kikuyu.
tgt_id = tokenizer.convert_tokens_to_ids("kik_Latn")
translated = model.generate(**inputs, forced_bos_token_id=tgt_id, max_new_tokens=128)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```
## Product

This model powers Tarjumi, a free translation service for African mother-tongue languages, built by CIF AI.