Update model card: Intended Use, Limitations, code example, BibTeX, licence/email/description fixes
#2
by rdelyon - opened
README.md
ADDED
|
@@ -0,0 +1,113 @@
| 1 |
+
---
|
| 2 |
+
license: cc-by-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
- rw
|
| 6 |
+
datasets:
|
| 7 |
+
- mbazaNLP/NMT_Education_parallel_data_en_kin
|
| 8 |
+
- mbazaNLP/Kinyarwanda_English_parallel_dataset
|
| 9 |
+
pipeline_tag: translation
|
| 10 |
+
library_name: transformers
|
| 11 |
+
tags:
|
| 12 |
+
- nllb
|
| 13 |
+
- translation
|
| 14 |
+
- kinyarwanda
|
| 15 |
+
- education
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# NLLB-Education: English ↔ Kinyarwanda (Education Domain)
|
| 19 |
+
|
| 20 |
+
Machine translation model for English ↔ Kinyarwanda, specialised for **education-domain** content.
|
| 21 |
+
Fine-tuned from [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B).
|
| 22 |
+
|
| 23 |
+
**Fine-tuning code:** [Digital-Umuganda/twb_nllb_finetuning](https://github.com/Digital-Umuganda/twb_nllb_finetuning)
|
| 24 |
+
|
| 25 |
+
## Usage
|
| 26 |
+
|
| 27 |
+
```python
|
| 28 |
+
from transformers import pipeline
|
| 29 |
+
|
| 30 |
+
# English → Kinyarwanda
|
| 31 |
+
translator = pipeline(
|
| 32 |
+
"translation",
|
| 33 |
+
model="mbazaNLP/NLLB-Education",
|
| 34 |
+
src_lang="eng_Latn",
|
| 35 |
+
tgt_lang="kin_Latn",
|
| 36 |
+
max_length=400,
|
| 37 |
+
)
|
| 38 |
+
result = translator("Education is the foundation of sustainable development.")
|
| 39 |
+
print(result[0]["translation_text"])
|
| 40 |
+
|
| 41 |
+
# Kinyarwanda → English
|
| 42 |
+
translator_rev = pipeline(
|
| 43 |
+
"translation",
|
| 44 |
+
model="mbazaNLP/NLLB-Education",
|
| 45 |
+
src_lang="kin_Latn",
|
| 46 |
+
tgt_lang="eng_Latn",
|
| 47 |
+
max_length=400,
|
| 48 |
+
)
|
| 49 |
+
result = translator_rev("Uburezi ni ishingiro ry'iterambere rirambye.")
|
| 50 |
+
print(result[0]["translation_text"])
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
## Intended Use
|
| 54 |
+
|
| 55 |
+
**Suitable for:**
|
| 56 |
+
- Translating education-related content between English and Kinyarwanda
|
| 57 |
+
- Supporting EdTech applications for Rwanda
|
| 58 |
+
- Research into domain-adapted NMT for low-resource African languages
|
| 59 |
+
|
| 60 |
+
**Not intended for:**
|
| 61 |
+
- General-purpose translation (use `Nllb_finetuned_general_en_kin` instead)
|
| 62 |
+
- Legal, medical, or other high-stakes translation without human review
|
| 63 |
+
|
| 64 |
+
## Training
|
| 65 |
+
|
| 66 |
+
Fine-tuned on:
|
| 67 |
+
- [mbazaNLP/NMT_Education_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Education_parallel_data_en_kin)
|
| 68 |
+
- [mbazaNLP/Kinyarwanda_English_parallel_dataset](https://huggingface.co/datasets/mbazaNLP/Kinyarwanda_English_parallel_dataset)
|
| 69 |
+
|
| 70 |
+
Training hardware: A100 40 GB GPU.
|
| 71 |
+
|
| 72 |
+
## Evaluation
|
| 73 |
+
|
| 74 |
+
<!-- TODO: add BLEU/spBLEU/chrF++ scores from evaluation -->
|
| 75 |
+
|
| 76 |
+
| Lang. Direction | BLEU | spBLEU | chrF++ | TER |
|
| 77 |
+
|-----------------|------|--------|--------|-----|
|
| 78 |
+
| Eng → Kin       | -    | -      | -      | -   |
|
| 79 |
+
| Kin → Eng       | -    | -      | -      | -   |
|
| 80 |
+
|
| 81 |
+
## Limitations
|
| 82 |
+
|
| 83 |
+
- Domain-adapted for education content; quality may drop on out-of-domain text.
|
| 84 |
+
- Low-frequency Kinyarwanda vocabulary and tonal nuances may not be handled accurately.
|
| 85 |
+
- Outputs should be reviewed by a human translator for high-stakes applications.
|
| 86 |
+
- Maximum reliable input length is approximately 200 tokens.
|
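One way to stay near the 200-token guideline above is to split long inputs into sentence groups before translating them. The following is a minimal sketch, not part of the model card: `chunk_text` and `count_whitespace_tokens` are hypothetical helper names, the sentence splitter is a naive regular expression, and whitespace word counts are only a rough stand-in for the model's subword token counts.

```python
import re
from typing import Callable, List


def count_whitespace_tokens(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    max_tokens: int = 200,
    count_tokens: Callable[[str], int] = count_whitespace_tokens,
) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_tokens tokens.

    A single sentence longer than max_tokens is kept as its own chunk
    rather than being split mid-sentence.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For counts that match the model more closely, a callable based on the model's own tokenizer could be passed as `count_tokens`, for example one that returns the length of `tokenizer(text)["input_ids"]`. Each chunk can then be translated separately and the outputs joined.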
| 87 |
+
|
| 88 |
+
## Bias and Fairness
|
| 89 |
+
|
| 90 |
+
Training data reflects written, formal educational language. Colloquial or dialectal Kinyarwanda may be translated with lower quality.
|
| 91 |
+
|
| 92 |
+
## Bias and Fairness
|
| 93 |
+
|
| 94 |
+
Machine translation models can reflect and amplify biases present in training data. Known limitations include:
|
| 95 |
+
|
| 96 |
+
- **Domain bias:** Fine-tuned on specific domain data; performance may be lower on out-of-domain text.
|
| 97 |
+
- **Cultural bias:** Idiomatic expressions, gender-neutral constructs, and culturally specific references in English may not translate accurately or naturally into Kinyarwanda.
|
| 98 |
+
- **Data source bias:** Training data was sourced from specific platforms; text from other sources or registers may yield lower quality translations.
|
| 99 |
+
- **Gender:** English gender-neutral pronouns may be rendered with gendered forms in Kinyarwanda based on distributional patterns in training data.
|
| 100 |
+
|
| 101 |
+
Validate translation quality on domain-representative samples before deployment in high-stakes contexts (legal, medical, government communications).
|
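The validation step above could start with a small spot check that flags translations diverging strongly from reference translations. The sketch below is illustrative only: `char_ngram_f1` is a crude character n-gram F1 score, not the chrF++ metric listed in the Evaluation table, and `flag_low_scores`, the `translate` callable, and the `0.5` threshold are hypothetical choices. Reported scores should come from an established tool such as sacreBLEU.

```python
from collections import Counter
from typing import Callable, List, Tuple


def char_ngram_f1(hypothesis: str, reference: str, n: int = 3) -> float:
    """Character n-gram F1 between two strings (a crude stand-in for chrF)."""

    def ngrams(text: str) -> Counter:
        # Lowercase and collapse whitespace before extracting n-grams.
        text = " ".join(text.lower().split())
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    hyp, ref = ngrams(hypothesis), ngrams(reference)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def flag_low_scores(
    pairs: List[Tuple[str, str]],
    translate: Callable[[str], str],
    threshold: float = 0.5,  # arbitrary cut-off, to be tuned per use case
) -> List[Tuple[str, str, float]]:
    """Return (source, output, score) for outputs scoring below threshold."""
    flagged = []
    for source, reference in pairs:
        output = translate(source)
        score = char_ngram_f1(output, reference)
        if score < threshold:
            flagged.append((source, output, score))
    return flagged
```

Here `translate` would wrap the model call, for instance a function that returns the `translation_text` field from a Transformers translation pipeline. Flagged items are candidates for human review, not a verdict on quality.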
| 102 |
+
|
| 103 |
+
## Citation
|
| 104 |
+
|
| 105 |
+
```bibtex
|
| 106 |
+
@misc{mbazaNLP2023nllb_education,
|
| 107 |
+
author = {MBAZA-NLP Community},
|
| 108 |
+
title = {{NLLB}-Education: English--Kinyarwanda Machine Translation (Education Domain)},
|
| 109 |
+
year = {2023},
|
| 110 |
+
url = {https://huggingface.co/mbazaNLP/NLLB-Education},
|
| 111 |
+
note = {Hugging Face model repository}
|
| 112 |
+
}
|
| 113 |
+
```
|