Update model card: Intended Use, Limitations, code example, BibTeX, licence/email/description fixes

#2
by rdelyon - opened
Files changed (1)
  1. README.md +113 -0
README.md ADDED
@@ -0,0 +1,113 @@
---
license: cc-by-2.0
language:
- en
- rw
datasets:
- mbazaNLP/NMT_Education_parallel_data_en_kin
- mbazaNLP/Kinyarwanda_English_parallel_dataset
pipeline_tag: translation
library_name: transformers
tags:
- nllb
- translation
- kinyarwanda
- education
---

# NLLB-Education: English ↔ Kinyarwanda (Education Domain)

Machine translation model for English ↔ Kinyarwanda, specialised for **education-domain** content.
Fine-tuned from [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B).

**Fine-tuning code:** [Digital-Umuganda/twb_nllb_finetuning](https://github.com/Digital-Umuganda/twb_nllb_finetuning)

## Usage

```python
from transformers import pipeline

# English → Kinyarwanda
translator = pipeline(
    "translation",
    model="mbazaNLP/NLLB-Education",
    src_lang="eng_Latn",
    tgt_lang="kin_Latn",
    max_length=400,
)
result = translator("Education is the foundation of sustainable development.")
print(result[0]["translation_text"])

# Kinyarwanda → English
translator_rev = pipeline(
    "translation",
    model="mbazaNLP/NLLB-Education",
    src_lang="kin_Latn",
    tgt_lang="eng_Latn",
    max_length=400,
)
result = translator_rev("Uburezi ni ishingiro ry'iterambere rirambye.")
print(result[0]["translation_text"])
```
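
The pipeline calls above load the checkpoint once per direction. As a rough sketch, a single loaded checkpoint can serve both directions through the lower-level `transformers` API. This assumes the repository ships NLLB tokenizer files, which the pipeline example implies but the card does not state. `batched`, `load_model`, and `translate` are illustrative helper names, not part of the model or the library, and `batch_size=8` is an arbitrary default.

```python
from typing import Iterator, List, Sequence

MODEL_ID = "mbazaNLP/NLLB-Education"  # model name taken from the Usage example


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model once; both directions can share them."""
    # Imported lazily so `batched` stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(
    texts: Sequence[str],
    tokenizer,
    model,
    src_lang: str,
    tgt_lang: str,
    batch_size: int = 8,   # hypothetical default, tune to available memory
    max_length: int = 400,  # same cap as the pipeline example
) -> List[str]:
    """Translate `texts` from `src_lang` to `tgt_lang` (NLLB language codes)."""
    tokenizer.src_lang = src_lang  # source language tag added to the inputs
    outputs: List[str] = []
    for batch in batched(texts, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(
            **inputs,
            # Force the decoder to start with the target language code.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            max_length=max_length,
        )
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

A typical call would be `tokenizer, model = load_model()` followed by `translate(sentences, tokenizer, model, "eng_Latn", "kin_Latn")`, swapping the two language codes for the reverse direction.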

## Intended Use

**Suitable for:**
- Translating education-related content between English and Kinyarwanda
- Supporting EdTech applications for Rwanda
- Research into domain-adapted NMT for low-resource African languages

**Not intended for:**
- General-purpose translation (use `Nllb_finetuned_general_en_kin` instead)
- Legal, medical, or other high-stakes translation without human review

## Training

Fine-tuned on:
- [mbazaNLP/NMT_Education_parallel_data_en_kin](https://huggingface.co/datasets/mbazaNLP/NMT_Education_parallel_data_en_kin)
- [mbazaNLP/Kinyarwanda_English_parallel_dataset](https://huggingface.co/datasets/mbazaNLP/Kinyarwanda_English_parallel_dataset)

Training hardware: A100 40 GB GPU.

## Evaluation

<!-- TODO: add BLEU/spBLEU/chrF++ scores from evaluation -->

| Lang. Direction | BLEU | spBLEU | chrF++ | TER |
|-----------------|------|--------|--------|-----|
| Eng → Kin       | -    | -      | -      | -   |
| Kin → Eng       | -    | -      | -      | -   |
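
The scores in the table are still to be filled in. One possible way to produce them, sketched here with the `sacrebleu` package rather than taken from the authors' evaluation setup, is shown below. It assumes a recent `sacrebleu` release that provides the `flores200` tokenizer for spBLEU (which also needs `sentencepiece`), and that hypotheses and references are parallel lists of sentences. `score_corpus` and `format_row` are hypothetical helper names.

```python
from typing import Dict, List

METRICS = ("BLEU", "spBLEU", "chrF++", "TER")  # column order of the table


def score_corpus(hypotheses: List[str], references: List[str]) -> Dict[str, float]:
    """Return corpus-level scores for one translation direction."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")

    # Imported lazily: requires `pip install sacrebleu sentencepiece`.
    from sacrebleu.metrics import BLEU, CHRF, TER

    refs = [references]  # sacrebleu expects a list of reference streams
    return {
        "BLEU": BLEU().corpus_score(hypotheses, refs).score,
        # Assumption: spBLEU computed with the FLORES-200 SentencePiece tokenizer.
        "spBLEU": BLEU(tokenize="flores200").corpus_score(hypotheses, refs).score,
        # word_order=2 turns chrF into chrF++.
        "chrF++": CHRF(word_order=2).corpus_score(hypotheses, refs).score,
        "TER": TER().corpus_score(hypotheses, refs).score,
    }


def format_row(direction: str, scores: Dict[str, float]) -> str:
    """Render one Markdown table row in the column order used above."""
    cells = [f"{scores[name]:.2f}" for name in METRICS]
    return "| " + " | ".join([direction] + cells) + " |"
```

Running `format_row("Eng → Kin", score_corpus(hyps, refs))` on a held-out education-domain test set would give a row that can be pasted into the table.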

## Limitations

- Domain-adapted for education content; quality may drop on out-of-domain text.
- Low-frequency Kinyarwanda vocabulary and tonal nuances may not be handled accurately.
- Outputs should be reviewed by a human translator for high-stakes applications.
- Maximum reliable input length is approximately 200 tokens.
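
Given the input-length limit of roughly 200 tokens, longer documents are probably best split before translation. The sketch below greedily packs whole sentences into chunks under a token budget. The sentence splitter is a simple punctuation-based regex and the default counter uses whitespace tokens, both of which are assumptions made for illustration; `split_into_chunks` and `whitespace_token_count` are hypothetical names.

```python
import re
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Rough stand-in for the model tokenizer; subword counts are usually higher."""
    return len(text.split())


def split_into_chunks(
    text: str,
    max_tokens: int = 200,  # approximate limit stated in the Limitations list
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `max_tokens` tokens.

    A single sentence longer than `max_tokens` is kept as its own oversized chunk.
    """
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For a closer estimate, a counter based on the model tokenizer can be passed instead, for example `count_tokens=lambda s: len(tokenizer(s).input_ids)`; it counts special tokens once per sentence, so it errs on the conservative side.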

## Bias and Fairness

Machine translation models can reflect and amplify biases present in training data. Known limitations include:

- **Register bias:** Training data reflects written, formal educational language; colloquial or dialectal Kinyarwanda may be translated with lower quality.
- **Domain bias:** Fine-tuned on education-domain data; performance may be lower on out-of-domain text.
- **Cultural bias:** Idiomatic expressions, gender-neutral constructs, and culturally specific references in English may not translate accurately or naturally into Kinyarwanda.
- **Data source bias:** Training data was sourced from specific platforms; text from other sources or registers may yield lower-quality translations.
- **Gender:** Kinyarwanda third-person forms are not marked for gender, so Kinyarwanda → English output may be rendered with "he" or "she" based on distributional patterns in the training data.

Validate translation quality on domain-representative samples before deployment in high-stakes contexts (legal, medical, government communications).

## Citation

```bibtex
@misc{mbazaNLP2023nllb_education,
  author = {MBAZA-NLP Community},
  title  = {{NLLB}-Education: English--Kinyarwanda Machine Translation (Education Domain)},
  year   = {2023},
  url    = {https://huggingface.co/mbazaNLP/NLLB-Education},
  note   = {Hugging Face model repository}
}
```