chrismazii committed on
Commit
9d3998c
·
verified ·
1 Parent(s): 095c699

Update README.md

  1. README.md +132 -142
README.md CHANGED
@@ -74,7 +74,7 @@ Rwanda's thriving MT ecosystem includes companies like Digital Umuganda, KINLP,
74
  | BLEU | N/A | 0.30 | 0.34 | 0.23 | 0.62 |
75
  | chrF | N/A | 0.38 | 0.30 | 0.21 | 0.34 |
76
 
77
- ** State-of-the-Art Results**: Both KinyCOMET variants significantly outperform existing baselines, with KinyCOMET-Unbabel achieving the highest correlation across all metrics.
78
 
79
  ## Performance Highlights
80
 
@@ -98,105 +98,136 @@ Rwanda's thriving MT ecosystem includes companies like Digital Umuganda, KINLP,
98
  - Both KinyCOMET variants significantly outperform the AfriCOMET baselines, even though AfriCOMET's training data includes Kinyarwanda
99
  - Surprising finding: Unbabel baseline (not trained on Kinyarwanda) outperforms AfriCOMET variants
100
 
101
- # KinyCOMET Model Usage
102
- This model evaluates machine translation quality for Kinyarwanda translations.
103
  ## Installation
 
 
 
104
  ```bash
105
- pip install unbabel-comet huggingface_hub
106
  ```
107
- ### Quick Start
108
 
109
- ```from huggingface_hub import hf_hub_download
110
  from comet import load_from_checkpoint
111
- import os
112
- import shutil
113
-
114
- # Download model files from Hugging Face
115
- checkpoint_path = hf_hub_download(
116
- repo_id="chrismazii/kinycomet_unbabel",
117
- filename="KinyCOMET+Unbabel.ckpt"
118
- )
119
- hparams_path = hf_hub_download(
120
- repo_id="chrismazii/kinycomet_unbabel",
121
- filename="hparams.yaml"
122
- )
123
-
124
- # Set up directory structure for COMET
125
- work_dir = "/tmp/kinycomet"
126
- checkpoints_dir = os.path.join(work_dir, "checkpoints")
127
- os.makedirs(checkpoints_dir, exist_ok=True)
128
-
129
- # Copy files to expected structure
130
- shutil.copy(checkpoint_path, os.path.join(checkpoints_dir, "KinyCOMET+Unbabel.ckpt"))
131
- shutil.copy(hparams_path, os.path.join(work_dir, "hparams.yaml"))
132
-
133
- # Load the model
134
- model = load_from_checkpoint(os.path.join(checkpoints_dir, "KinyCOMET+Unbabel.ckpt"))
135
-
136
- # Now use the model with your data
137
- data = [{"src": "source text", "mt": "translation"}]
138
- segment_scores, system_score = model.predict(data, gpus=0)
139
- print(segment_scores, system_score)
140
  ```
141
 
142
- ### Hugging Face Integration
143
 
144
  ```python
145
- from transformers import pipeline
146
-
147
- # Load via Hugging Face
148
- quality_estimator = pipeline(
149
- "text-classification",
150
- model="chrismazii/kinycomet_unbabel",
151
- tokenizer="chrismazii/kinycomet_unbabel"
152
- )
153
-
154
- # Estimate quality
155
- result = quality_estimator({
156
- "src": "Umugabo ararya",
157
- "mt": "The man is eating"
158
  })
159
  ```
160
 
161
- ## Training Details
162
 
163
- ### Dataset
164
- - **Custom Kinyarwanda-English QE Dataset** with train/validation/test splits
165
- - **Score Normalization**: All quality scores normalized to [0,1] range during preprocessing
166
- - **Bidirectional Coverage**: Includes both translation directions
167
 
168
  ## Training Details
169
 
170
- ### Dataset Construction
171
- Our training dataset represents a major community effort:
172
- - **4,323 samples** from three high-quality parallel corpora:
173
- - Mbaza Education Dataset
174
- - Mbaza Tourism Dataset
175
- - Digital Umuganda Dataset
176
- - **15 linguistics students** as human annotators using Direct Assessment (DA) methodology
177
- - **Quality control**: Minimum 3 annotations per sample, removed samples with σ > 20 (410 samples/9.48%)
178
- - **Data split**: 80% train (3,497) / 10% validation (404) / 10% test (422)
179
-
180
- ### Translation Systems Evaluated
181
- Six diverse MT systems for comprehensive coverage:
182
- - **LLM-based**: Claude 3.7-Sonnet, GPT-4o, GPT-4.1, Gemini Flash 2.0
183
- - **Traditional**: Facebook NLLB (1.3B and 600M parameters)
184
 
185
  ### Training Configuration
186
- - **Base Models**: XLM-RoBERTa-large and Unbabel/wmt22-comet-da
187
  - **Methodology**: COMET framework with Direct Assessment supervision
188
  - **Evaluation Metrics**: Kendall's τ and Spearman ρ correlation with human DA scores
189
- - **Note**: XLM-RoBERTa was not originally trained on Kinyarwanda data, yet achieves strong performance
190
-
191
- ### Data Distribution
192
- **DA Score Statistics**:
193
- - Overall: μ=87.73, σ=14.14
194
- - English→Kinyarwanda: μ=84.60, σ=16.28
195
- - Kinyarwanda→English: μ=91.05, σ=10.47
196
-
197
- Distribution pattern similar to WMT datasets (2017-2022), indicating alignment with international evaluation standards.
198
 
199
  ### MT System Benchmarking Results
 
200
  Our evaluation of production MT systems reveals interesting insights:
201
 
202
  | MT System | Kinyarwanda→English | English→Kinyarwanda | Overall |
@@ -213,75 +244,25 @@ Our evaluation of production MT systems reveals interesting insights:
213
  - All systems perform better on Kinyarwanda→English than English→Kinyarwanda
214
  - Score differences are subtle but statistically meaningful with KinyCOMET's precision
215
 
216
- ## Evaluation & Metrics
217
-
218
- The model is evaluated using standard quality estimation metrics:
219
-
220
- - **Pearson Correlation**: Measures linear correlation with human judgments
221
- - **Spearman Correlation**: Measures monotonic correlation with human rankings
222
- - **System Score**: Overall translation system quality assessment
223
- - **MAE/RMSE**: Mean absolute error and root mean square error
224
-
225
- ## Dataset Access
226
-
227
- Our human-annotated Kinyarwanda-English quality estimation dataset is publicly available:
228
-
229
- ```python
230
- from huggingface_hub import hf_hub_download
231
- import pandas as pd
232
-
233
- # Download dataset files
234
- train_file = hf_hub_download(repo_id="chrismazii/kinycomet_unbabel", filename="train.csv")
235
- val_file = hf_hub_download(repo_id="chrismazii/kinycomet_unbabel", filename="valid.csv")
236
- test_file = hf_hub_download(repo_id="chrismazii/kinycomet_unbabel", filename="test.csv")
237
-
238
- # Load the datasets
239
- train_df = pd.read_csv(train_file)
240
- val_df = pd.read_csv(val_file)
241
- test_df = pd.read_csv(test_file)
242
-
243
- print(f"Training samples: {len(train_df)}")
244
- print(f"Validation samples: {len(val_df)}")
245
- print(f"Test samples: {len(test_df)}")
246
-
247
- # Convert to list of dictionaries for COMET usage
248
- train_samples = train_df.to_dict('records')
249
-
250
- # Example sample structure
251
- print(train_samples[0])
252
- # {
253
- # 'src': 'Umugabo ararya',
254
- # 'mt': 'The man is eating',
255
- # 'ref': 'The man is eating',
256
- # 'score': 0.89,
257
- # 'direction': 'kin2eng'
258
- # }
259
- ```
260
-
261
- **Dataset Characteristics**:
262
- - **Total samples**: 4,323 (train: 3,497, val: 404, test: 422)
263
- - **Directions**: Bidirectional rw↔en
264
- - **Annotation**: Human Direct Assessment scores [0-100] normalized to [0-1]
265
- - **Quality**: Multi-annotator agreement, high-variance samples removed
266
- - **Coverage**: Multiple MT systems and domains (education, tourism)
267
-
268
  ## Real-World Impact & Applications
269
 
270
  ### Addressing Rwanda's NLP Ecosystem Needs
 
271
  KinyCOMET directly addresses pain points identified by the Rwandan MT community:
272
 
273
  **Before KinyCOMET:**
274
- - BLEU scores poorly correlate with human judgment for Kinyarwanda
275
- - Expensive, time-consuming human evaluation required
276
- - No reliable automatic metrics for morphologically rich Kinyarwanda
277
 
278
  **With KinyCOMET:**
279
- - **2.5x better correlation** with human judgments than BLEU
280
- - **Instant evaluation** for production MT systems
281
- - **Cost-effective** alternative to human annotation
282
- - **Specialized for Kinyarwanda** morphological complexity
283
 
284
  ### Production Use Cases
 
285
  **For MT Companies** (Digital Umuganda, KINLP, Awesomity, Artemis AI):
286
  - Real-time translation quality monitoring
287
  - A/B testing of model improvements
@@ -299,10 +280,15 @@ KinyCOMET directly addresses pain points identified by the Rwandan MT community:
299
 
300
  ## Limitations & Considerations
301
 
302
- - **Domain Specificity**: Trained on specific text domains; may not generalize to all content types
303
  - **Language Variants**: Optimized for standard Kinyarwanda; dialectal variations may affect performance
304
  - **Resource Requirements**: Requires COMET library and substantial computational resources
305
  - **Score Interpretation**: Scores are relative to training data distribution
 
 
 
 
 
306
 
307
  ## Citation & Research
308
 
@@ -311,7 +297,7 @@ If you use KinyCOMET in your research, please cite:
311
  ```bibtex
312
  @misc{kinycomet2025,
313
  title={KinyCOMET: Translation Quality Estimation for Kinyarwanda-English},
314
- author={[Prince Chris Mazimpaka] and [Jan Nehring]},
315
  year={2025},
316
  publisher={Hugging Face},
317
  howpublished={\url{https://huggingface.co/chrismazii/kinycomet_unbabel}}
@@ -329,14 +315,18 @@ KinyCOMET contributes to the growing ecosystem of African language NLP tools. We
329
 
330
  ## License
331
 
332
- This model is released under the Apache 2.0 License.
333
 
334
  ## Acknowledgments
335
 
336
- - **COMET Framework**: Built on the excellent COMET quality estimation framework
337
  - **Base Models**: Leverages XLM-RoBERTa and Unbabel's WMT22 COMET-DA models
338
  - **African NLP Community**: Inspired by ongoing efforts to advance African language technologies
339
- - **Contributors**: Thanks to all researchers and annotators who made this work possible
340
 
341
  ---
342
 
 
 
 
 
 
74
  | BLEU | N/A | 0.30 | 0.34 | 0.23 | 0.62 |
75
  | chrF | N/A | 0.38 | 0.30 | 0.21 | 0.34 |
76
 
77
+ **State-of-the-Art Results**: Both KinyCOMET variants significantly outperform existing baselines, with KinyCOMET-Unbabel achieving the highest correlation across all metrics.
78
 
79
  ## Performance Highlights
80
 
 
98
  - Both KinyCOMET variants significantly outperform the AfriCOMET baselines, even though AfriCOMET's training data includes Kinyarwanda
99
  - Surprising finding: Unbabel baseline (not trained on Kinyarwanda) outperforms AfriCOMET variants
100
 
 
 
101
  ## Installation
102
+
103
+ Make sure you have Python ≥ 3.8 and install COMET via pip:
104
+
105
  ```bash
106
+ pip install unbabel-comet
107
  ```
 
108
 
109
+ You can verify the CLI tool is installed:
110
+
111
+ ```bash
112
+ which comet-score
113
+ # should print something like: /usr/local/bin/comet-score
114
+ ```
115
+
116
+ For more details on COMET, see the [official documentation](https://unbabel.github.io/COMET/html/index.html).
117
+
118
+ ## Usage
119
+
120
+ ### Load and Use the Model in Python
121
+
122
+ Here's a simple example to score translations directly in Python:
123
+
124
+ ```python
125
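  from comet import download_model, load_from_checkpoint
126
+
127
+ # Download the public KinyCOMET checkpoint from the Hugging Face Hub, then load it from the local path
128
+ model = load_from_checkpoint(download_model("chrismazii/kinycomet_unbabel"))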
129
+
130
+ # Example translations
131
+ samples = [
132
+ {
133
+ "src": "Umugabo ararya.",
134
+ "mt": "The man is eating.",
135
+ "ref": "The man is eating."
136
+ },
137
+ {
138
+ "src": "Umwana arasinzira.",
139
+ "mt": "A dog sleeps.",
140
+ "ref": "The child is sleeping."
141
+ }
142
+ ]
143
+
144
+ # Predict scores
145
+ pred = model.predict(samples, gpus=0)
146
+ print(pred)
147
  ```
148
 
149
+ **Output Example:**
150
 
151
  ```python
152
+ Prediction({
153
+ 'scores': [0.9899, 0.8813],
154
+ 'system_score': 0.9356
155
  })
156
  ```
157
 
158
+ ### Using the Command Line Interface (CLI)
159
+
160
+ You can also evaluate translations directly using the terminal.
161
+
162
+ **Step 1: Create the text files**
163
+
164
+ ```bash
165
+ cat > source.txt <<'SRC'
166
+ Umugabo ararya.
167
+ Umwana arasinzira.
168
+ Uyu mwanya neza cyane.
169
+ SRC
170
+
171
+ cat > reference.txt <<'REF'
172
+ The man is eating.
173
+ The child is sleeping.
174
+ This place is very nice.
175
+ REF
176
+
177
+ cat > hypothesis.txt <<'HYP'
178
+ The man is eating.
179
+ A dog sleeps.
180
+ This place is very nice.
181
+ HYP
182
+ ```
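The scoring command pairs the three files line by line, so they need the same number of segments. A minimal Python sketch of the same setup, with an alignment check before scoring, could look like the following. The temporary directory and the helper names `write_lines` and `count_lines` are illustrative assumptions, not part of KinyCOMET or COMET.

```python
import tempfile
from pathlib import Path

# Sentences copied from the heredoc example; one segment per line.
sources = ["Umugabo ararya.", "Umwana arasinzira.", "Uyu mwanya neza cyane."]
references = ["The man is eating.", "The child is sleeping.", "This place is very nice."]
hypotheses = ["The man is eating.", "A dog sleeps.", "This place is very nice."]


def write_lines(path, lines):
    """Write one segment per line as UTF-8, ending with a newline."""
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def count_lines(path):
    """Count the non-empty lines in a text file."""
    text = path.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())


# A temporary directory stands in for the working directory (illustrative choice).
workdir = Path(tempfile.mkdtemp())
contents = {
    "source.txt": sources,
    "reference.txt": references,
    "hypothesis.txt": hypotheses,
}
for name, lines in contents.items():
    write_lines(workdir / name, lines)

# All three files must contain the same number of segments.
counts = {name: count_lines(workdir / name) for name in contents}
aligned = len(set(counts.values())) == 1
print(counts, "aligned:", aligned)
```

Running a check like this first makes a misaligned file easy to spot before the model is loaded.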
183
+
184
+ **Step 2: Run KinyCOMET**
185
+
186
+ ```bash
187
+ comet-score -s source.txt -r reference.txt -t hypothesis.txt \
188
+ --model chrismazii/kinycomet_unbabel --gpus 0 --to_json results.json
189
+ ```
190
+
191
+ **Step 3: View the results**
192
+
193
+ ```bash
194
+ cat results.json
195
+ ```
196
 
197
+ **Example Output:**
198
+
199
+ ```json
200
+ {
201
+ "system_score": 0.9547,
202
+ "segments": [
203
+ {"src":"Umugabo ararya.","mt":"The man is eating.","ref":"The man is eating.","score":0.9899},
204
+ {"src":"Umwana arasinzira.","mt":"A dog sleeps.","ref":"The child is sleeping.","score":0.8813},
205
+ {"src":"Uyu mwanya neza cyane.","mt":"This place is very nice.","ref":"This place is very nice.","score":0.9927}
206
+ ]
207
+ }
208
+ ```
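As a rough sketch of how such a results file might be inspected, the snippet below parses a payload with the same shape as the example output and lists segments from lowest to highest score. It assumes the JSON layout shown in the example (`system_score` plus a `segments` list); the layout written by a given COMET version may differ, so treat the keys as assumptions.

```python
import json

# Payload copied from the example output; a real run would read results.json instead.
raw = """
{
  "system_score": 0.9547,
  "segments": [
    {"src": "Umugabo ararya.", "mt": "The man is eating.",
     "ref": "The man is eating.", "score": 0.9899},
    {"src": "Umwana arasinzira.", "mt": "A dog sleeps.",
     "ref": "The child is sleeping.", "score": 0.8813},
    {"src": "Uyu mwanya neza cyane.", "mt": "This place is very nice.",
     "ref": "This place is very nice.", "score": 0.9927}
  ]
}
"""

results = json.loads(raw)

# Sort segments so the weakest translations come first.
ranked = sorted(results["segments"], key=lambda seg: seg["score"])
for seg in ranked:
    print(f'{seg["score"]:.4f}  {seg["src"]}  ->  {seg["mt"]}')

worst = ranked[0]
```

Sorting worst-first is a convenient way to pick segments for manual review.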
209
+
210
+ ### Score Interpretation
211
+
212
+ - **Scores range from 0 to 1**: Higher scores indicate better translation quality
213
+ - **System score**: Average quality across all translations
214
+ - **Segment scores**: Individual quality scores for each translation pair
215
+ - **Threshold guidance**: Scores above 0.8 typically indicate high-quality translations
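The points above can be illustrated with a short sketch that recomputes the system score as the mean of the segment scores and applies the 0.8 guidance as a cutoff. The scores are the two values from the Python output example; the names `THRESHOLD` and `flagged` are illustrative.

```python
from statistics import mean

# Segment scores from the Python output example.
segment_scores = [0.9899, 0.8813]

# Cutoff taken from the threshold guidance; treat it as a rough guide only.
THRESHOLD = 0.8

# The system score is the average of the segment scores.
system_score = mean(segment_scores)

# Indices of segments that fall below the cutoff.
flagged = [i for i, score in enumerate(segment_scores) if score < THRESHOLD]

print(f"system score: {system_score:.4f}")
print(f"segments below {THRESHOLD}: {flagged}")
```

In this example the mistranslated second segment ("A dog sleeps.") still scores above 0.8, so a fixed cutoff is best combined with spot checks of the lowest-ranked segments.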
216
 
217
  ## Training Details
218
 
219
+ ### Model Architecture
220
+ - **Base Models**: XLM-RoBERTa-large and Unbabel/wmt22-comet-da
221
+ - **Framework**: COMET quality estimation framework
222
+ - **Training Data**: 4,323 human-annotated Kinyarwanda-English translation pairs
223
 
224
  ### Training Configuration
 
225
  - **Methodology**: COMET framework with Direct Assessment supervision
226
  - **Evaluation Metrics**: Kendall's τ and Spearman ρ correlation with human DA scores
227
+ - **Data Split**: 80% train (3,497) / 10% validation (404) / 10% test (422)
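For readers unfamiliar with the two evaluation metrics, here is a minimal pure-Python sketch of Kendall's τ (the simple tau-a variant) and Spearman's ρ between human scores and metric scores. The sample values are hypothetical and not taken from the KinyCOMET dataset. The README does not state which τ variant was used, so tau-a is an assumption made for simplicity.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs. Ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for illustration only.
human_da = [0.95, 0.40, 0.80, 0.60, 0.90]
metric = [0.97, 0.55, 0.70, 0.75, 0.92]

tau = kendall_tau(human_da, metric)
rho = spearman_rho(human_da, metric)
print(f"Kendall tau: {tau:.2f}, Spearman rho: {rho:.2f}")
```

Both measures depend only on rank order, which is why they suit comparing a metric against human DA scores that live on a different scale.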
228
 
229
  ### MT System Benchmarking Results
230
+
231
  Our evaluation of production MT systems reveals interesting insights:
232
 
233
  | MT System | Kinyarwanda→English | English→Kinyarwanda | Overall |
 
244
  - All systems perform better on Kinyarwanda→English than English→Kinyarwanda
245
  - Score differences are subtle but statistically meaningful with KinyCOMET's precision
246
 
247
  ## Real-World Impact & Applications
248
 
249
  ### Addressing Rwanda's NLP Ecosystem Needs
250
+
251
  KinyCOMET directly addresses pain points identified by the Rwandan MT community:
252
 
253
  **Before KinyCOMET:**
254
+ - BLEU scores poorly correlate with human judgment for Kinyarwanda
255
+ - Expensive, time-consuming human evaluation required
256
+ - No reliable automatic metrics for morphologically rich Kinyarwanda
257
 
258
  **With KinyCOMET:**
259
+ - **2.5x better correlation** with human judgments than BLEU
260
+ - **Instant evaluation** for production MT systems
261
+ - **Cost-effective** alternative to human annotation
262
+ - **Specialized for Kinyarwanda** morphological complexity
263
 
264
  ### Production Use Cases
265
+
266
  **For MT Companies** (Digital Umuganda, KINLP, Awesomity, Artemis AI):
267
  - Real-time translation quality monitoring
268
  - A/B testing of model improvements
 
280
 
281
  ## Limitations & Considerations
282
 
283
+ - **Domain Specificity**: Trained on education and tourism domains; may not generalize to all content types
284
  - **Language Variants**: Optimized for standard Kinyarwanda; dialectal variations may affect performance
285
  - **Resource Requirements**: Requires COMET library and substantial computational resources
286
  - **Score Interpretation**: Scores are relative to training data distribution
287
+ - **Reference Dependency**: Best performance achieved with reference translations
288
+
289
+ ## Dataset Access
290
+
291
+ The training dataset is available separately. See the [KinyCOMET Dataset Card](https://huggingface.co/datasets/chrismazii/kinycomet_dataset) for details on accessing the human-annotated quality estimation data.
292
 
293
  ## Citation & Research
294
 
 
297
  ```bibtex
298
  @misc{kinycomet2025,
299
  title={KinyCOMET: Translation Quality Estimation for Kinyarwanda-English},
300
+ author={Prince Chris Mazimpaka and Jan Nehring},
301
  year={2025},
302
  publisher={Hugging Face},
303
  howpublished={\url{https://huggingface.co/chrismazii/kinycomet_unbabel}}
 
315
 
316
  ## License
317
 
318
+ This model is released under the Apache 2.0 License.
319
 
320
  ## Acknowledgments
321
 
322
+ - **COMET Framework**: Built on the excellent [COMET quality estimation framework](https://unbabel.github.io/COMET/html/index.html)
323
  - **Base Models**: Leverages XLM-RoBERTa and Unbabel's WMT22 COMET-DA models
324
  - **African NLP Community**: Inspired by ongoing efforts to advance African language technologies
325
+ - **Contributors**: Thanks to the 15 linguistics students and all researchers who made this work possible
326
 
327
  ---
328
 
329
+ **Resources:**
330
+ - [COMET Documentation](https://unbabel.github.io/COMET/html/index.html)
331
+ - [Dataset Card](https://huggingface.co/datasets/chrismazii/kinycomet_dataset)
332
+ - [Model Files](https://huggingface.co/chrismazii/kinycomet_unbabel/tree/main)