Darsala committed 76a269c (verified) · Parent: f440f59

Update README.md

Files changed (1): README.md (+279 -5)

---
language:
- ka
- en
license: apache-2.0
tags:
- translation
- evaluation
- comet
- mt-evaluation
- georgian
metrics:
- kendall_tau
- spearman_correlation
- pearson_correlation
model-index:
- name: Georgian-COMET
  results:
  - task:
      type: translation-evaluation
      name: Machine Translation Evaluation
    dataset:
      name: Georgian MT Evaluation Dataset
      type: Darsala/georgian_metric_evaluation
    metrics:
    - type: pearson_correlation
      value: 0.878
      name: Pearson Correlation
    - type: spearman_correlation
      value: 0.796
      name: Spearman Correlation
    - type: kendall_tau
      value: 0.603
      name: Kendall's Tau
base_model: Unbabel/wmt22-comet-da
datasets:
- Darsala/georgian_metric_evaluation
---

# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation

This is a [COMET](https://github.com/Unbabel/COMET) evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation with respect to both the source and the reference.

## Model Description

Georgian-COMET is a fine-tuned version of [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da), optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. On the Georgian test set it correlates more closely with human judgments than the base model on all three correlation measures.

### Key Improvements over Base Model

| Metric | Base COMET | Georgian-COMET | Improvement (absolute) |
|--------|------------|----------------|------------------------|
| Pearson | 0.867 | **0.878** | +0.011 |
| Spearman | 0.759 | **0.796** | +0.037 |
| Kendall | 0.564 | **0.603** | +0.039 |
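
The improvement figures are differences between the two correlation columns. As an arithmetic check, this small sketch recomputes them from the table values, both as absolute gains and as relative (percentage) gains; `improvements` is a hypothetical helper, not part of COMET.

```python
# Correlation values copied from the table above.
BASE = {"Pearson": 0.867, "Spearman": 0.759, "Kendall": 0.564}
TUNED = {"Pearson": 0.878, "Spearman": 0.796, "Kendall": 0.603}


def improvements(base: dict[str, float], tuned: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return (absolute gain, relative gain in %) for each correlation measure."""
    result = {}
    for name, b in base.items():
        delta = tuned[name] - b
        # Absolute gain rounded to 3 decimals; relative gain as a percentage of the base value.
        result[name] = (round(delta, 3), round(100 * delta / b, 1))
    return result


for name, (abs_gain, rel_gain) in improvements(BASE, TUNED).items():
    print(f"{name}: +{abs_gain:.3f} absolute, +{rel_gain:.1f}% relative")
```

An absolute gain of 0.011 in Pearson correlation corresponds to a relative gain of about 1.3%, so the two notions should not be mixed when reporting results.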

## Paper

- **Base Model Paper**: [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https://aclanthology.org/2022.wmt-1.52) (Rei et al., WMT 2022)
- **This Model**: Paper coming soon

## Repository

[https://github.com/LukaDarsalia/nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research)

## License

Apache-2.0

## Usage (unbabel-comet)

Using this model requires the `unbabel-comet` package to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current
pip install unbabel-comet
```

### Option 1: Direct Download from HuggingFace

```python
from comet import load_from_checkpoint
import requests
import os

# Download the model checkpoint
model_url = "https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt"
model_path = "georgian_comet.ckpt"

# Download if not already present. Stream to a temporary file so the large
# checkpoint is not held in memory, fail loudly on HTTP errors, and only
# rename the file once the download is complete.
if not os.path.exists(model_path):
    tmp_path = model_path + ".part"
    with requests.get(model_url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(tmp_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
    os.replace(tmp_path, model_path)

# Load the model
model = load_from_checkpoint(model_path)

# Prepare your data: one dict per segment with source, MT output and reference
data = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე."
    },
    {
        "src": "Schools and kindergartens were opened.",
        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
    }
]

# Get predictions (set gpus=0 to run on CPU)
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```
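
As an alternative to fetching the URL by hand, the `huggingface_hub` library's `hf_hub_download` can download and cache the same file. This is a sketch that assumes the checkpoint lives at `model.ckpt` in the `Darsala/georgian_comet` repository (as the URL above suggests) and that `huggingface_hub` is installed; `get_georgian_comet_checkpoint` is a hypothetical helper name.

```python
def get_georgian_comet_checkpoint(repo_id: str = "Darsala/georgian_comet",
                                  filename: str = "model.ckpt") -> str:
    """Download the checkpoint (or reuse the local Hugging Face cache) and return its local path."""
    # Imported lazily so this helper can be defined without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Usage: `model = load_from_checkpoint(get_georgian_comet_checkpoint())`. Repeated calls reuse the cached file instead of downloading it again.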

### Option 2: Using the comet CLI

First, download the model checkpoint:
```bash
wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
```

Then pass it to the `comet-score` CLI:
```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
```
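
`comet-score` reads plain-text files with one segment per line, so the three files must be line-aligned. A minimal sketch for writing them from Python lists, assuming segments contain no embedded newlines; the function name and the `src.txt`, `mt.txt` and `ref.txt` file names are hypothetical.

```python
from pathlib import Path


def write_comet_inputs(sources: list[str], translations: list[str], references: list[str],
                       out_dir: str = ".") -> tuple[Path, Path, Path]:
    """Write line-aligned source/translation/reference files for comet-score."""
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError("sources, translations and references must have the same length")

    paths = []
    for name, segments in (("src.txt", sources), ("mt.txt", translations), ("ref.txt", references)):
        # A newline inside a segment would shift every following line out of alignment.
        if any("\n" in s for s in segments):
            raise ValueError(f"{name}: segments must not contain newlines")
        path = Path(out_dir) / name
        # UTF-8 so Georgian text is written correctly regardless of the platform default.
        path.write_text("\n".join(segments) + "\n", encoding="utf-8")
        paths.append(path)
    return tuple(paths)
```

The resulting files can then be scored with `comet-score -s src.txt -t mt.txt -r ref.txt --model georgian_comet.ckpt`.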

### Option 3: Integration with Evaluation Pipeline

```python
from comet import load_from_checkpoint
import pandas as pd

# Load the checkpoint downloaded in Option 1 or 2
model = load_from_checkpoint("georgian_comet.ckpt")

# Load your evaluation data
df = pd.read_csv("your_evaluation_data.csv")

# Prepare data in COMET format (adjust the column names to match your CSV)
data = [
    {"src": src, "mt": mt, "ref": ref}
    for src, mt, ref in zip(df["sourceText"], df["targetText"], df["referenceText"])
]

# Get segment-level scores; system_score is their mean
output = model.predict(data, batch_size=16)
print(f"Average score: {output.system_score:.3f}")
```
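
`pd.read_csv` reads empty cells as `NaN` (a float), not as strings, so incomplete rows are worth filtering out before scoring. A stdlib-only sketch that builds COMET records while skipping incomplete rows; it assumes the same column names as above (`sourceText`, `targetText`, `referenceText`), and `load_comet_records` is a hypothetical helper.

```python
import csv
from typing import Iterable


def load_comet_records(lines: Iterable[str],
                       src_col: str = "sourceText",
                       mt_col: str = "targetText",
                       ref_col: str = "referenceText") -> tuple[list[dict[str, str]], int]:
    """Parse CSV text into COMET {"src", "mt", "ref"} records.

    Surrounding whitespace is stripped. Returns the records and the number of
    rows skipped because a field was missing or blank.
    """
    records, skipped = [], 0
    for row in csv.DictReader(lines):
        # row.get(...) is None when a row is shorter than the header.
        fields = [(row.get(col) or "").strip() for col in (src_col, mt_col, ref_col)]
        if not all(fields):
            skipped += 1
            continue
        src, mt, ref = fields
        records.append({"src": src, "mt": mt, "ref": ref})
    return records, skipped
```

Usage: `with open("your_evaluation_data.csv", newline="", encoding="utf-8") as f: data, skipped = load_comet_records(f)`, then pass `data` to `model.predict`.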

## Intended Uses

This model is intended to be used for **English-Georgian MT evaluation**.

Given a triplet (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1, where 1 represents a perfect translation.

### Primary Use Cases

1. **MT System Development**: Evaluate and compare different English-Georgian MT systems
2. **Quality Assurance**: Automated quality checks for Georgian translations
3. **Research**: Study MT evaluation for morphologically rich languages like Georgian
4. **Production Monitoring**: Track translation quality in production environments
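
For system comparison and production monitoring, the segment scores returned by `model.predict(...)` (its `scores` list) can be aggregated per system, and low-scoring segments can be routed for review. A sketch under stated assumptions: the system names, example scores and the 0.5 review threshold are hypothetical and should be calibrated on your own data.

```python
from statistics import mean


def rank_systems(scores_by_system: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank MT systems by mean segment-level score, highest first.

    Means are only comparable when every system was scored on the same test segments.
    """
    averages = {name: mean(scores) for name, scores in scores_by_system.items() if scores}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def flag_for_review(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Return indices of segments scoring below a (hypothetical) quality threshold."""
    return [i for i, score in enumerate(scores) if score < threshold]


# Hypothetical segment scores for two systems on the same three test segments.
example = {"system_a": [0.91, 0.84, 0.42], "system_b": [0.88, 0.79, 0.65]}
print(rank_systems(example))
print(flag_for_review(example["system_a"]))
```

Here `system_b` ranks first on average even though `system_a` wins on two of the three segments, which is why per-segment flagging is useful alongside the system-level mean.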

### Out-of-Scope Use

- **Other Language Pairs**: This model is fine-tuned specifically for English-Georgian and may not perform well on other language pairs
- **Reference-Free Evaluation**: The model requires reference translations
- **Document-Level**: Optimized for sentence-level evaluation

## Training Details

### Training Data

- **Dataset**: 5,000 English-Georgian pairs from [corp.dict.ge](https://corp.dict.ge/)
- **MT Systems**: Translations from SMaLL-100, Google Translate, and Ucraft Translate
- **Scoring Method**: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
- **Details**: See [Darsala/georgian_metric_evaluation](https://huggingface.co/datasets/Darsala/georgian_metric_evaluation)
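
The noise step can be sketched as follows. This is an illustration, not the authors' training code: the card states σ=3 but not the score scale, so the sketch assumes teacher scores on a 0-100 scale and clips noisy targets to that range; both the scale and the clipping are assumptions.

```python
import random
from typing import Optional


def add_gaussian_noise(scores: list[float], sigma: float = 3.0,
                       low: float = 0.0, high: float = 100.0,
                       seed: Optional[int] = None) -> list[float]:
    """Perturb teacher scores with N(0, sigma^2) noise and clip to [low, high].

    ASSUMPTION: a 0-100 score scale and clipping; the model card only states sigma=3.
    """
    rng = random.Random(seed)  # a seeded generator makes the noisy targets reproducible
    return [min(high, max(low, s + rng.gauss(0.0, sigma))) for s in scores]


teacher_scores = [87.0, 42.5, 99.0]  # hypothetical teacher scores
print(add_gaussian_noise(teacher_scores, seed=0))
```

Regenerating the noise each epoch instead of once up front would be a different design choice; the card does not say which was used.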

### Training Configuration

```yaml
regression_metric:
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.5e-05
    learning_rate: 1.5e-05
    loss: mse
    dropout: 0.1
    batch_size: 8
```
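
`nr_frozen_epochs: 0.3` is a fraction of an epoch. The sketch below shows how such a fractional value translates into a step count, assuming the encoder stays frozen until the batch index exceeds `nr_frozen_epochs` times the number of batches in the first epoch (an assumption about COMET's training loop; check it against your installed version). With `keep_embeddings_frozen: True`, the embedding layer stays frozen even after that point. `unfreeze_step` is a hypothetical helper, and the training-set sizes in the usage note are illustrative, not the actual split.

```python
import math


def unfreeze_step(num_train_examples: int, batch_size: int, nr_frozen_epochs: float) -> int:
    """First batch index at which the encoder would be trained, for 0 < nr_frozen_epochs < 1.

    ASSUMPTION: the encoder is unfrozen once batch_idx > nr_frozen_epochs * batches_in_first_epoch.
    """
    if not 0.0 < nr_frozen_epochs < 1.0:
        raise ValueError("this sketch only covers fractional values in (0, 1)")
    batches_per_epoch = num_train_examples // batch_size
    threshold = nr_frozen_epochs * batches_per_epoch
    # Smallest integer strictly greater than the threshold.
    return math.floor(threshold) + 1
```

For example, with a hypothetical 4,000 training examples and `batch_size: 8` there are 500 batches per epoch, so `unfreeze_step(4000, 8, 0.3)` returns `151`: only the regression head is trained for the first 150 batches.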

### Training Procedure

1. **Base Model**: Started from the Unbabel/wmt22-comet-da checkpoint
2. **Knowledge Distillation**: Used Claude Sonnet 4 scores as training targets
3. **Robustness**: Added Gaussian noise to training scores to reduce overfitting
4. **Optimization**: 8 epochs with early stopping (patience=4) on validation Kendall's tau
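
The stopping rule in step 4 can be sketched as a patience counter on validation Kendall's tau: training stops once 4 consecutive epochs pass without a new best value, or after 8 epochs at most. This is an illustrative loop, not the authors' code (a Lightning-style `EarlyStopping` callback typically implements the same rule), and the per-epoch tau values are hypothetical.

```python
def epochs_until_stop(val_kendall: list[float], patience: int = 4,
                      max_epochs: int = 8) -> tuple[int, int]:
    """Simulate patience-based early stopping on a metric where higher is better.

    Returns (number of epochs actually run, 1-based epoch of the best checkpoint).
    """
    best = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, tau in enumerate(val_kendall[:max_epochs], start=1):
        epochs_run = epoch
        if tau > best:  # strict improvement resets the patience counter
            best, best_epoch = tau, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch


# Hypothetical per-epoch validation Kendall's tau values, for illustration only.
history = [0.50, 0.53, 0.55, 0.54, 0.545, 0.55, 0.548, 0.56]
print(epochs_until_stop(history))
```

With these values training stops after epoch 7 and the best checkpoint comes from epoch 3, so the slightly higher value that epoch 8 would have reached is never seen.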

## Evaluation Results

### Test Set Performance

Correlation with human judgments on 400 human-annotated English-Georgian translation pairs:

| Metric | Correlation | p-value |
|--------|-------------|---------|
| Pearson | 0.878 | < 0.001 |
| Spearman | 0.796 | < 0.001 |
| Kendall | 0.603 | < 0.001 |

### Comparison with Other Metrics

| Metric | Pearson | Spearman | Kendall |
|--------|---------|----------|---------|
| **Georgian-COMET** | **0.878** | 0.796 | 0.603 |
| Base COMET | 0.867 | 0.759 | 0.564 |
| LLM-Reference-Based | 0.852 | **0.798** | **0.660** |
| chrF++ | 0.739 | 0.690 | 0.498 |
| TER | 0.466 | 0.443 | 0.311 |
| BLEU | 0.413 | 0.497 | 0.344 |

## Languages Covered

While the base model (built on XLM-R) covers about 100 languages, this fine-tuned version is optimized specifically for:
- **Source Language**: English (en)
- **Target Language**: Georgian (ka)

For other language pairs, we recommend using the base [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) model.

## Limitations

1. **Language Specific**: Optimized only for English→Georgian evaluation
2. **Domain**: Training data comes primarily from corp.dict.ge (general/literary domain)
3. **Reference Required**: Cannot perform reference-free evaluation
4. **Sentence Level**: Not optimized for document-level evaluation

## Citation

If you use this model, please cite:

```bibtex
@misc{georgian-comet-2025,
  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
  author={Darsalia, Luka and Bakhturidze, Ketevan and Sturua, Saba},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/Darsala/georgian_comet}
}

@inproceedings{rei-etal-2022-comet,
    title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
    author = "Rei, Ricardo and
      C. de Souza, Jos{\'e} G. and
      Alves, Duarte and
      Zerva, Chrysoula and
      Farinha, Ana C and
      Glushkova, Taisiya and
      Lavie, Alon and
      Coheur, Luisa and
      Martins, Andr{\'e} F. T.",
    booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wmt-1.52",
    pages = "578--585",
}
```

## Acknowledgments

- [Unbabel](https://unbabel.com/) team for the base COMET model
- [Anthropic](https://anthropic.com/) for Claude Sonnet 4 used in knowledge distillation
- [corp.dict.ge](https://corp.dict.ge/) for the Georgian-English corpus
- All contributors to the [nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research) project