Tsunnami committed
Commit 759a2ec · verified · 1 Parent(s): e86f9ce

Create README.md

Files changed (1)
  1. README.md +237 -0
README.md ADDED
@@ -0,0 +1,237 @@
+ ---
+ license: other
+ license_name: cometh-reserved
+ datasets:
+ - wasanx/cometh_human_annot
+ language:
+ - en
+ - th
+ metrics:
+ - spearman correlation
+ tags:
+ - translation-evaluation
+ - thai
+ - english
+ - translation-metrics
+ - mqm
+ - claude-augmented
+ - comet
+ - translation-quality
+ base_model: Unbabel/wmt22-cometkiwi-da
+ pipeline_tag: translation
+ library_name: unbabel-comet
+ model-index:
+ - name: ComETH-Augmented
+   results:
+   - task:
+       type: translation-quality-estimation
+       name: Thai-English Translation Quality Assessment
+     dataset:
+       type: wasanx/cometh_human_annot
+       name: COMETH Human Annotations
+     metrics:
+     - name: Spearman correlation
+       type: spearman
+       value: 0.4795
+       verified: false
+   - task:
+       type: translation-quality-estimation
+       name: Thai-English Translation Quality Comparison
+     dataset:
+       type: wasanx/cometh_human_annot
+       name: COMETH Baseline Comparison
+     metrics:
+     - name: COMET baseline
+       type: spearman
+       value: 0.4570
+       verified: false
+     - name: ComETH (human-only)
+       type: spearman
+       value: 0.4639
+       verified: false
+ ---
+ # ComeTH: Thai-English Translation Quality Metrics
+
+ ComETH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for Evaluation of Translation) model, optimized for Thai-English translation quality assessment. It scores machine translation outputs so that the scores track human quality judgments; on the evaluation reported in the Performance section, the best variant reaches Spearman's ρ = 0.4795 against human annotations.
+
+ ## Model Overview
+
+ - **Model Type**: Translation Quality Estimation
+ - **Languages**: Thai-English
+ - **Base Model**: COMET (Unbabel/wmt22-cometkiwi-da)
+ - **Encoder**: XLM-RoBERTa-based (microsoft/infoxlm-large)
+ - **Architecture**: Unified Metric with sentence-level scoring
+ - **Framework**: COMET (Unbabel)
+ - **Task**: Machine Translation Evaluation
+ - **Parameters**: 565M (558M encoder + 6.3M estimator)
+
+ ## Versions
+
+ We offer two variants of ComETH with different training approaches:
+
+ - **ComETH**: Fine-tuned on human MQM annotations (Spearman's ρ = 0.4639)
+ - **ComETH-Augmented**: Fine-tuned on human + Claude-assisted annotations (Spearman's ρ = 0.4795)
+
+ Both models outperform the base COMET model (Spearman's ρ = 0.4570) on Thai-English translation evaluation. The Claude-augmented version adds LLM-generated annotations to the human ones and raises the correlation with human judgments from 0.4570 to 0.4795, a relative improvement of about 4.9% over the baseline.
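The 4.9% figure is a relative gain in Spearman's ρ, not an absolute difference. A minimal sketch of that arithmetic, using only the correlations quoted above (the helper name `relative_gain` is introduced here for illustration):

```python
# Spearman correlations as reported in this README
baseline = 0.4570
human_only = 0.4639
augmented = 0.4795


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


print(f"ComETH vs. baseline:           {relative_gain(human_only, baseline):.1f}%")
print(f"ComETH-Augmented vs. baseline: {relative_gain(augmented, baseline):.1f}%")
```

By the same calculation, the human-only variant gains roughly 1.5% over the baseline.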
+
+ ## Technical Specifications
+
+ - **Training Framework**: PyTorch Lightning
+ - **Loss Function**: MSE
+ - **Input Segments**: [mt, src]
+ - **Final Layer Architecture**: [3072, 1024]
+ - **Layer Transformation**: Sparsemax
+ - **Activation Function**: Tanh
+ - **Dropout**: 0.1
+ - **Learning Rate**: 1.5e-05 (Encoder: 1e-06)
+ - **Layerwise Decay**: 0.95
+ - **Word Layer**: 24
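As a sanity check on the listed hidden sizes, the sketch below counts the parameters of a feed-forward head with layers [3072, 1024]. It assumes the head receives a single 1024-dimensional sentence embedding (the hidden size of a large XLM-R-style encoder) and emits one score; neither dimension is stated in the README, and `feed_forward_param_count` is a helper made up for this example. Tanh and dropout add no parameters.

```python
def feed_forward_param_count(in_dim: int, hidden_sizes: list[int], out_dim: int = 1) -> int:
    """Count weights and biases of a stack of fully connected layers."""
    dims = [in_dim, *hidden_sizes, out_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


# Assumption: 1024-dimensional input embedding, scalar output
ENCODER_HIDDEN_SIZE = 1024
head_params = feed_forward_param_count(ENCODER_HIDDEN_SIZE, [3072, 1024])
print(f"{head_params:,} parameters (~{head_params / 1e6:.1f}M)")
```

Under these assumptions the count comes to about 6.3M, in line with the estimator size quoted in the model overview.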
+
+ ## Training Data
+
+ The models were trained on:
+ - **Size**: 23,530 English-Thai translation pairs
+ - **Source Domains**: Diverse, including technical, conversational, and e-commerce
+ - **Annotation Framework**: Multidimensional Quality Metrics (MQM)
+ - **Error Categories**:
+   - Minor: Issues that don't significantly impact meaning or usability
+   - Major: Errors that significantly impact meaning but don't render content unusable
+   - Critical: Errors that make content unusable or could have serious consequences
+ - **Claude Augmentation**: Claude 3.5 Sonnet was used to generate supplementary quality judgments, enhancing the model's alignment with human evaluations
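To illustrate how severity labels like these can be turned into a number, MQM-style scoring commonly sums a penalty per annotated error. The sketch below is illustrative only: the weights are hypothetical placeholders, and the README does not say which weights or normalization were used to build the training targets.

```python
# Hypothetical severity weights; the README does not state the values actually used.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}


def mqm_penalty(errors: list[str], weights: dict[str, float] = SEVERITY_WEIGHTS) -> float:
    """Sum the weights of all annotated errors for one translation."""
    return sum(weights[severity] for severity in errors)


# One translation annotated with two minor errors and one major error
print(mqm_penalty(["minor", "minor", "major"]))
```

A higher penalty means a worse translation, so a pipeline along these lines would typically invert or rescale the penalty before using it as a quality score.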
+
+ ## Training Process
+
+ ComETH was trained using a multi-step process:
+ 1. Starting from the wmt22-cometkiwi-da checkpoint
+ 2. Fine-tuning on human MQM annotations for 5 epochs
+ 3. Using gradient accumulation (8 steps) to simulate larger batch sizes
+ 4. Using the unified metric architecture, which combines source and MT embeddings
+ 5. For the augmented variant: additional training with Claude-assisted annotations, weighted to balance human and machine judgments
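Step 3 rests on the fact that, for a mean-reduced loss such as MSE, summing the scaled gradients of 8 equal micro-batches gives the same gradient as one batch 8 times larger. The toy sketch below checks this for a 1-D linear model; the micro-batch size, data, and helper `mse_grad` are made up for illustration. In PyTorch Lightning this is usually configured through the Trainer's `accumulate_grad_batches` argument, though the README does not say which mechanism was used.

```python
def mse_grad(w: float, batch: list[tuple[float, float]]) -> float:
    """Gradient of mean squared error w.r.t. w for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


ACCUM_STEPS = 8   # from the README
MICRO_BATCH = 4   # hypothetical micro-batch size

# Toy data: 32 (x, y) pairs
data = [(float(i), 3.0 * i + 1.0) for i in range(ACCUM_STEPS * MICRO_BATCH)]
w = 0.5

# Accumulate scaled gradients over 8 micro-batches before a single optimizer step
accumulated = 0.0
for step in range(ACCUM_STEPS):
    micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
    accumulated += mse_grad(w, micro) / ACCUM_STEPS

# Gradient of the same loss computed on all 32 examples at once
full_batch = mse_grad(w, data)
print(abs(accumulated - full_batch) < 1e-6)
```

The effective batch size is therefore the micro-batch size times 8, while peak memory stays at the micro-batch level.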
+
+ ## Performance
+
+ ### Correlation with Human Judgments (Spearman's ρ)
+
+ | Model | Spearman's ρ | RMSE |
+ |-------|-------------|------|
+ | COMET (baseline) | 0.4570 | 0.3185 |
+ | ComETH (human annotations) | 0.4639 | 0.3093 |
+ | ComETH-Augmented (human + Claude) | **0.4795** | **0.3078** |
+
+ The Claude-augmented version shows the highest correlation with human judgments and the lowest RMSE: its ρ is 0.0225 higher than the baseline and 0.0156 higher than the human-only model.
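For readers who want to reproduce this kind of comparison on their own data, here is a dependency-free sketch of the two metrics in the table. Spearman's ρ is computed as the Pearson correlation of the ranks, with ties sharing an average rank. The toy scores are made up, and the README does not state on which score scale its RMSE was computed.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: list[float], b: list[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(a), average_ranks(b))


def rmse(pred: list[float], gold: list[float]) -> float:
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))


# Made-up toy scores, only to exercise the functions
human = [0.9, 0.4, 0.7, 0.1, 0.6]
model_scores = [0.8, 0.5, 0.6, 0.2, 0.9]
print(f"rho = {spearman(human, model_scores):.3f}, RMSE = {rmse(model_scores, human):.3f}")
```

Because ρ depends only on rank order, a model can improve ρ without improving RMSE and vice versa, which is why the table reports both.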
+
+ ### Comparison with LLM Evaluators
+
+ | Model | Spearman's ρ |
+ |-------|-------------|
+ | ComETH-Augmented | **0.4795** |
+ | Claude 3.5 Sonnet | 0.4383 |
+ | GPT-4o Mini | 0.4352 |
+ | Gemini 2.0 Flash | 0.3918 |
+
+ ComETH-Augmented achieves a higher correlation with human judgments than direct evaluation by the LLMs listed above, while being more computationally efficient for large-scale translation quality assessment.
+
+ ## Advanced Usage Examples
+
+ ### Basic Evaluation
+
+ ```python
+ from comet import download_model, load_from_checkpoint
+
+ # Download the checkpoint from the Hugging Face Hub, then load it from the local path
+ model_path = download_model("cometh-team/ComETH-Augmented")
+ model = load_from_checkpoint(model_path)
+
+ # Prepare input data
+ translations = [
+     {
+         "src": "This is an English source text.",
+         "mt": "นี่คือข้อความภาษาอังกฤษ",  # Machine translation to evaluate
+     }
+ ]
+
+ # Get quality scores (set gpus=0 to run on CPU)
+ results = model.predict(translations, batch_size=8, gpus=1)
+ scores = results['scores']
+ ```
+
+ ### Batch Processing With Progress Tracking
+
+ ```python
+ import pandas as pd
+ from tqdm import tqdm
+
+ # Assumes `model` has been loaded as in the Basic Evaluation example
+ # Load translations from CSV (expects 'src' and 'mt' columns)
+ df = pd.read_csv("translations.csv")
+ input_data = df[['src', 'mt']].to_dict('records')
+
+ # Process in batches
+ batch_size = 32
+ all_scores = []
+
+ for i in tqdm(range(0, len(input_data), batch_size)):
+     batch = input_data[i:i+batch_size]
+     results = model.predict(batch, batch_size=len(batch), gpus=1)
+     all_scores.extend(results['scores'])
+
+ # Add scores back to dataframe
+ df['quality_score'] = all_scores
+ ```
+
+ ### System-Level Evaluation
+
+ ```python
+ # Assumes `df` holds a 'quality_score' column and a 'system_name' column
+ # Group by system and compute average scores
+ systems = df.groupby('system_name')['quality_score'].agg(['mean', 'std', 'count']).reset_index()
+
+ # Rank systems by average quality
+ systems = systems.sort_values('mean', ascending=False)
+ print(systems)
+ ```
+
+ ## License
+
+ ```
+ The COMETH Reserved License
+
+ Cometh English-to-Thai Translation Data and Model License
+
+ Copyright (C) Cometh Team. All rights reserved.
+
+ This license governs the use of the Cometh English-to-Thai translation data and model ("Cometh Model Data"), including but not limited to MQM scores, human translations, and human rankings from various translation sources.
+
+ Permitted Use
+ The Cometh Model Data is licensed exclusively for internal use by the designated Cometh team.
+
+ Prohibited Use
+ The following uses are strictly prohibited:
+ 1. Any usage outside the designated purposes unanimously approved by the Cometh team.
+ 2. Redistribution, sharing, or distribution of the Cometh Model Data in any form.
+ 3. Citation or public reference to the Cometh Model Data in any academic, commercial, or non-commercial context.
+ 4. Any use beyond the internal operations of the Cometh team.
+
+ Legal Enforcement
+ Unauthorized use, distribution, or citation of the Cometh Model Data constitutes a violation of this license and may result in legal action, including but not limited to prosecution under applicable laws.
+
+ Reservation of Rights
+ All rights to the Cometh Model Data are reserved by the Cometh team. This license does not transfer any ownership rights.
+
+ By accessing or using the Cometh Model Data, you agree to be bound by the terms of this license.
+ ```
+
+ ## Citation
+
+ ```
+ @misc{cometh2025,
+   title = {COMETH: Thai-English Translation Quality Metrics},
+   author = {COMETH Team},
+   year = {2025},
+   howpublished = {Hugging Face Model Repository},
+   url = {https://huggingface.co/wasanx/ComeTH}
+ }
+ ```
+
+ ## Contact
+
+ For questions or feedback: comethteam@gmail.com