Tsunnami committed · commit a4d6c14 (verified) · parent: d07141b

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED

@@ -21,7 +21,7 @@ base_model: Unbabel/wmt22-cometkiwi-da
 pipeline_tag: translation
 library_name: unbabel-comet
 model-index:
-- name: ComETH-Augmented
+- name: ComeTH
   results:
   - task:
     type: translation-quality-estimation
@@ -52,7 +52,7 @@ model-index:
 ---
 # ComeTH: Thai-English Translation Quality Metrics
 
-ComETH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for Evaluation of Translation) model specifically optimized for Thai-English translation quality assessment. This model evaluates machine translation outputs by providing quality scores that correlate highly with human judgments.
+ComeTH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for Evaluation of Translation) model specifically optimized for Thai-English translation quality assessment. This model evaluates machine translation outputs by providing quality scores that correlate highly with human judgments.
 
 ## Model Overview
 
@@ -67,10 +67,10 @@ ComETH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for E
 
 ## Versions
 
-We offer two variants of ComETH with different training approaches:
+We offer two variants of ComeTH with different training approaches:
 
-- **ComETH**: Fine-tuned on human MQM annotations (Spearman's ρ = 0.4639)
-- **ComETH-Augmented**: Fine-tuned on human + Claude-assisted annotations (Spearman's ρ = 0.4795)
+- **ComeTH**: Fine-tuned on human MQM annotations (Spearman's ρ = 0.4639)
+- **ComeTH-Augmented**: Fine-tuned on human + Claude-assisted annotations (Spearman's ρ = 0.4795)
 
 Both models outperform the base COMET model (Spearman's ρ = 0.4570) on Thai-English translation evaluation. The Claude-augmented version leverages LLM-generated annotations to enhance correlation with human judgments by 4.9% over the baseline.
 
@@ -101,7 +101,7 @@ The models were trained on:
 
 ## Training Process
 
-ComETH was trained using a multi-step process:
+ComeTH was trained using a multi-step process:
 1. Starting from the wmt22-cometkiwi-da checkpoint
 2. Fine-tuning on human MQM annotations for 5 epochs
 3. Using gradient accumulation (8 steps) to simulate larger batch sizes
@@ -115,8 +115,8 @@ ComETH was trained using a multi-step process:
 | Model | Spearman's ρ | RMSE |
 |-------|--------------|------|
 | COMET (baseline) | 0.4570 | 0.3185 |
-| ComETH (human annotations) | 0.4639 | 0.3093 |
-| ComETH-Augmented (human + Claude) | **0.4795** | **0.3078** |
+| ComeTH (human annotations) | 0.4639 | 0.3093 |
+| ComeTH-Augmented (human + Claude) | **0.4795** | **0.3078** |
 
 The Claude-augmented version demonstrates the highest correlation with human judgments, offering a significant improvement over both the baseline and human-only models.
 
@@ -124,12 +124,12 @@ The Claude-augmented version demonstrates the highest correlation with human jud
 
 | Model | Spearman's ρ |
 |-------|--------------|
-| ComETH-Augmented | **0.4795** |
+| ComeTH-Augmented | **0.4795** |
 | Claude 3.5 Sonnet | 0.4383 |
 | GPT-4o Mini | 0.4352 |
 | Gemini 2.0 Flash | 0.3918 |
 
-ComETH-Augmented outperforms direct evaluations from state-of-the-art LLMs, while being more computationally efficient for large-scale translation quality assessments.
+ComeTH-Augmented outperforms direct evaluations from state-of-the-art LLMs, while being more computationally efficient for large-scale translation quality assessments.
 
 ## Advanced Usage Examples
 
@@ -139,7 +139,7 @@ ComETH-Augmented outperforms direct evaluations from state-of-the-art LLMs, whil
 from comet import download_model, load_from_checkpoint
 
 # Load the model
-model = load_from_checkpoint("cometh-team/ComETH-Augmented")
+model = load_from_checkpoint("wasanx/ComeTH")
 
 # Prepare input data
 translations = [
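The usage snippet in the diff is cut off at `translations = [`. A minimal sketch of what the full call chain could look like with the `unbabel-comet` library, assuming the `wasanx/ComeTH` checkpoint named in the new README (the helper `make_qe_batch` and the Thai example pairs are illustrative, not from the repo); `download_model` is used to resolve the Hub ID to a local path, which `load_from_checkpoint` expects:

```python
from typing import List, Tuple

def make_qe_batch(pairs: List[Tuple[str, str]]) -> List[dict]:
    # CometKiwi-style quality-estimation models are reference-free:
    # each item needs only the source ("src") and the translation ("mt").
    return [{"src": src, "mt": mt} for src, mt in pairs]

def score_translations(pairs: List[Tuple[str, str]], model_id: str = "wasanx/ComeTH"):
    # Heavy step: downloads the checkpoint from the Hugging Face Hub
    # on first use (requires `pip install unbabel-comet`).
    from comet import download_model, load_from_checkpoint

    model_path = download_model(model_id)      # resolve the Hub ID to a local path
    model = load_from_checkpoint(model_path)
    output = model.predict(make_qe_batch(pairs), batch_size=8, gpus=0)
    return output.scores, output.system_score  # per-segment scores, corpus average
```

For example, `score_translations([("สวัสดีตอนเช้า", "Good morning.")])` would return a one-element score list plus the corpus-level average.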
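The README's tables report Spearman's ρ and RMSE against human judgments. A sketch of how such figures are typically computed, using `scipy` and made-up score lists (the actual evaluation data is not part of this commit):

```python
from math import sqrt
from scipy.stats import spearmanr

def evaluate(model_scores, human_scores):
    # Spearman's rho: rank correlation with human judgments (higher is better).
    rho, _ = spearmanr(model_scores, human_scores)
    # RMSE: absolute error between predicted and human scores (lower is better).
    err = sqrt(sum((m - h) ** 2 for m, h in zip(model_scores, human_scores))
               / len(model_scores))
    return rho, err

# Hypothetical segment-level scores, for illustration only.
rho, err = evaluate([0.71, 0.55, 0.83, 0.40], [0.75, 0.50, 0.90, 0.35])
print(f"Spearman's rho = {rho:.4f}, RMSE = {err:.4f}")
```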
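Step 3 of the training recipe uses gradient accumulation (8 steps) to simulate a larger batch: updates from several micro-batches are averaged before one optimizer step. A dependency-free sketch of the mechanics (the function and its callbacks are hypothetical, not comet's actual trainer loop):

```python
def train_with_accumulation(micro_batches, grad_fn, apply_update, accum_steps=8):
    # Accumulate per-micro-batch gradients and apply one averaged update
    # every `accum_steps` batches -- equivalent to one 8x-larger batch.
    acc, count = 0.0, 0
    for batch in micro_batches:
        acc += grad_fn(batch)
        count += 1
        if count == accum_steps:
            apply_update(acc / accum_steps)
            acc, count = 0.0, 0
    if count:  # flush a trailing partial accumulation
        apply_update(acc / count)
```

With 16 micro-batches and `accum_steps=8`, the optimizer steps twice instead of 16 times, each step seeing the mean gradient of 8 batches.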