Update README.md
pipeline_tag: translation
library_name: unbabel-comet
model-index:
- name: ComeTH
  results:
  - task:
      type: translation-quality-estimation
---

# ComeTH: Thai-English Translation Quality Metrics

ComeTH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for Evaluation of Translation) model specifically optimized for Thai-English translation quality assessment. This model evaluates machine translation outputs by providing quality scores that correlate highly with human judgments.

## Model Overview

## Versions

We offer two variants of ComeTH with different training approaches:

- **ComeTH**: Fine-tuned on human MQM annotations (Spearman's ρ = 0.4639)
- **ComeTH-Augmented**: Fine-tuned on human + Claude-assisted annotations (Spearman's ρ = 0.4795)

Both models outperform the base COMET model (Spearman's ρ = 0.4570) on Thai-English translation evaluation. The Claude-augmented version leverages LLM-generated annotations to enhance correlation with human judgments by 4.9% over the baseline.
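The 4.9% figure is the relative gain in Spearman's ρ of the augmented model over the base COMET score; a quick check:

```python
# Relative improvement in Spearman's rho over the base COMET model
base, augmented = 0.4570, 0.4795
print(f"{(augmented - base) / base:.1%}")  # 4.9%
```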

## Training Process

ComeTH was trained using a multi-step process:
1. Starting from the wmt22-cometkiwi-da checkpoint
2. Fine-tuning on human MQM annotations for 5 epochs
3. Using gradient accumulation (8 steps) to simulate larger batch sizes (see the sketch below)
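Step 3 corresponds to a standard accumulation loop. As a rough illustration (not the project's actual training code; the toy model, data, and optimizer below are hypothetical stand-ins), 8-step gradient accumulation in PyTorch looks like this:

```python
import torch
from torch import nn

# Hypothetical stand-ins, just to make the pattern runnable
model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(16)]

ACCUM_STEPS = 8  # accumulate gradients over 8 mini-batches

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y)
    # Scale the loss so the summed gradients match one large-batch update
    (loss / ACCUM_STEPS).backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()       # one optimizer update per 8 mini-batches
        optimizer.zero_grad()
```

In practice, the COMET training stack (PyTorch Lightning) exposes this same behavior through the Trainer's `accumulate_grad_batches` option.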

## Performance

| Model | Spearman's ρ | RMSE |
|-------|--------------|------|
| COMET (baseline) | 0.4570 | 0.3185 |
| ComeTH (human annotations) | 0.4639 | 0.3093 |
| ComeTH-Augmented (human + Claude) | **0.4795** | **0.3078** |

The Claude-augmented version demonstrates the highest correlation with human judgments, offering a significant improvement over both the baseline and human-only models.
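Both statistics can be reproduced from parallel lists of metric scores and human judgments; a minimal sketch with scipy and numpy (the arrays below are illustrative placeholders, not the evaluation data):

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative placeholders: metric scores vs. human judgments
system_scores = np.array([0.82, 0.45, 0.67, 0.91, 0.30])
human_scores = np.array([0.80, 0.50, 0.60, 0.95, 0.25])

# Spearman's rho: rank correlation between metric and human scores
rho, _ = spearmanr(system_scores, human_scores)

# RMSE: root-mean-square error between the two score lists
rmse = np.sqrt(np.mean((system_scores - human_scores) ** 2))

print(f"Spearman's rho = {rho:.4f}, RMSE = {rmse:.4f}")
```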

## Comparison with LLM Evaluators

| Model | Spearman's ρ |
|-------|--------------|
| ComeTH-Augmented | **0.4795** |
| Claude 3.5 Sonnet | 0.4383 |
| GPT-4o Mini | 0.4352 |
| Gemini 2.0 Flash | 0.3918 |

ComeTH-Augmented outperforms direct evaluations from state-of-the-art LLMs, while being more computationally efficient for large-scale translation quality assessments.

## Advanced Usage Examples

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint, then load it from the local path
# (load_from_checkpoint expects a checkpoint file, not a model ID)
model_path = download_model("wasanx/ComeTH")
model = load_from_checkpoint(model_path)

# Prepare input data: reference-free (CometKiwi-style) pairs of a
# Thai source ("src") and an English hypothesis ("mt"); examples are illustrative
translations = [
    {"src": "ตัวอย่างประโยคภาษาไทย", "mt": "An example Thai sentence."},
]
```
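To score the prepared inputs, a minimal sketch assuming the standard unbabel-comet `predict` API (per-segment scores in `.scores`, corpus average in `.system_score`; the batch size and `gpus=0` are illustrative):

```python
# Run inference; gpus=0 forces CPU, set gpus=1 for a single GPU
model_output = model.predict(translations, batch_size=8, gpus=0)

print(model_output.scores)        # per-segment quality scores
print(model_output.system_score)  # corpus-level average score
```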