|
|
--- |
|
|
license: other |
|
|
license_name: cometh-reserved |
|
|
datasets: |
|
|
- wasanx/cometh_claude_augment |
|
|
- wasanx/cometh_finetune |
|
|
language: |
|
|
- en |
|
|
- th |
|
|
metrics: |
|
|
- spearmanr
|
|
tags: |
|
|
- translation-evaluation |
|
|
- thai |
|
|
- english |
|
|
- translation-metrics |
|
|
- mqm |
|
|
- claude-augmented |
|
|
- comet |
|
|
- translation-quality |
|
|
base_model: Unbabel/wmt22-cometkiwi-da |
|
|
pipeline_tag: translation |
|
|
library_name: unbabel-comet |
|
|
model-index: |
|
|
- name: ComeTH |
|
|
results: |
|
|
- task: |
|
|
type: translation-quality-estimation |
|
|
name: English-Thai Translation Quality Assessment |
|
|
dataset: |
|
|
type: wasanx/cometh_claude_augment |
|
|
name: COMETH Claude Augmentation Datasets |
|
|
metrics: |
|
|
- name: Spearman correlation |
|
|
type: spearman |
|
|
value: 0.4795 |
|
|
verified: false |
|
|
- task: |
|
|
type: translation-quality-estimation |
|
|
name: English-Thai Translation Quality Comparison |
|
|
dataset: |
|
|
type: wasanx/cometh_human_annot |
|
|
name: COMETH Baseline Comparison |
|
|
metrics: |
|
|
- name: COMET baseline |
|
|
type: spearman |
|
|
value: 0.4570 |
|
|
verified: false |
|
|
- name: ComeTH (human-only)
|
|
type: spearman |
|
|
value: 0.4639 |
|
|
verified: false |
|
|
--- |
|
|
# ComeTH (คำไทย): English-Thai Translation Quality Metrics |
|
|
|
|
|
ComeTH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for Evaluation of Translation) model, optimized specifically for English-Thai translation quality assessment. It scores machine translation outputs, producing quality estimates that correlate with human judgments more strongly than the baseline COMET model.
|
|
|
|
|
## Model Overview |
|
|
|
|
|
- **Model Type**: Translation Quality Estimation |
|
|
- **Languages**: English-Thai |
|
|
- **Base Model**: COMET (Unbabel/wmt22-cometkiwi-da) |
|
|
- **Encoder**: XLM-RoBERTa-based (microsoft/infoxlm-large) |
|
|
- **Architecture**: Unified Metric with sentence-level scoring |
|
|
- **Framework**: COMET (Unbabel) |
|
|
- **Task**: Machine Translation Evaluation |
|
|
- **Parameters**: 565M (558M encoder + 6.3M estimator) |
|
|
|
|
|
## Versions |
|
|
|
|
|
We offer two variants of ComeTH with different training approaches: |
|
|
|
|
|
- **ComeTH**: Fine-tuned on human MQM annotations (Spearman's ρ = 0.4639) |
|
|
- **ComeTH-Augmented**: Fine-tuned on human + Claude-assisted annotations (Spearman's ρ = 0.4795) |
|
|
|
|
|
Both models outperform the base COMET model (Spearman's ρ = 0.4570) on English-Thai translation evaluation. The Claude-augmented version leverages LLM-generated annotations to enhance correlation with human judgments by 4.9% over the baseline. |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
- **Training Framework**: PyTorch Lightning |
|
|
- **Loss Function**: MSE |
|
|
- **Input Segments**: [mt, src] |
|
|
- **Final Layer Architecture**: [3072, 1024] |
|
|
- **Layer Transformation**: Sparsemax |
|
|
- **Activation Function**: Tanh |
|
|
- **Dropout**: 0.1 |
|
|
- **Learning Rate**: 1.5e-05 (Encoder: 1e-06) |
|
|
- **Layerwise Decay**: 0.95 |
|
|
- **Word Layer**: 24 |
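For quick reference, the specifications above can be collected into a single Python dictionary. The key names below are illustrative (they loosely follow unbabel-comet's hyperparameter conventions but are not copied from the published hparams file):

```python
# Illustrative summary of the training specifications listed above;
# key names are assumptions, values come from the list.
hparams = {
    "input_segments": ["mt", "src"],
    "loss": "mse",
    "hidden_sizes": [3072, 1024],          # final layer architecture
    "layer_transformation": "sparsemax",
    "activations": "Tanh",
    "dropout": 0.1,
    "learning_rate": 1.5e-5,
    "encoder_learning_rate": 1e-6,
    "layerwise_decay": 0.95,
    "word_layer": 24,
}
```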
|
|
|
|
|
## Training Data |
|
|
|
|
|
The models were trained on: |
|
|
- **Size**: 23,530 English-Thai translation pairs |
|
|
- **Source Domains**: Diverse, including technical, conversational, and e-commerce |
|
|
- **Annotation Framework**: Multidimensional Quality Metrics (MQM) |
|
|
- **Error Categories**: |
|
|
- Minor: Issues that don't significantly impact meaning or usability |
|
|
- Major: Errors that significantly impact meaning but don't render content unusable |
|
|
- Critical: Errors that make content unusable or could have serious consequences |
|
|
- **Claude Augmentation**: Claude 3.5 Sonnet was used to generate supplementary quality judgments, enhancing the model's alignment with human evaluations |
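To make the MQM framework concrete, here is a minimal sketch of severity-weighted segment scoring. The weights (minor = 1, major = 5, critical = 10) and the normalization constant are common MQM-style defaults, not the exact values from the ComeTH annotation guidelines:

```python
# Illustrative MQM-style scoring: each annotated error contributes a
# severity penalty, and the total penalty is mapped onto a 0-1 quality
# score. Weights and max_penalty are assumptions for illustration only.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_penalty(errors):
    """Sum severity penalties for a list of annotated error severities."""
    return sum(SEVERITY_WEIGHTS[e] for e in errors)

def segment_score(errors, max_penalty=25):
    """Map a penalty onto a 0-1 quality score (1.0 = no errors)."""
    return max(0.0, 1.0 - mqm_penalty(errors) / max_penalty)
```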
|
|
|
|
|
## Training Process |
|
|
|
|
|
ComeTH was trained using a multi-step process: |
|
|
1. Starting from the wmt22-cometkiwi-da checkpoint |
|
|
2. Fine-tuning on human MQM annotations for 5 epochs |
|
|
3. Using gradient accumulation (8 steps) to simulate larger batch sizes |
|
|
4. Utilizing unified metric architecture that combines source and MT embeddings |
|
|
5. For the augmented variant: additional training with Claude-assisted annotations, weighted to balance human and machine judgments |
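Step 3 (gradient accumulation) can be illustrated in isolation: gradients from several micro-batches are summed and a single parameter update is applied, which simulates an 8× larger effective batch. The toy least-squares problem below is purely illustrative and unrelated to the actual training code:

```python
# Gradient accumulation sketch: sum gradients over 8 micro-batches,
# then apply one averaged update. Toy 1-D least-squares model y = w * x.
def grad(w, batch):
    # d/dw of the mean squared error over a micro-batch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, accumulation_steps=8, lr=0.01, epochs=5, w=0.0):
    micro_batches = [data[i:i + 2] for i in range(0, len(data), 2)]
    acc, n = 0.0, 0
    for _ in range(epochs):
        for batch in micro_batches:
            acc += grad(w, batch)
            n += 1
            if n == accumulation_steps:       # one update per 8 micro-batches
                w -= lr * acc / accumulation_steps
                acc, n = 0.0, 0
    return w
```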
|
|
|
|
|
## Performance |
|
|
|
|
|
### Correlation with Human Judgments (Spearman's ρ) |
|
|
|
|
|
| Model | Spearman's ρ | RMSE | |
|
|
|-------|-------------|------| |
|
|
| COMET (baseline) | 0.4570 | 0.3185 | |
|
|
| ComeTH (human annotations) | 0.4639 | 0.3093 | |
|
|
| ComeTH-Augmented (human + Claude) | **0.4795** | **0.3078** | |
|
|
|
|
|
The Claude-augmented version correlates most strongly with human judgments: a 4.9% relative improvement over the baseline and 3.4% over the human-only model.
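For clarity, the Spearman's ρ values above are rank correlations: both the metric scores and the human judgments are converted to ranks, and the Pearson correlation of the ranks is taken. A dependency-free sketch (equivalent in principle to `scipy.stats.spearmanr`):

```python
# Spearman's rho: Pearson correlation of the ranks, with average ranks
# assigned to tied values.
def _ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                            # extend the tie group
        avg = (i + j) / 2 + 1                 # average rank, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    rx, ry = _ranks(xs), _ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```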
|
|
|
|
|
### Comparison with Other LLM Evaluators |
|
|
|
|
|
| Model | Spearman's ρ | |
|
|
|-------|-------------| |
|
|
| ComeTH-Augmented | **0.4795** | |
|
|
| Claude 3.5 Sonnet | 0.4383 | |
|
|
| GPT-4o Mini | 0.4352 | |
|
|
| Gemini 2.0 Flash | 0.3918 | |
|
|
|
|
|
ComeTH-Augmented outperforms direct evaluations from state-of-the-art LLMs, while being more computationally efficient for large-scale translation quality assessments. |
|
|
|
|
|
## Advanced Usage Examples |
|
|
|
|
|
### Basic Evaluation |
|
|
|
|
|
```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("wasanx/ComeTH")
model = load_from_checkpoint(model_path)

# Reference-free evaluation: each sample needs only the source ("src")
# and the machine translation ("mt")
translations = [
    {
        "src": "This is an English source text.",
        "mt": "นี่คือข้อความภาษาอังกฤษ",
    }
]

results = model.predict(translations, batch_size=8, gpus=1)
scores = results["scores"]  # one quality score per input segment
```
|
|
|
|
|
### Batch Processing With Progress Tracking |
|
|
|
|
|
```python
import pandas as pd
from tqdm import tqdm

# Assumes `model` was loaded as in the basic example above and that
# translations.csv has "src" and "mt" columns
df = pd.read_csv("translations.csv")
input_data = df[["src", "mt"]].to_dict("records")

batch_size = 32
all_scores = []

# Chunking the input lets tqdm report progress and preserves partial
# results if a long run is interrupted
for i in tqdm(range(0, len(input_data), batch_size)):
    batch = input_data[i:i + batch_size]
    results = model.predict(batch, batch_size=len(batch), gpus=1)
    all_scores.extend(results["scores"])

df["quality_score"] = all_scores
```
|
|
|
|
|
### System-Level Evaluation |
|
|
|
|
|
```python
# Assumes df carries a "system_name" column identifying the MT system
# behind each segment, plus the quality_score column computed above
systems = (
    df.groupby("system_name")["quality_score"]
    .agg(["mean", "std", "count"])
    .reset_index()
    .sort_values("mean", ascending=False)
)
print(systems)
```
|
|
|
|
|
## Citation |
|
|
|
|
|
```
@misc{cometh2025,
  title        = {ComeTH: English-Thai Translation Quality Metrics},
  author       = {{ComeTH Team}},
  year         = {2025},
  howpublished = {Hugging Face Model Repository},
  url          = {https://huggingface.co/wasanx/ComeTH}
}
```
|
|
|
|
|
## Contact |
|
|
|
|
|
For questions or feedback: comethteam@gmail.com |
|
|
|
|
|
## License |
|
|
|
|
|
``` |
|
|
The COMETH Reserved License |
|
|
|
|
|
Cometh English-to-Thai Translation Data and Model License |
|
|
|
|
|
Copyright (C) Cometh Team. All rights reserved. |
|
|
|
|
|
This license governs the use of the Cometh English-to-Thai translation data and model ("Cometh Model Data"), including but not limited to MQM scores, human translations, and human rankings from various translation sources. |
|
|
|
|
|
Permitted Use |
|
|
The Cometh Model Data is licensed exclusively for internal use by the designated Cometh team. |
|
|
|
|
|
Prohibited Use |
|
|
The following uses are strictly prohibited: |
|
|
1. Any usage outside the designated purposes unanimously approved by the Cometh team. |
|
|
2. Redistribution, sharing, or distribution of the Cometh Model Data in any form. |
|
|
3. Citation or public reference to the Cometh Model Data in any academic, commercial, or non-commercial context. |
|
|
4. Any use beyond the internal operations of the Cometh team. |
|
|
|
|
|
Legal Enforcement |
|
|
Unauthorized use, distribution, or citation of the Cometh Model Data constitutes a violation of this license and may result in legal action, including but not limited to prosecution under applicable laws. |
|
|
|
|
|
Reservation of Rights |
|
|
All rights to the Cometh Model Data are reserved by the Cometh team. This license does not transfer any ownership rights. |
|
|
|
|
|
By accessing or using the Cometh Model Data, you agree to be bound by the terms of this license. |
|
|
``` |