Update README.md
pipeline_tag: translation
library_name: unbabel-comet
model-index:
- name: ComeTH
  results:
  - task:
      type: translation-quality-estimation
---

# ComeTH: Thai-English Translation Quality Metrics

ComeTH is a fine-tuned version of the COMET (Crosslingual Optimized Metric for Evaluation of Translation) model specifically optimized for Thai-English translation quality assessment. This model evaluates machine translation outputs by providing quality scores that correlate highly with human judgments.

## Model Overview

## Versions

We offer two variants of ComeTH with different training approaches:

- **ComeTH**: Fine-tuned on human MQM annotations (Spearman's ρ = 0.4639)
- **ComeTH-Augmented**: Fine-tuned on human + Claude-assisted annotations (Spearman's ρ = 0.4795)

Both models outperform the base COMET model (Spearman's ρ = 0.4570) on Thai-English translation evaluation. The Claude-augmented version leverages LLM-generated annotations to enhance correlation with human judgments by 4.9% over the baseline.
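The 4.9% figure is the relative gain in Spearman's ρ of the augmented model over the base COMET score; a quick check:

```python
# Relative improvement in Spearman's rho over the base COMET model
base, augmented = 0.4570, 0.4795
print(f"{(augmented - base) / base:.1%}")  # 4.9%
```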

## Training Process

ComeTH was trained using a multi-step process:
1. Starting from the wmt22-cometkiwi-da checkpoint
2. Fine-tuning on human MQM annotations for 5 epochs
3. Using gradient accumulation (8 steps) to simulate larger batch sizes (see the sketch below)
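Step 3 corresponds to a standard accumulation loop. As a rough illustration (not the project's actual training code; the toy model, data, and optimizer below are hypothetical stand-ins), 8-step gradient accumulation in PyTorch looks like this:

```python
import torch
from torch import nn

# Hypothetical stand-ins, just to make the pattern runnable
model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(16)]

ACCUM_STEPS = 8  # accumulate gradients over 8 mini-batches

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y)
    # Scale the loss so the summed gradients match one large-batch update
    (loss / ACCUM_STEPS).backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()       # one optimizer update per 8 mini-batches
        optimizer.zero_grad()
```

In practice, the COMET training stack (PyTorch Lightning) exposes this same behavior through the Trainer's `accumulate_grad_batches` option.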

## Performance

| Model | Spearman's ρ | RMSE |
|-------|--------------|------|
| COMET (baseline) | 0.4570 | 0.3185 |
| ComeTH (human annotations) | 0.4639 | 0.3093 |
| ComeTH-Augmented (human + Claude) | **0.4795** | **0.3078** |

The Claude-augmented version demonstrates the highest correlation with human judgments, offering a significant improvement over both the baseline and human-only models.
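Both statistics can be reproduced from parallel lists of metric scores and human judgments; a minimal sketch with scipy and numpy (the arrays below are illustrative placeholders, not the evaluation data):

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative placeholders: metric scores vs. human judgments
system_scores = np.array([0.82, 0.45, 0.67, 0.91, 0.30])
human_scores = np.array([0.80, 0.50, 0.60, 0.95, 0.25])

# Spearman's rho: rank correlation between metric and human scores
rho, _ = spearmanr(system_scores, human_scores)

# RMSE: root-mean-square error between the two score lists
rmse = np.sqrt(np.mean((system_scores - human_scores) ** 2))

print(f"Spearman's rho = {rho:.4f}, RMSE = {rmse:.4f}")
```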

## Comparison with LLM Evaluators

| Model | Spearman's ρ |
|-------|--------------|
| ComeTH-Augmented | **0.4795** |
| Claude 3.5 Sonnet | 0.4383 |
| GPT-4o Mini | 0.4352 |
| Gemini 2.0 Flash | 0.3918 |

ComeTH-Augmented outperforms direct evaluations from state-of-the-art LLMs, while being more computationally efficient for large-scale translation quality assessments.

## Advanced Usage Examples

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint, then load it from the local path
# (load_from_checkpoint expects a checkpoint file, not a model ID)
model_path = download_model("wasanx/ComeTH")
model = load_from_checkpoint(model_path)

# Prepare input data: reference-free (CometKiwi-style) pairs of a
# Thai source ("src") and an English hypothesis ("mt"); examples are illustrative
translations = [
    {"src": "ตัวอย่างประโยคภาษาไทย", "mt": "An example Thai sentence."},
]
```
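To score the prepared inputs, a minimal sketch assuming the standard unbabel-comet `predict` API (per-segment scores in `.scores`, corpus average in `.system_score`; the batch size and `gpus=0` are illustrative):

```python
# Run inference; gpus=0 forces CPU, set gpus=1 for a single GPU
model_output = model.predict(translations, batch_size=8, gpus=0)

print(model_output.scores)        # per-segment quality scores
print(model_output.system_score)  # corpus-level average score
```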