 ---
 language:
 - ka
+- en
+license: apache-2.0
+tags:
+- translation
+- evaluation
+- comet
+- mt-evaluation
+- georgian
+metrics:
+- kendall_tau
+- spearman_correlation
+- pearson_correlation
+model-index:
+- name: Georgian-COMET
+  results:
+  - task:
+      type: translation-evaluation
+      name: Machine Translation Evaluation
+    dataset:
+      name: Georgian MT Evaluation Dataset
+      type: Darsala/georgian_metric_evaluation
+    metrics:
+    - type: pearson_correlation
+      value: 0.878
+      name: Pearson Correlation
+    - type: spearman_correlation
+      value: 0.796
+      name: Spearman Correlation
+    - type: kendall_tau
+      value: 0.603
+      name: Kendall's Tau
+base_model: Unbabel/wmt22-comet-da
+datasets:
+- Darsala/georgian_metric_evaluation
+---
+# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation
+This is a [COMET](https://github.com/Unbabel/COMET) evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet with (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both source and reference.
+## Model Description
+Georgian-COMET is a fine-tuned version of [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da), optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. On Georgian translations, it correlates more closely with human judgments than the base model on all three correlation metrics reported below.
+### Key Improvements over Base Model
+| Metric | Base COMET | Georgian-COMET | Absolute Improvement |
+|--------|------------|----------------|----------------------|
+| Pearson | 0.867 | **0.878** | +0.011 |
+| Spearman | 0.759 | **0.796** | +0.037 |
+| Kendall | 0.564 | **0.603** | +0.039 |
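The improvement column reports the absolute difference between the two correlation coefficients; the relative change is somewhat larger. A minimal sketch of both calculations, using only the values from the table above (the function and variable names are illustrative):

```python
# Correlations with human judgments, copied from the table above.
base = {"Pearson": 0.867, "Spearman": 0.759, "Kendall": 0.564}
tuned = {"Pearson": 0.878, "Spearman": 0.796, "Kendall": 0.603}


def improvements(base_scores, tuned_scores):
    """Return {metric: (absolute_gain, relative_gain_percent)}."""
    result = {}
    for name, b in base_scores.items():
        t = tuned_scores[name]
        # Round to the precision of the reported figures.
        result[name] = (round(t - b, 3), round(100 * (t - b) / b, 1))
    return result


gains = improvements(base, tuned)
for name, (abs_gain, rel_gain) in gains.items():
    print(f"{name}: +{abs_gain:.3f} absolute, +{rel_gain:.1f}% relative")
```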
+## Paper
+- **Base Model Paper**: [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https://aclanthology.org/2022.wmt-1.52) (Rei et al., WMT 2022)
+- **This Model**: Paper coming soon
+## Repository
+[https://github.com/LukaDarsalia/nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research)
+## License
+Apache-2.0
+## Usage (unbabel-comet)
+Using this model requires unbabel-comet to be installed:
+```bash
+pip install --upgrade pip  # ensures that pip is current
+pip install unbabel-comet
+```
+### Option 1: Direct Download from HuggingFace
+```python
+from comet import load_from_checkpoint
+import requests
+import os
+# Download the model checkpoint
+model_url = "https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt"
+model_path = "georgian_comet.ckpt"
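+# Download if not already present; stream to disk because checkpoints are large
+if not os.path.exists(model_path):
+    with requests.get(model_url, stream=True, timeout=60) as response:
+        response.raise_for_status()  # fail on HTTP errors instead of saving an error page
+        with open(model_path, "wb") as f:
+            for chunk in response.iter_content(chunk_size=1 << 20):
+                f.write(chunk)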
+# Load the model
+model = load_from_checkpoint(model_path)
+# Prepare your data
+data = [
+    {
+        "src": "The cat sat on the mat.",
+        "mt": "კატა ზის ხალიჩაზე.",
+        "ref": "კატა იჯდა ხალიჩაზე."
+    },
+    {
+        "src": "Schools and kindergartens were opened.",
+        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
+        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
+    }
+]
+# Get predictions (gpus=1 assumes a CUDA device; use gpus=0 to run on CPU)
+model_output = model.predict(data, batch_size=8, gpus=1)
+print(model_output)
+```
+### Option 2: Using comet CLI
+First download the model checkpoint:
+```bash
+wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
+```
+Then use it with comet CLI:
+```bash
+comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
+```
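`comet-score` reads one segment per line from three parallel files. A minimal sketch of writing those files from in-memory triplets, assuming the same `src`/`mt`/`ref` keys used in Option 1; the helper name, the file names, and the `comet_inputs` directory are illustrative and not part of COMET:

```python
from pathlib import Path


def write_parallel_files(triplets, out_dir="."):
    """Write src.txt, mt.txt and ref.txt with one segment per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for key in ("src", "mt", "ref"):
        path = out_dir / f"{key}.txt"
        # A newline inside a segment would break line alignment across the
        # three files, so collapse all whitespace runs to single spaces.
        lines = [" ".join(t[key].split()) for t in triplets]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths[key] = path
    return paths


triplets = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე.",
    },
]
paths = write_parallel_files(triplets, out_dir="comet_inputs")
```

The resulting files can then be passed to the CLI as `-s comet_inputs/src.txt -t comet_inputs/mt.txt -r comet_inputs/ref.txt`.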
+### Option 3: Integration with Evaluation Pipeline
+```python
+from comet import load_from_checkpoint
+import pandas as pd
+# Load model
+model = load_from_checkpoint("georgian_comet.ckpt")
+# Load your evaluation data
+df = pd.read_csv("your_evaluation_data.csv")
+# Prepare data in COMET format
+data = [
+    {
+        "src": row["sourceText"],
+        "mt": row["targetText"],
+        "ref": row["referenceText"]
+    }
+    for _, row in df.iterrows()
+]
+# Get scores
+scores = model.predict(data, batch_size=16)
+print(f"Average score: {sum(scores['scores']) / len(scores['scores']):.3f}")
+```
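Rows with missing fields are a common source of trouble in batch scoring. The sketch below builds the COMET input list with the standard `csv` module and skips incomplete rows. It assumes the `sourceText`, `targetText`, and `referenceText` columns used above; the helper name and the in-memory sample are hypothetical:

```python
import csv
import io

# Mapping from COMET keys to the CSV column names used in Option 3.
COLUMNS = {"src": "sourceText", "mt": "targetText", "ref": "referenceText"}


def rows_to_comet_inputs(csv_file):
    """Convert CSV rows to COMET dicts, skipping rows with an empty field."""
    data, skipped = [], 0
    for row in csv.DictReader(csv_file):
        item = {key: (row.get(col) or "").strip() for key, col in COLUMNS.items()}
        if all(item.values()):
            data.append(item)
        else:
            skipped += 1
    return data, skipped


# In-memory stand-in for your_evaluation_data.csv.
sample = io.StringIO(
    "sourceText,targetText,referenceText\n"
    "Hello.,გამარჯობა.,გამარჯობა.\n"
    "Thank you.,,გმადლობთ.\n"
)
data, skipped = rows_to_comet_inputs(sample)
print(len(data), "usable rows;", skipped, "skipped")
```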
+## Intended Uses
+This model is intended to be used for **English-Georgian MT evaluation**.
+Given a triplet of (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score, typically between 0 and 1, where 1 represents a perfect translation.
+### Primary Use Cases
+1. **MT System Development**: Evaluate and compare different English-Georgian MT systems
+2. **Quality Assurance**: Automated quality checks for Georgian translations
+3. **Research**: Study MT evaluation for morphologically rich languages like Georgian
+4. **Production Monitoring**: Track translation quality in production environments
+### Out-of-Scope Use
+- **Other Language Pairs**: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
+- **Reference-Free Evaluation**: The model requires reference translations
+- **Document-Level**: Optimized for sentence-level evaluation
+## Training Details
+### Training Data
+- **Dataset**: 5,000 English-Georgian pairs from [corp.dict.ge](https://corp.dict.ge/)
+- **MT Systems**: Translations from SMaLL-100, Google Translate, and Ucraft Translate
+- **Scoring Method**: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
+- **Details**: See [Darsala/georgian_metric_evaluation](https://huggingface.co/datasets/Darsala/georgian_metric_evaluation)
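As an illustration of the scoring method, the sketch below perturbs teacher scores with zero-mean Gaussian noise of standard deviation 3. The 0-100 score scale, the clipping to that range, the seed, and the function name are all assumptions made for this example; the dataset card linked above is the reference for the actual procedure:

```python
import random


def add_gaussian_noise(scores, sigma=3.0, low=0.0, high=100.0, seed=0):
    """Return scores with N(0, sigma^2) noise added, clipped to [low, high]."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    noisy = []
    for s in scores:
        perturbed = s + rng.gauss(0.0, sigma)
        noisy.append(min(high, max(low, perturbed)))
    return noisy


teacher_scores = [95.0, 72.0, 40.0, 100.0]  # hypothetical LLM-assigned scores
noisy_scores = add_gaussian_noise(teacher_scores)
```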
+### Training Configuration
+```yaml
+regression_metric:
+  init_args:
+    nr_frozen_epochs: 0.3
+    keep_embeddings_frozen: True
+    optimizer: AdamW
+    encoder_learning_rate: 1.5e-05
+    learning_rate: 1.5e-05
+    loss: mse
+    dropout: 0.1
+    batch_size: 8
+```
+### Training Procedure
+1. **Base Model**: Started from Unbabel/wmt22-comet-da checkpoint
+2. **Knowledge Distillation**: Used Claude Sonnet 4 scores as training targets
+3. **Robustness**: Added Gaussian noise to training scores to prevent overfitting
+4. **Optimization**: 8 epochs with early stopping (patience=4) on validation Kendall's tau
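Early stopping with a patience of 4 means training halts once validation Kendall's tau has failed to improve for four consecutive evaluations. A minimal, framework-independent sketch of that rule follows; the function and the per-epoch values are illustrative, and the actual training run may implement the check differently (for example through a trainer callback):

```python
def early_stopping_epoch(val_scores, patience=4, max_epochs=8):
    """Return (stop_epoch, best_epoch), 1-indexed, for a higher-is-better metric."""
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # stopped early
    return min(len(val_scores), max_epochs), best_epoch  # ran to completion


# Hypothetical validation Kendall's tau per epoch.
history = [0.52, 0.56, 0.58, 0.57, 0.575, 0.57, 0.56, 0.55]
stop_epoch, best_epoch = early_stopping_epoch(history)
```

In this made-up run the best checkpoint comes from the last epoch that improved the validation metric, which is the one a model-selection step would keep.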
+## Evaluation Results
+### Test Set Performance
+Evaluated on 400 human-annotated English-Georgian translation pairs:
+| Metric | Score | p-value |
+|--------|-------|---------|
+| Pearson | 0.878 | < 0.001 |
+| Spearman | 0.796 | < 0.001 |
+| Kendall | 0.603 | < 0.001 |
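For reference, the three correlation coefficients can be computed from paired metric and human scores as sketched below in plain Python. The tau-b variant (which corrects for ties) and the toy scores are assumptions for illustration; the source does not state which implementation produced the reported values, and p-values are not computed here:

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b over all pairs (O(n^2), fine for a few hundred items)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            elif dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom


# Toy data: metric scores and human ratings for five translations.
metric_scores = [0.9, 0.7, 0.8, 0.4, 0.6]
human_scores = [5, 4, 3, 1, 2]
rho = spearman(metric_scores, human_scores)
tau = kendall_tau_b(metric_scores, human_scores)
```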
+### Comparison with Other Metrics
+| Metric | Pearson | Spearman | Kendall |
+|--------|---------|----------|---------|
+| **Georgian-COMET** | **0.878** | 0.796 | 0.603 |
+| Base COMET | 0.867 | 0.759 | 0.564 |
+| LLM-Reference-Based | 0.852 | **0.798** | **0.660** |
+| CHRF++ | 0.739 | 0.690 | 0.498 |
+| TER | 0.466 | 0.443 | 0.311 |
+| BLEU | 0.413 | 0.497 | 0.344 |
+## Languages Covered
+While the XLM-R encoder underlying the base model covers 100+ languages, this fine-tuned version is specifically optimized for:
+- **Source Language**: English (en)
+- **Target Language**: Georgian (ka)
+For other language pairs, we recommend using the base [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) model.
+## Limitations
+1. **Language Specific**: Optimized only for English→Georgian evaluation
+2. **Domain**: Training data primarily from corp.dict.ge (general/literary domain)
+3. **Reference Required**: Cannot perform reference-free evaluation
+4. **Sentence Level**: Not optimized for document-level evaluation
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{georgian-comet-2025,
+  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
+  author={Luka Darsalia and Ketevan Bakhturidze and Saba Sturua},
+  year={2025},
+  publisher={HuggingFace},
+  url={https://huggingface.co/Darsala/georgian_comet}
+}
+@inproceedings{rei-etal-2022-comet,
+  title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
+  author = "Rei, Ricardo  and
+    C. de Souza, Jos{\'e} G.  and
+    Alves, Duarte  and
+    Zerva, Chrysoula  and
+    Farinha, Ana C  and
+    Glushkova, Taisiya  and
+    Lavie, Alon  and
+    Coheur, Luisa  and
+    Martins, Andr{\'e} F. T.",
+  booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
+  year = "2022",
+  address = "Abu Dhabi, United Arab Emirates",
+  publisher = "Association for Computational Linguistics",
+  url = "https://aclanthology.org/2022.wmt-1.52",
+  pages = "578--585",
+}
+```
+## Acknowledgments
+- [Unbabel](https://unbabel.com/) team for the base COMET model
+- [Anthropic](https://anthropic.com/) for Claude Sonnet 4 used in knowledge distillation
+- [corp.dict.ge](https://corp.dict.ge/) for the Georgian-English corpus
+- All contributors to the [nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research) project