	---
	language:
	- ka
	- en
	license: apache-2.0
	tags:
	- translation
	- evaluation
	- comet
	- mt-evaluation
	- georgian
	metrics:
	- kendall_tau
	- spearman_correlation
	- pearson_correlation
	model-index:
	- name: Georgian-COMET
	  results:
	  - task:
	      type: translation-evaluation
	      name: Machine Translation Evaluation
	    dataset:
	      name: Georgian MT Evaluation Dataset
	      type: Darsala/georgian_metric_evaluation
	    metrics:
	    - type: pearson_correlation
	      value: 0.876
	      name: Pearson Correlation
	    - type: spearman_correlation
	      value: 0.773
	      name: Spearman Correlation
	    - type: kendall_tau
	      value: 0.579
	      name: Kendall's Tau
	base_model: Unbabel/wmt22-comet-da
	datasets:
	- Darsala/georgian_metric_evaluation
	---

	# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation

	This is a [COMET](https://github.com/Unbabel/COMET) evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet with (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both source and reference.

	## Model Description

	Georgian-COMET is a fine-tuned version of [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da), optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. On a test set of 400 human-annotated translation pairs, it correlates more closely with human judgments than the base model on all three correlation measures.

	### Key Improvements over Base Model

	| Metric | Base COMET | Georgian-COMET | Absolute gain |
	|--------|------------|----------------|---------------|
	| Pearson | 0.867 | 0.876 | +0.009 |
	| Spearman | 0.759 | 0.773 | +0.014 |
	| Kendall | 0.564 | 0.579 | +0.015 |

	## Paper

	- Base Model Paper: [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https://aclanthology.org/2022.wmt-1.52) (Rei et al., WMT 2022)
	- This Model: Paper coming soon

	## Repository

	[https://github.com/LukaDarsalia/nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research)

	## License

	Apache-2.0

	## Usage (unbabel-comet)

	Using this model requires unbabel-comet to be installed:

	```bash
	pip install --upgrade pip # ensures that pip is current
	pip install unbabel-comet
	```

	### Option 1: Direct Download from HuggingFace

	```python
	from comet import download_model, load_from_checkpoint

	# Download the checkpoint from the Hugging Face Hub and get its local path
	model_path = download_model("Darsala/georgian_comet")

	# Load the model
	model = load_from_checkpoint(model_path)

	# Prepare your data: one dict per segment with source, MT output, and reference
	data = [
	    {
	        "src": "The cat sat on the mat.",
	        "mt": "კატა ზის ხალიჩაზე.",
	        "ref": "კატა იჯდა ხალიჩაზე.",
	    },
	    {
	        "src": "Schools and kindergartens were opened.",
	        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
	        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები.",
	    },
	]

	# Get predictions (set gpus=0 to run on CPU)
	model_output = model.predict(data, batch_size=8, gpus=1)
	print(model_output)
	```

	### Option 2: Using comet CLI

	First download the model checkpoint:
	```bash
	wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
	```

	Then pass it to the `comet-score` command:
	```bash
	comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
	```
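
The `comet-score` command reads plain-text files with one segment per line, aligned by line number across the three files. The following is a minimal sketch of writing those files from in-memory triplets. The helper name `write_comet_inputs`, the `comet_inputs` directory, and the output file names are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def write_comet_inputs(triplets, out_dir="."):
    """Write aligned source/translation/reference files, one segment per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {key: out_dir / f"{key}.txt" for key in ("src", "mt", "ref")}
    for key, path in paths.items():
        lines = []
        for item in triplets:
            text = item[key]
            # A line break inside a segment would shift every later line.
            if "\n" in text:
                raise ValueError(f"segment contains a newline: {text!r}")
            lines.append(text)
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return paths


triplets = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე.",
    },
]
paths = write_comet_inputs(triplets, out_dir="comet_inputs")
```

With these (hypothetical) file names, the call would look like `comet-score -s comet_inputs/src.txt -t comet_inputs/mt.txt -r comet_inputs/ref.txt --model georgian_comet.ckpt`.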

	### Option 3: Integration with Evaluation Pipeline

	```python
	from comet import load_from_checkpoint
	import pandas as pd

	# Load model
	model = load_from_checkpoint("georgian_comet.ckpt")

	# Load your evaluation data
	df = pd.read_csv("your_evaluation_data.csv")

	# Prepare data in COMET format
	data = [
	    {
	        "src": row["sourceText"],
	        "mt": row["targetText"],
	        "ref": row["referenceText"],
	    }
	    for _, row in df.iterrows()
	]

	# Get scores
	scores = model.predict(data, batch_size=16)
	print(f"Average score: {sum(scores['scores']) / len(scores['scores']):.3f}")
	```
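
Evaluation files often contain rows with a missing source, translation, or reference, and it can help to drop them before scoring. The following is a minimal sketch using only the standard library. It assumes the same column names as the example above; the helper name `rows_to_comet_records` and the sample rows are hypothetical.

```python
import csv
import io

# Column names taken from the pipeline example; adjust to your own file.
COLUMNS = {"src": "sourceText", "mt": "targetText", "ref": "referenceText"}


def rows_to_comet_records(rows):
    """Map CSV rows to COMET records, skipping rows with any empty field."""
    records, skipped = [], 0
    for row in rows:
        record = {key: (row.get(col) or "").strip() for key, col in COLUMNS.items()}
        if all(record.values()):
            records.append(record)
        else:
            skipped += 1
    return records, skipped


# Made-up sample: the second row has no translation and is skipped.
sample_csv = io.StringIO(
    "sourceText,targetText,referenceText\n"
    "The cat sat on the mat.,კატა ზის ხალიჩაზე.,კატა იჯდა ხალიჩაზე.\n"
    "Hello.,,გამარჯობა.\n"
)
records, skipped = rows_to_comet_records(csv.DictReader(sample_csv))
print(len(records), skipped)
```

The resulting `records` list has the same shape as the `data` list passed to `model.predict`.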

	## Intended Uses

	This model is intended to be used for English-Georgian MT evaluation.

	Given a triplet with (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1 where 1 represents a perfect translation.

	### Primary Use Cases

	1. MT System Development: Evaluate and compare different English-Georgian MT systems
	2. Quality Assurance: Automated quality checks for Georgian translations
	3. Research: Study MT evaluation for morphologically rich languages like Georgian
	4. Production Monitoring: Track translation quality in production environments
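
When comparing two MT systems on the same test set, a difference in average score is easier to trust if it holds up under resampling. The following is a minimal sketch of paired bootstrap resampling over segment-level scores. It is one common approach, not a procedure described by this model card, and the function name and the score values are made up for illustration.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resamples in which system A's mean beats B's."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample segment indices with replacement, using the same
        # indices for both systems so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


# Made-up segment scores for two hypothetical systems on the same segments.
system_a = [0.82, 0.91, 0.77, 0.88, 0.95, 0.69]
system_b = [0.80, 0.85, 0.70, 0.86, 0.90, 0.65]
win_rate = paired_bootstrap(system_a, system_b)
```

A win rate close to 1.0 (or to 0.0) suggests the ranking of the two systems is stable across resamples; values near 0.5 suggest the systems are hard to separate on this test set.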

	### Out-of-Scope Use

	- Other Language Pairs: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
	- Reference-Free Evaluation: The model requires reference translations
	- Document-Level: Optimized for sentence-level evaluation

	## Training Details

	### Training Data

	- Dataset: 5,000 English-Georgian pairs from [corp.dict.ge](https://corp.dict.ge/)
	- MT Systems: Translations from SMaLL-100, Google Translate, and Ucraft Translate
	- Scoring Method: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
	- Details: See [Darsala/georgian_metric_evaluation](https://huggingface.co/datasets/Darsala/georgian_metric_evaluation)
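
The noise step can be sketched as follows. The source gives σ=3 but does not state the score scale or how out-of-range values are handled, so the 0-100 range, the clipping, the fixed seed, and the function name `add_label_noise` are all assumptions made for illustration.

```python
import random


def add_label_noise(scores, sigma=3.0, low=0.0, high=100.0, seed=0):
    """Add zero-mean Gaussian noise to each score and clip to [low, high]."""
    rng = random.Random(seed)
    noisy = []
    for score in scores:
        value = score + rng.gauss(0.0, sigma)
        # Clipping is an assumption; it keeps labels inside the assumed scale.
        noisy.append(min(high, max(low, value)))
    return noisy


# Made-up teacher scores on an assumed 0-100 scale.
teacher_scores = [12.0, 55.0, 87.0, 100.0]
noisy_scores = add_label_noise(teacher_scores)
```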

	### Training Configuration

	```yaml
	regression_metric:
	  init_args:
	    nr_frozen_epochs: 0.3
	    keep_embeddings_frozen: True
	    optimizer: AdamW
	    encoder_learning_rate: 1.5e-05
	    learning_rate: 1.5e-05
	    loss: mse
	    dropout: 0.1
	    batch_size: 8

	### Training Procedure

	1. Base Model: Started from Unbabel/wmt22-comet-da checkpoint
	2. Knowledge Distillation: Used Claude Sonnet 4 scores as training targets
	3. Robustness: Added Gaussian noise to training scores to prevent overfitting
	4. Optimization: 8 epochs with early stopping (patience=4) on validation Kendall's tau
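
The stopping rule in step 4 can be illustrated with a small simulation. This is a sketch of the general early-stopping logic, not the training code used for this model, where the training framework would normally handle it. The function name and the validation values are made up.

```python
def simulate_early_stopping(val_taus, max_epochs=8, patience=4):
    """Return (best_epoch, best_tau, epochs_run) for per-epoch validation taus."""
    best_tau = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, tau in enumerate(val_taus[:max_epochs], start=1):
        epochs_run = epoch
        if tau > best_tau:
            # Strict improvement resets the patience counter.
            best_tau, best_epoch = tau, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_tau, epochs_run


# Made-up validation Kendall's tau values, one per epoch.
history = [0.52, 0.55, 0.57, 0.56, 0.56, 0.55, 0.57, 0.54]
best_epoch, best_tau, epochs_run = simulate_early_stopping(history)
```

In this made-up run the best value is reached at epoch 3, and training stops after epoch 7 because four consecutive epochs fail to beat it.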

	## Evaluation Results

	### Test Set Performance

	Evaluated on 400 human-annotated English-Georgian translation pairs:

	| Metric | Score | p-value |
	|--------|-------|---------|
	| Pearson | 0.876 | < 0.001 |
	| Spearman | 0.773 | < 0.001 |
	| Kendall | 0.579 | < 0.001 |
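
For readers who want to run the same kind of analysis on their own annotations, the three correlation measures can be computed as sketched below. This is a dependency-free illustration with made-up numbers, not the evaluation script behind the table; the tau-b variant of Kendall's tau is an assumption, since the source does not say which variant was used. In practice `scipy.stats` offers `pearsonr`, `spearmanr`, and `kendalltau`, which also report p-values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, computed over all pairs in O(n^2)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom


# Made-up human ratings and metric scores for five segments.
human = [1, 2, 3, 4, 5]
metric = [0.1, 0.3, 0.2, 0.8, 0.9]
print(pearson(human, metric), spearman(human, metric), kendall_tau_b(human, metric))
```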

	### Comparison with Other Metrics

	| Metric | Pearson | Spearman | Kendall |
	|--------|---------|----------|---------|
	| Georgian-COMET | 0.876 | 0.773 | 0.579 |
	| Base COMET | 0.867 | 0.759 | 0.564 |
	| LLM-Reference-Based | 0.852 | 0.798 | 0.660 |
	| CHRF++ | 0.739 | 0.690 | 0.498 |
	| TER | 0.466 | 0.443 | 0.311 |
	| BLEU | 0.413 | 0.497 | 0.344 |

	## Languages Covered

	While the underlying XLM-R encoder covers about 100 languages, this fine-tuned version is specifically optimized for:
	- Source Language: English (en)
	- Target Language: Georgian (ka)

	For other language pairs, we recommend using the base [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) model.

	## Limitations

	1. Language Specific: Optimized only for English→Georgian evaluation
	2. Domain: Training data primarily from corp.dict.ge (general/literary domain)
	3. Reference Required: Cannot perform reference-free evaluation
	4. Sentence Level: Not optimized for document-level evaluation

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{georgian-comet-2025,
	  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
	  author={Luka Darsalia and Ketevan Bakhturidze and Saba Sturua},
	  year={2025},
	  publisher={HuggingFace},
	  url={https://huggingface.co/Darsala/georgian_comet}
	}

	@inproceedings{rei-etal-2022-comet,
	  title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
	  author = "Rei, Ricardo and
	    C. de Souza, Jos{\'e} G. and
	    Alves, Duarte and
	    Zerva, Chrysoula and
	    Farinha, Ana C and
	    Glushkova, Taisiya and
	    Lavie, Alon and
	    Coheur, Luisa and
	    Martins, Andr{\'e} F. T.",
	  booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
	  year = "2022",
	  address = "Abu Dhabi, United Arab Emirates",
	  publisher = "Association for Computational Linguistics",
	  url = "https://aclanthology.org/2022.wmt-1.52",
	  pages = "578--585",
	}
	```

	## Acknowledgments

	- [Unbabel](https://unbabel.com/) team for the base COMET model
	- [Anthropic](https://anthropic.com/) for Claude Sonnet 4 used in knowledge distillation
	- [corp.dict.ge](https://corp.dict.ge/) for the Georgian-English corpus
	- All contributors to the [nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research) project