---
pipeline_tag: translation
---
# cometoid22-wmt21
A reference-free (quality estimation) metric for machine translation evaluation.
This metric was created by knowledge distillation from [wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da), a reference-based teacher model.
Refer to [the publication](https://aclanthology.org/2023.wmt-1.62) for technical details.
## Setup
Option 1: Install `pymarian`, the Python bindings for Marian:
```bash
pip install pymarian
```
Option 2: Build the Marian binary; see the quickstart guide at https://marian-nmt.github.io/quickstart/
## Usage
**Pymarian**
```bash
pymarian-eval -m checkpoints/marian.model.bin -v vocab.spm --like comet-qe -s src.txt -t mt.out.txt
```
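Both tools pair line *i* of `src.txt` with line *i* of `mt.out.txt`, so the two files must be line-aligned. The Marian command below also joins them with `paste`, which inserts a TAB between the columns, so a TAB inside a segment would add an extra field to the stream. A minimal pre-flight check, as a sketch; `check_inputs` is a hypothetical helper, not part of Marian or pymarian:

```bash
# check_inputs SRC MT: succeed (printing the segment count) only if both files
# exist, have the same number of lines, and contain no TAB characters.
check_inputs() {
  local src=$1 mt=$2 src_lines mt_lines
  if [ ! -f "$src" ] || [ ! -f "$mt" ]; then
    echo "missing input file: $src or $mt" >&2
    return 1
  fi
  # wc -l counts newline characters, so a file without a final newline
  # looks one line shorter than it is. $(( )) strips wc's padding.
  src_lines=$(( $(wc -l < "$src") ))
  mt_lines=$(( $(wc -l < "$mt") ))
  if [ "$src_lines" -ne "$mt_lines" ]; then
    echo "line count mismatch: $src=$src_lines, $mt=$mt_lines" >&2
    return 1
  fi
  # paste separates columns with a TAB, so a TAB inside a segment would
  # produce an extra TAB-separated field.
  if grep -q $'\t' "$src" "$mt"; then
    echo "TAB character found in input" >&2
    return 1
  fi
  echo "$src_lines"
}
```

For example, `check_inputs src.txt mt.out.txt && pymarian-eval ...` runs the evaluation only when the inputs pass.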
**Marian**
```bash
paste src.txt mt.out.txt | marian evaluate --quiet --model checkpoints/marian.model.bin --vocabs vocab.spm vocab.spm --width 4 --like comet-qe \
--mini-batch 16 --maxi-batch 256 --max-length 512 --max-length-crop true --workspace 8000
```
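Both commands score each source/translation pair separately. When a single corpus-level number is needed, one common convention (COMET, for instance, reports its system score this way) is the mean of the segment scores. A minimal sketch, assuming the command prints exactly one numeric score per line on stdout; it is worth inspecting the raw output first. `mean_score` is a hypothetical helper:

```bash
# mean_score: read one numeric score per line on stdin and print their
# arithmetic mean with four decimals; blank lines are skipped.
# Assumption: the evaluator writes exactly one segment-level score per line.
mean_score() {
  awk 'NF { sum += $1; n++ }
       END { if (n == 0) exit 1; printf "%.4f\n", sum / n }'
}
```

Usage: append `| mean_score` to either command above. The helper exits with status 1 on empty input instead of dividing by zero.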
More info at https://github.com/marian-nmt/wmt23-metrics
## Reference
```
@inproceedings{gowda-etal-2023-cometoid,
title = "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into {E}ven Stronger Quality Estimation Metrics",
author = "Gowda, Thamme and
Kocmi, Tom and
Junczys-Dowmunt, Marcin",
editor = "Koehn, Philipp and
Haddon, Barry and
Kocmi, Tom and
Monz, Christof",
booktitle = "Proceedings of the Eighth Conference on Machine Translation",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.wmt-1.62",
pages = "751--755",
}
```