---
pipeline_tag: translation
---

# cometoid22-wmt21

A referenceless (quality-estimation) metric for machine translation evaluation.

This metric was created by knowledge distillation from [wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da), a reference-based teacher metric.

Refer to [the publication](https://aclanthology.org/2023.wmt-1.62) for technical details.

## Setup

Option 1: Install `pymarian`, the Python bindings to Marian:

```bash
pip install pymarian
```

Option 2: Build the `marian` binary; see https://marian-nmt.github.io/quickstart/ for instructions.

## Usage

**Pymarian**

```bash
pymarian-eval -m checkpoints/marian.model.bin -v vocab.spm --like comet-qe -s src.txt -t mt.out.txt
```
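
Segment-level scores are often averaged into a single system-level score. The sketch below assumes the command writes one numeric score per input line to standard output, which this card does not confirm. `scores.txt` is a hypothetical file name, filled here with made-up values so the snippet runs without a model.

```bash
# Hypothetical scores file: one segment-level score per line (made-up values).
# In practice it might be produced by redirecting the command's output to a file.
printf '0.8000\n0.6000\n0.7000\n' > scores.txt

# Average all non-empty lines to get a system-level score.
system_score=$(awk 'NF { sum += $1; n++ } END { if (n) printf "%.4f", sum / n }' scores.txt)
echo "system score: $system_score"
```

Skipping empty lines (`NF`) keeps a trailing blank line from lowering the average.
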

**Marian**

```bash
paste src.txt mt.out.txt | marian evaluate --quiet --model checkpoints/marian.model.bin --vocabs vocab.spm vocab.spm --width 4 --like comet-qe \
    --mini-batch 16 --maxi-batch 256 --max-length 512 --max-length-crop true --workspace 8000
```
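
`paste` does not complain when its input files differ in length; it pads the shorter one with empty fields, which would silently misalign source and translation pairs. A minimal sketch of a pre-check follows, using hypothetical sample files (`sample.src.txt`, `sample.mt.txt`) in place of `src.txt` and `mt.out.txt` so that it runs on its own:

```bash
# Hypothetical sample files standing in for src.txt and mt.out.txt.
printf 'Hello world\nGood morning\n' > sample.src.txt
printf 'Hallo Welt\nGuten Morgen\n' > sample.mt.txt

# Count lines in each file; tr strips the padding some wc implementations add.
src_lines=$(wc -l < sample.src.txt | tr -d ' ')
mt_lines=$(wc -l < sample.mt.txt | tr -d ' ')

if [ "$src_lines" -ne "$mt_lines" ]; then
  echo "line count mismatch: $src_lines source vs $mt_lines translation" >&2
else
  # Tab-separated source/translation pairs, as produced by paste in the command above.
  paste sample.src.txt sample.mt.txt > sample.pairs.tsv
fi
```

Running the check before scoring makes a length mismatch visible up front instead of leaving it hidden in the scores.
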

More info at https://github.com/marian-nmt/wmt23-metrics

## Reference

```
@inproceedings{gowda-etal-2023-cometoid,
    title = "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into {E}ven Stronger Quality Estimation Metrics",
    author = "Gowda, Thamme and
      Kocmi, Tom and
      Junczys-Dowmunt, Marcin",
    editor = "Koehn, Philipp and
      Haddow, Barry and
      Kocmi, Tom and
      Monz, Christof",
    booktitle = "Proceedings of the Eighth Conference on Machine Translation",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.wmt-1.62",
    pages = "751--755",
}
```