---
pipeline_tag: translation
---

# cometoid22-wmt21

A referenceless (quality estimation) metric for machine translation evaluation.
It was created by knowledge distillation from [wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da), a reference-based teacher metric.
Refer to [the publication](https://aclanthology.org/2023.wmt-1.62) for technical details.


## Setup

Option 1: Install `pymarian`, the Python bindings for Marian

```bash
pip install pymarian
```

Option 2: Build the `marian` binary by following the quickstart guide: https://marian-nmt.github.io/quickstart/


## Usage

**Pymarian**
```bash
pymarian-eval -m checkpoints/marian.model.bin -v vocab.spm --like comet-qe -s src.txt -t mt.out.txt
```

**Marian**

```bash
paste src.txt mt.out.txt | marian evaluate --quiet --model checkpoints/marian.model.bin --vocabs vocab.spm vocab.spm --width 4 --like comet-qe \
  --mini-batch 16 --maxi-batch 256 --max-length 512 --max-length-crop true --workspace 8000
```
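
Both commands pair each source line with the MT output on the same line, so the two files need to be line-aligned. `paste` does not complain when one file is shorter; it pads the missing fields with empty strings. The following is a minimal sketch of a pre-flight check, not part of Marian or pymarian. The function name `check_parallel` and the `sample.*` file names are hypothetical and used only for illustration.

```bash
# Hypothetical sample data for illustration; substitute your own files.
printf 'Hallo Welt\nGuten Morgen\n' > sample.src.txt
printf 'Hello world\nGood morning\n' > sample.mt.txt

# check_parallel (hypothetical helper): succeed only if both files
# have the same number of lines.
check_parallel() {
  local src_lines mt_lines
  src_lines=$(wc -l < "$1" | tr -d ' ')
  mt_lines=$(wc -l < "$2" | tr -d ' ')
  if [ "$src_lines" -ne "$mt_lines" ]; then
    echo "line count mismatch: $1=$src_lines $2=$mt_lines" >&2
    return 1
  fi
}

# Build the tab-separated source/MT pairs only when the check passes.
check_parallel sample.src.txt sample.mt.txt \
  && paste sample.src.txt sample.mt.txt > sample.pairs.tsv
```

The resulting `sample.pairs.tsv` has the same shape as the output of `paste src.txt mt.out.txt` in the Marian command above.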


More information is available at https://github.com/marian-nmt/wmt23-metrics



## Reference
```
@inproceedings{gowda-etal-2023-cometoid,
    title = "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into {E}ven Stronger Quality Estimation Metrics",
    author = "Gowda, Thamme  and
      Kocmi, Tom  and
      Junczys-Dowmunt, Marcin",
    editor = "Koehn, Philipp  and
      Haddow, Barry  and
      Kocmi, Tom  and
      Monz, Christof",
    booktitle = "Proceedings of the Eighth Conference on Machine Translation",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.wmt-1.62",
    pages = "751--755",
}
```