Update README.md
README.md CHANGED
@@ -1,10 +1,10 @@
----
-license: apache-2.0
-base_model:
-- google/gemma-2-9b-it
-tags:
-- translation
----
+---
+license: apache-2.0
+base_model:
+- google/gemma-2-9b-it
+tags:
+- translation
+---
 
 
 # 🏆 ReMedy: Machine Translation Evaluation via Reward Modeling
@@ -227,6 +227,7 @@ Inspired by **SacreBLEU**, ReMedy provides JSON-style results to ensure transpar
 
 | Model | Size | Base Model | Ref/QE | Download |
 |---------------|------|--------------|--------|----------|
+| ReMedy-2B    | 2B   | Gemma-2-2B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-2B) |
 | ReMedy-9B-22 | 9B   | Gemma-2-9B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-22) |
 | ReMedy-9B-23 | 9B   | Gemma-2-9B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-23) |
 | ReMedy-9B-24 | 9B   | Gemma-2-9B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-24) |
@@ -273,12 +274,26 @@ bash wmt/wmt24.sh
 If you use **ReMedy**, please cite the following paper:
 
 ```bibtex
-@
-
-
-
-
+@inproceedings{tan-monz-2025-remedy,
+    title = "{R}e{M}edy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling",
+    author = "Tan, Shaomu  and
+      Monz, Christof",
+    editor = "Christodoulopoulos, Christos  and
+      Chakraborty, Tanmoy  and
+      Rose, Carolyn  and
+      Peng, Violet",
+    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+    month = nov,
+    year = "2025",
+    address = "Suzhou, China",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.emnlp-main.217/",
+    doi = "10.18653/v1/2025.emnlp-main.217",
+    pages = "4370--4387",
+    ISBN = "979-8-89176-332-6",
+    abstract = "A key challenge in MT evaluation is the inherent noise and inconsistency of human ratings. Regression-based neural metrics struggle with this noise, while prompting LLMs shows promise at system-level evaluation but performs poorly at segment level. In this work, we propose ReMedy, a novel MT metric framework that reformulates translation evaluation as a reward modeling task. Instead of regressing on imperfect human ratings directly, ReMedy learns relative translation quality using pairwise preference data, resulting in a more reliable evaluation. In extensive experiments across WMT22-24 shared tasks (39 language pairs, 111 MT systems), ReMedy achieves state-of-the-art performance at both segment- and system-level evaluation. Specifically, ReMedy-9B surpasses larger WMT winners and massive closed LLMs such as MetricX-13B, XCOMET-Ensemble, GEMBA-GPT-4, PaLM-540B, and finetuned PaLM2. Further analyses demonstrate that ReMedy delivers superior capability in detecting translation errors and evaluating low-quality translations."
 }
+
 ```
 
 ---
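The abstract above frames MT evaluation as reward modeling over pairwise preference data rather than regression on raw human ratings. As a rough illustration of the kind of objective such reward models commonly optimize, here is a generic Bradley-Terry pairwise loss sketch; this is an assumption for illustration, not the paper's exact training loss, and `pairwise_preference_loss` is a hypothetical helper name:

```python
import math

def pairwise_preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair:
    -log sigmoid(r_preferred - r_rejected).

    The model is rewarded for assigning a higher scalar score to the
    translation that human annotators preferred.
    """
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model separates the preferred translation
# from the rejected one, and equals log(2) when it cannot tell them apart.
losses = [pairwise_preference_loss(margin, 0.0) for margin in (0.0, 1.0, 2.0)]
```

Because the loss depends only on the score difference, the model learns relative translation quality, which is exactly what makes this formulation robust to noisy absolute ratings.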