ShaomuTan committed (verified)
Commit 8a54837 · Parent(s): e49e1b3

Update README.md

Files changed (1)
  1. README.md +27 -12
README.md CHANGED
@@ -1,10 +1,10 @@
----
-license: apache-2.0
-base_model:
-- google/gemma-2-9b-it
-tags:
-- translation
----
+---
+license: apache-2.0
+base_model:
+- google/gemma-2-9b-it
+tags:
+- translation
+---
 
 
 # 🚀 ReMedy: Machine Translation Evaluation via Reward Modeling
@@ -227,6 +227,7 @@ Inspired by **SacreBLEU**, ReMedy provides JSON-style results to ensure transpar
 
 | Model | Size | Base Model | Ref/QE | Download |
 |---------------|------|--------------|--------|----------|
+| ReMedy-2B | 2B | Gemma-2-2B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-2B) |
 | ReMedy-9B-22 | 9B | Gemma-2-9B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-22) |
 | ReMedy-9B-23 | 9B | Gemma-2-9B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-23) |
 | ReMedy-9B-24 | 9B | Gemma-2-9B | Both | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-24) |
@@ -273,12 +274,26 @@ bash wmt/wmt24.sh
 If you use **ReMedy**, please cite the following paper:
 
 ```bibtex
-@article{tan2024remedy,
-  title={ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling},
-  author={Tan, Shaomu and Monz, Christof},
-  journal={arXiv preprint},
-  year={2024}
+@inproceedings{tan-monz-2025-remedy,
+    title = "{R}e{M}edy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling",
+    author = "Tan, Shaomu and
+      Monz, Christof",
+    editor = "Christodoulopoulos, Christos and
+      Chakraborty, Tanmoy and
+      Rose, Carolyn and
+      Peng, Violet",
+    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+    month = nov,
+    year = "2025",
+    address = "Suzhou, China",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.emnlp-main.217/",
+    doi = "10.18653/v1/2025.emnlp-main.217",
+    pages = "4370--4387",
+    ISBN = "979-8-89176-332-6",
+    abstract = "A key challenge in MT evaluation is the inherent noise and inconsistency of human ratings. Regression-based neural metrics struggle with this noise, while prompting LLMs shows promise at system-level evaluation but performs poorly at segment level. In this work, we propose ReMedy, a novel MT metric framework that reformulates translation evaluation as a reward modeling task. Instead of regressing on imperfect human ratings directly, ReMedy learns relative translation quality using pairwise preference data, resulting in a more reliable evaluation. In extensive experiments across WMT22-24 shared tasks (39 language pairs, 111 MT systems), ReMedy achieves state-of-the-art performance at both segment- and system-level evaluation. Specifically, ReMedy-9B surpasses larger WMT winners and massive closed LLMs such as MetricX-13B, XCOMET-Ensemble, GEMBA-GPT-4, PaLM-540B, and finetuned PaLM2. Further analyses demonstrate that ReMedy delivers superior capability in detecting translation errors and evaluating low-quality translations."
 }
+
 ```
 
 ---
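
The updated README table lists the ReMedy checkpoints as Hugging Face Hub repositories under `ShaomuTan/`. As a minimal sketch of fetching one of them, the snippet below maps the table's model names to repo IDs and uses `huggingface_hub.snapshot_download` to pull a checkpoint; note the diff does not show how a checkpoint is loaded for scoring, so only the download step is sketched here, and the helper names (`repo_id`, `download_checkpoint`) are illustrative, not part of the ReMedy codebase.

```python
# Repo IDs taken from the README's model table.
REMEDY_REPOS = {
    "ReMedy-2B": "ShaomuTan/ReMedy-2B",
    "ReMedy-9B-22": "ShaomuTan/ReMedy-9B-22",
    "ReMedy-9B-23": "ShaomuTan/ReMedy-9B-23",
    "ReMedy-9B-24": "ShaomuTan/ReMedy-9B-24",
}


def repo_id(model_name: str) -> str:
    """Map a model name from the README table to its Hub repo ID."""
    try:
        return REMEDY_REPOS[model_name]
    except KeyError:
        raise ValueError(f"unknown ReMedy model: {model_name}") from None


def download_checkpoint(model_name: str) -> str:
    """Download the full checkpoint snapshot; returns the local directory."""
    # Lazy import so the name mapping works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(model_name))


if __name__ == "__main__":
    # Downloads several GB of weights; requires network access.
    local_dir = download_checkpoint("ReMedy-2B")
    print(local_dir)
```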