Update README.md
README.md CHANGED
@@ -1,10 +1,10 @@
----
-license: apache-2.0
-base_model:
-- google/gemma-2-9b-it
-tags:
-- translation
----
+---
+license: apache-2.0
+base_model:
+- google/gemma-2-9b-it
+tags:
+- translation
+---
 
 
 # 🏆 ReMedy: Machine Translation Evaluation via Reward Modeling
@@ -227,6 +227,7 @@ Inspired by **SacreBLEU**, ReMedy provides JSON-style results to ensure transpar
 
 | Model | Size | Base Model | Ref/QE | Download |
 |---------------|------|--------------|--------|----------|
+| ReMedy-2B    | 2B   | Gemma-2-2B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-2B) |
 | ReMedy-9B-22 | 9B   | Gemma-2-9B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-22) |
 | ReMedy-9B-23 | 9B   | Gemma-2-9B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-23) |
 | ReMedy-9B-24 | 9B   | Gemma-2-9B   | Both   | [🤗 HuggingFace](https://huggingface.co/ShaomuTan/ReMedy-9B-24) |
@@ -273,12 +274,26 @@ bash wmt/wmt24.sh
 If you use **ReMedy**, please cite the following paper:
 
 ```bibtex
-@
-
-
-
-
+@inproceedings{tan-monz-2025-remedy,
+    title = "{R}e{M}edy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling",
+    author = "Tan, Shaomu  and
+      Monz, Christof",
+    editor = "Christodoulopoulos, Christos  and
+      Chakraborty, Tanmoy  and
+      Rose, Carolyn  and
+      Peng, Violet",
+    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+    month = nov,
+    year = "2025",
+    address = "Suzhou, China",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.emnlp-main.217/",
+    doi = "10.18653/v1/2025.emnlp-main.217",
+    pages = "4370--4387",
+    ISBN = "979-8-89176-332-6",
+    abstract = "A key challenge in MT evaluation is the inherent noise and inconsistency of human ratings. Regression-based neural metrics struggle with this noise, while prompting LLMs shows promise at system-level evaluation but performs poorly at segment level. In this work, we propose ReMedy, a novel MT metric framework that reformulates translation evaluation as a reward modeling task. Instead of regressing on imperfect human ratings directly, ReMedy learns relative translation quality using pairwise preference data, resulting in a more reliable evaluation. In extensive experiments across WMT22-24 shared tasks (39 language pairs, 111 MT systems), ReMedy achieves state-of-the-art performance at both segment- and system-level evaluation. Specifically, ReMedy-9B surpasses larger WMT winners and massive closed LLMs such as MetricX-13B, XCOMET-Ensemble, GEMBA-GPT-4, PaLM-540B, and finetuned PaLM2. Further analyses demonstrate that ReMedy delivers superior capability in detecting translation errors and evaluating low-quality translations."
 }
+
 ```
 
 ---
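The abstract above frames MT evaluation as reward modeling over pairwise preference data rather than regression on raw human ratings. As a rough illustration of the kind of objective such reward models commonly optimize, here is a generic Bradley-Terry pairwise loss sketch; this is an assumption for illustration, not the paper's exact training loss, and `pairwise_preference_loss` is a hypothetical helper name:

```python
import math

def pairwise_preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair:
    -log sigmoid(r_preferred - r_rejected).

    The model is rewarded for assigning a higher scalar score to the
    translation that human annotators preferred.
    """
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model separates the preferred translation
# from the rejected one, and equals log(2) when it cannot tell them apart.
losses = [pairwise_preference_loss(margin, 0.0) for margin in (0.0, 1.0, 2.0)]
```

Because the loss depends only on the score difference, the model learns relative translation quality, which is exactly what makes this formulation robust to noisy absolute ratings.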