Prosho/sentinel-src-25

SENTINEL-SRC-MQM


Prosho committed on Sep 15, 2025

Commit

0d89670


1 Parent(s): 1987c00

Create README.md

Files changed (1)

README.md +75 -0

README.md ADDED

	@@ -0,0 +1,75 @@

+---
+pipeline_tag: translation
+language: multilingual
+library_name: transformers
+base_model:
+- FacebookAI/xlm-roberta-large
+---
+<div align="center">
+<h1 style="font-family: 'Arial', sans-serif; font-size: 28px; font-weight: bold; color: black;">
+    📊 Estimating Machine Translation Difficulty
+</h1>
+</div>
+<div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
+    <a href="https://arxiv.org/abs/2508.10175"><img src="https://img.shields.io/badge/arXiv-2508.10175-b31b1b.svg"></a> &nbsp; &nbsp;
+    <a href="https://huggingface.co/collections/Prosho/translation-difficulty-estimators-6816665c008e1d22426eb6c4"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D"></a> &nbsp; &nbsp;
+    <a href="https://github.com/prosho-97/guardians-mt-eval"><img src="https://img.shields.io/badge/GitHub-Repo-121013?logo=github&logoColor=white"></a> &nbsp; &nbsp;
+</div>
+This repository contains the **SENTINEL<sub>SRC</sub>** metric model used for Difficulty Sampling at the WMT25 General Machine Translation Shared Task, and analyzed in our paper **Estimating Machine Translation Difficulty**.
+## Usage
+To run this model, install our git repository with the following command:
+```bash
+pip install git+https://github.com/prosho-97/guardians-mt-eval
+```
+After installing the package, you can use this model in Python as follows:
+```python
+from sentinel_metric import download_model, load_from_checkpoint
+model_path = download_model("Prosho/sentinel-src-25")
+model = load_from_checkpoint(model_path)
+data = [
+    {"src": "Please sign the form."},
+    {"src": "He spilled the beans, then backpedaled—talk about mixed signals!"}
+]
+output = model.predict(data, batch_size=8, gpus=1)
+```
+Output:
+```python
+# Segment scores
+>>> output.scores
+[0.5604351758956909, -0.08413456380367279]
+# System score
+>>> output.system_score
+0.23815030604600906
+```
+A higher output score means the input source text is easier to translate.
+## Cite this work
+This work has been accepted at [EMNLP 2025](https://2025.emnlp.org/). If you use any part of this work, please consider citing our paper as follows:
+```bibtex
+@misc{proietti2025estimatingmachinetranslationdifficulty,
+      title={Estimating Machine Translation Difficulty},
+      author={Lorenzo Proietti and Stefano Perrella and Vilém Zouhar and Roberto Navigli and Tom Kocmi},
+      year={2025},
+      eprint={2508.10175},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2508.10175},
+}
+```
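
The README says that a higher score means an easier source text, so the segment scores can be used to order inputs by difficulty. The sketch below is not part of the commit. It reuses the two scores from the README's Output block, and the names `scores`, `hardest_first`, and `mean_score` are illustrative. It also checks that, for this example, the reported system score matches the arithmetic mean of the segment scores. That relationship is inferred from the numbers shown, not from the library's documentation.

```python
from statistics import mean

# Segment scores copied from the README's Output block, in input order:
# index 0 is "Please sign the form.", index 1 is the idiomatic sentence.
scores = [0.5604351758956909, -0.08413456380367279]

# Lower score = harder to translate, so sorting ascending by score
# lists the hardest source first.
hardest_first = sorted(range(len(scores)), key=lambda i: scores[i])

# Assumption: the system score is the mean of the segment scores.
# This holds for the README example, where it is reported as 0.23815030604600906.
mean_score = mean(scores)

print("Indices from hardest to easiest:", hardest_first)
print("Mean segment score:", mean_score)
```

With the scores above, the idiomatic sentence (index 1) ranks as harder than the plain request (index 0). This matches the README's statement that lower scores indicate harder source texts.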