Prosho committed on
Commit 0d89670 · verified · 1 Parent(s): 1987c00

Create README.md

Files changed (1)
  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ pipeline_tag: translation
+ language: multilingual
+ library_name: transformers
+ base_model:
+ - FacebookAI/xlm-roberta-large
+ ---
+
+ <div align="center">
+
+ <h1 style="font-family: 'Arial', sans-serif; font-size: 28px; font-weight: bold; color: black;">
+ 📊 Estimating Machine Translation Difficulty
+ </h1>
+
+ </div>
+
+ <div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
+ <a href="https://arxiv.org/abs/2508.10175"><img src="https://img.shields.io/badge/arXiv-2508.10175-b31b1b.svg"></a> &nbsp; &nbsp;
+ <a href="https://huggingface.co/collections/Prosho/translation-difficulty-estimators-6816665c008e1d22426eb6c4"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D"></a> &nbsp; &nbsp;
+ <a href="https://github.com/prosho-97/guardians-mt-eval"><img src="https://img.shields.io/badge/GitHub-Repo-121013?logo=github&logoColor=white"></a> &nbsp; &nbsp;
+ </div>
+
+ This repository contains the **SENTINEL<sub>SRC</sub>** metric model, which was used for Difficulty Sampling at the WMT25 General Machine Translation Shared Task and is analyzed in our paper **Estimating Machine Translation Difficulty**.
+
+ ## Usage
+
+ To run this model, install the package from our GitHub repository:
+
+ ```bash
+ pip install git+https://github.com/prosho-97/guardians-mt-eval
+ ```
+
+ Once the package is installed, you can use the model from Python as follows:
+
+ ```python
+ from sentinel_metric import download_model, load_from_checkpoint
+
+ # Fetch the checkpoint and load the model from it.
+ model_path = download_model("Prosho/sentinel-src-25")
+ model = load_from_checkpoint(model_path)
+
+ # Each input is a dict holding the source text under the "src" key.
+ data = [
+     {"src": "Please sign the form."},
+     {"src": "He spilled the beans, then backpedaled—talk about mixed signals!"}
+ ]
+
+ # Score the sources in batches of 8, requesting one GPU.
+ output = model.predict(data, batch_size=8, gpus=1)
+ ```
+
+ Output:
+ ```python
+ # Segment scores
+ >>> output.scores
+ [0.5604351758956909, -0.08413456380367279]
+
+ # System score
+ >>> output.system_score
+ 0.23815030604600906
+ ```
+
+ A higher score means the source text is easier to translate. In the example above, the plain request scores about 0.56, while the idiom-heavy sentence scores about -0.08.
+
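Because a lower score marks a harder source, one natural use of the segment scores is to rank inputs from hardest to easiest. The snippet below is a minimal sketch of that idea and is not part of the `sentinel_metric` package: `rank_by_difficulty` is a hypothetical helper, and the texts and scores are copied from the example in this README rather than recomputed.

```python
from typing import List, Tuple


def rank_by_difficulty(
    sources: List[str], scores: List[float]
) -> List[Tuple[str, float]]:
    """Pair each source with its score and sort hardest first (lowest score first).

    Hypothetical helper for illustration; not provided by sentinel_metric.
    """
    if len(sources) != len(scores):
        raise ValueError("sources and scores must have the same length")
    return sorted(zip(sources, scores), key=lambda pair: pair[1])


# Source texts and segment scores copied from the README example.
# "\u2014" is the dash character used in the second source sentence.
sources = [
    "Please sign the form.",
    "He spilled the beans, then backpedaled\u2014talk about mixed signals!",
]
scores = [0.5604351758956909, -0.08413456380367279]

ranked = rank_by_difficulty(sources, scores)
for text, score in ranked:
    print(f"{score:+.3f}  {text}")
```

With real data, `scores` would come from `output.scores` and `sources` from the `"src"` fields passed to `model.predict`, assuming the scores are returned in input order as the example output suggests.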
+ ## Cite this work
+ This work has been accepted at [EMNLP 2025](https://2025.emnlp.org/). If you use any part of it, please consider citing our paper as follows:
+
+ ```bibtex
+ @misc{proietti2025estimatingmachinetranslationdifficulty,
+       title={Estimating Machine Translation Difficulty},
+       author={Lorenzo Proietti and Stefano Perrella and Vilém Zouhar and Roberto Navigli and Tom Kocmi},
+       year={2025},
+       eprint={2508.10175},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2508.10175},
+ }
+ ```