Add disclaimer for ROUGE being non-deterministic when using the aggregator and suggest a solution
#9, opened by Eran
README.md CHANGED

    @@ -15,7 +15,7 @@ description: >-
     evaluating automatic summarization and machine translation software in natural language processing.
     The metrics compare an automatically produced summary or translation against a reference or a set of references (human-produced) summary or translation.
     
    -Note that ROUGE is case insensitive, meaning that upper case letters are treated the same way as lower case letters.
    +Note that ROUGE is case insensitive, meaning that upper case letters are treated the same way as lower case letters. Also, note that by default, ROUGE uses a random resampling algorithm which is non-deterministic.
     
     This metrics is a wrapper around Google Research reimplementation of ROUGE:
     https://github.com/google-research/google-research/tree/master/rouge
    @@ -62,6 +62,11 @@ It can also deal with lists of references for each predictions:
     {'rouge1': 0.8333, 'rouge2': 0.5, 'rougeL': 0.8333, 'rougeLsum': 0.8333}```
     ```
     
    +You can input the `load` with a seed to initialize the random number generator and fix ROUGE predictions from changing between different runs:
    +```python
    +>>> rouge = evaluate.load('rouge', seed=42)
    +```
    +
     ### Inputs
     - **predictions** (`list`): list of predictions to score. Each prediction
     should be a string with tokens separated by spaces.
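
For readers wondering why a seed matters here, the following is a minimal standard-library sketch of bootstrap aggregation, assuming the aggregator resamples per-example scores at random as the added README note describes. It is not the `rouge_score` implementation: `bootstrap_mean` is a hypothetical helper and the scores are made-up values, used only to show that a fixed seed makes the aggregate repeatable.

```python
import random
import statistics


def bootstrap_mean(scores, n_samples=1000, seed=None):
    """Hypothetical helper: median of the means of bootstrap resamples."""
    # With seed=None the generator is seeded from OS entropy, so runs differ.
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Draw len(scores) items with replacement and record their mean.
        resample = rng.choices(scores, k=len(scores))
        means.append(statistics.fmean(resample))
    return statistics.median(means)


# Made-up per-example F-measures, for illustration only.
per_example_f1 = [0.83, 0.50, 0.67, 0.91, 0.42]

seeded_a = bootstrap_mean(per_example_f1, seed=42)
seeded_b = bootstrap_mean(per_example_f1, seed=42)
unseeded = bootstrap_mean(per_example_f1)

print(seeded_a == seeded_b)  # same seed, same resamples, same aggregate
print(unseeded)              # can vary slightly from one run to the next
```

The unseeded result stays close to the plain mean of the scores but is not guaranteed to match across runs, which is the behaviour the disclaimer in this PR warns about.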