Runtime error
Update Space (evaluate main: 56af7abb)
- README.md +11 -5
- bleurt.py +2 -2
- requirements.txt +1 -1
README.md
CHANGED
@@ -42,9 +42,15 @@ This metric takes as input lists of predicted sentences and reference sentences:
 ```
 
 ### Inputs
+
+For the `load` function:
+
+- **config_name** (`str`): BLEURT checkpoint. Will default to `"bleurt-base-128"` if not specified. Other models that can be chosen are: `"bleurt-tiny-128"`, `"bleurt-tiny-512"`, `"bleurt-base-128"`, `"bleurt-base-512"`, `"bleurt-large-128"`, `"bleurt-large-512"`, `"BLEURT-20-D3"`, `"BLEURT-20-D6"`, `"BLEURT-20-D12"` and `"BLEURT-20"`.
+
+For the `compute` function:
+
 - **predictions** (`list` of `str`s): List of generated sentences to score.
 - **references** (`list` of `str`s): List of references to compare to.
-- **checkpoint** (`str`): BLEURT checkpoint. Will default to `BLEURT-tiny` if not specified. Other models that can be chosen are: `"bleurt-tiny-128"`, `"bleurt-tiny-512"`, `"bleurt-base-128"`, `"bleurt-base-512"`, `"bleurt-large-128"`, `"bleurt-large-512"`, `"BLEURT-20-D3"`, `"BLEURT-20-D6"`, `"BLEURT-20-D12"` and `"BLEURT-20"`.
 
 ### Output Values
 - **scores** : a `list` of scores, one per prediction.
@@ -65,7 +71,7 @@ BLEURT is used to compare models across different asks (e.g. (Table to text gene
 
 ### Examples
 
-Example with the default model:
+Example with the default model (`"bleurt-base-128"`):
 ```python
 >>> predictions = ["hello there", "general kenobi"]
 >>> references = ["hello there", "general kenobi"]
@@ -75,14 +81,14 @@ Example with the default model:
 {'scores': [1.0295498371124268, 1.0445425510406494]}
 ```
 
-Example with the `"
+Example with the full `"BLEURT-20"` model checkpoint:
 ```python
 >>> predictions = ["hello there", "general kenobi"]
 >>> references = ["hello there", "general kenobi"]
->>> bleurt = load("bleurt", module_type="metric",
+>>> bleurt = load("bleurt", module_type="metric", config_name="BLEURT-20")
 >>> results = bleurt.compute(predictions=predictions, references=references)
 >>> print(results)
-{'scores': [1.
+{'scores': [1.015415906906128, 0.9985226988792419]}
 ```
 
 ## Limitations and Bias

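In the updated README, `config_name` goes to `load`, while `predictions` and `references` go to `compute`, which returns a `scores` list with one entry per prediction. The plain-Python sketch below has no dependency on `evaluate` and shows how that output lines up with the inputs. `pair_scores` is a hypothetical helper, and its strict length checks are an assumption, not documented behavior of the metric.

```python
from typing import Dict, List, Tuple


def pair_scores(
    predictions: List[str],
    references: List[str],
    results: Dict[str, List[float]],
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: attach each BLEURT score to its (prediction, reference) pair."""
    # Assumption: predictions and references are aligned one-to-one.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    scores = results["scores"]
    # The README documents one score per prediction.
    if len(scores) != len(predictions):
        raise ValueError("expected one score per prediction")
    return list(zip(predictions, references, scores))


predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]
# Output copied from the BLEURT-20 example in the README diff above.
results = {"scores": [1.015415906906128, 0.9985226988792419]}

for pred, ref, score in pair_scores(predictions, references, results):
    print(f"{score:.4f}  {pred!r} vs {ref!r}")
```

Raising early on mismatched lengths keeps a silently truncated `zip` from hiding misaligned inputs.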
bleurt.py
CHANGED
@@ -100,8 +100,8 @@ class BLEURT(evaluate.Metric):
         # check that config name specifies a valid BLEURT model
         if self.config_name == "default":
             logger.warning(
-                "Using default
-                "You can use a bigger model for better results with e.g.: evaluate.load('bleurt', 'bleurt-large-512')."
+                "Using default checkpoint 'bleurt-base-128' for sequence maximum length 128. "
+                "You can use a bigger model for better results with e.g.: evaluate.load('bleurt', config_name='bleurt-large-512')."
             )
             self.config_name = "bleurt-base-128"
 

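The `bleurt.py` hunk only rewrites the warning text and the suggested `evaluate.load(..., config_name=...)` call. The fallback from `"default"` to `"bleurt-base-128"` is unchanged. The standalone sketch below approximates that fallback and is not the module's actual code. The hypothetical `resolve_config_name` logs the same warning and then checks the name against the checkpoints listed in the README. The exact-match validation is an assumption: the real module's check lies outside this hunk and may differ, for example in how it handles letter case.

```python
import logging

logger = logging.getLogger(__name__)

# Checkpoint names as listed in the updated README.
CHECKPOINTS = {
    "bleurt-tiny-128", "bleurt-tiny-512",
    "bleurt-base-128", "bleurt-base-512",
    "bleurt-large-128", "bleurt-large-512",
    "BLEURT-20-D3", "BLEURT-20-D6", "BLEURT-20-D12", "BLEURT-20",
}


def resolve_config_name(config_name: str) -> str:
    """Hypothetical stand-alone version of the default-checkpoint fallback."""
    if config_name == "default":
        # Same message as the new warning in bleurt.py.
        logger.warning(
            "Using default checkpoint 'bleurt-base-128' for sequence maximum length 128. "
            "You can use a bigger model for better results with e.g.: "
            "evaluate.load('bleurt', config_name='bleurt-large-512')."
        )
        config_name = "bleurt-base-128"
    # Assumption: exact, case-sensitive match against the README list.
    if config_name not in CHECKPOINTS:
        raise KeyError(f"{config_name!r} is not a known BLEURT checkpoint")
    return config_name
```

For example, `resolve_config_name("default")` would log the warning and return `"bleurt-base-128"`, while an explicit `"BLEURT-20"` would pass through unchanged.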
requirements.txt
CHANGED
@@ -1,2 +1,2 @@
-git+https://github.com/huggingface/evaluate@
+git+https://github.com/huggingface/evaluate@56af7abbb160fa2a5a3c0d268f2bfd3baff8015c
 git+https://github.com/google-research/bleurt.git
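The `requirements.txt` change pins `evaluate` to a full commit SHA using pip's `git+<url>@<ref>` form. The previous ref is cut off in this view. The first eight characters of the new SHA match the `56af7abb` in the commit title. The sketch below checks that pin. The helper names `pinned_ref` and `is_full_sha` are illustrative only and not part of pip or any other tool. The sketch also assumes the URL has no `user@host` part and no `#egg=` fragment.

```python
import re


def pinned_ref(requirement: str) -> str:
    """Hypothetical helper: return the ref after '@' in a 'git+<url>@<ref>' line, or '' if unpinned."""
    line = requirement.strip()
    if not line.startswith("git+"):
        raise ValueError("not a git+ requirement")
    # Assumption: no user@host in the URL, so the last '@' separates the ref.
    _, sep, ref = line[len("git+"):].rpartition("@")
    return ref if sep else ""


def is_full_sha(ref: str) -> bool:
    """True if ref looks like a full 40-character lowercase hex commit SHA."""
    return re.fullmatch(r"[0-9a-f]{40}", ref) is not None


evaluate_req = "git+https://github.com/huggingface/evaluate@56af7abbb160fa2a5a3c0d268f2bfd3baff8015c"
ref = pinned_ref(evaluate_req)
print(ref, is_full_sha(ref), ref.startswith("56af7abb"))
```

Pinning a full SHA, rather than a branch name, keeps the Space's `evaluate` version fixed until the requirement line is edited again.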