README.md
CHANGED
@@ -5,11 +5,11 @@ tags:
 - evaluate
 - metric
 description: >-
-  ParaPLUIE is a metric for evaluating the semantic proximity
-  ParaPLUIE
-  shown the highest correlation with human
-  classification
-  to
+  ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
+  ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has
+  shown the highest correlation with human judgment on paraphrase classification
+  while maintaining a low computational cost, roughly equivalent to that of
+  generating a single token.
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
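The description above ties ParaPLUIE's confidence score to an LLM's perplexity, at a cost comparable to generating a single token. A minimal sketch of one way to realise that idea, assuming the score is the log-probability gap between single-token "Yes" and "No" answers to a paraphrase question; the prompt wording and the helper below are illustrative assumptions, not ParaPLUIE's actual templates or code:

```python
# Illustrative sketch only: score a sentence pair by comparing the model's
# next-token log-probabilities for "Yes" vs "No" (one forward pass, ~one token of compute).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # one of the models listed further below
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def sketch_score(source: str, candidate: str) -> float:
    # Hypothetical prompt; ParaPLUIE's real templates (such as DIRECT) are not reproduced here.
    prompt = (f'Sentence A: "{source}"\nSentence B: "{candidate}"\n'
              'Is sentence B a paraphrase of sentence A? Answer Yes or No: ')
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(ids).logits[0, -1]
    logprobs = torch.log_softmax(next_token_logits, dim=-1)
    # NB: whether "Yes"/"No" really map to single tokens depends on the tokenizer
    # (this is what check_end_tokens_tmpl(), shown later, verifies).
    yes_id = tok.encode("Yes", add_special_tokens=False)[0]
    no_id = tok.encode("No", add_special_tokens=False)[0]
    # Positive -> the model leans "Yes" (paraphrase); negative -> leans "No".
    return (logprobs[yes_id] - logprobs[no_id]).item()
```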
@@ -20,13 +20,13 @@ short_description: ParaPLUIE is a metric for evaluating the semantic proximity
 # Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)
 
 ## Metric Description
-ParaPLUIE is a metric for evaluating the semantic proximity
-ParaPLUIE
-It has shown the highest correlation with human
+ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
+ParaPLUIE uses the perplexity of an LLM to compute a confidence score.
+It has shown the highest correlation with human judgment on paraphrase classification while maintaining a low computational cost, roughly equivalent to that of generating a single token.
 
 ## How to Use
 
-This metric requires a source sentence and
+This metric requires a source sentence and its hypothetical paraphrase.
 
 ```python
 import evaluate
@@ -46,9 +46,9 @@ print(results)
 
 ### Output Values
 
-- **score** (`float`): ParaPLUIE score. Minimum possible value is -inf. Maximum possible value is +inf. A score greater than 0
+- **score** (`float`): ParaPLUIE score. Minimum possible value is -inf. Maximum possible value is +inf. A score greater than 0 means that the sentences are paraphrases; a score lower than 0 indicates the opposite.
 
-This metric outputs a dictionary
+This metric outputs a dictionary containing the score.
 
 ### Examples
 
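As a rough sketch of how such a score dictionary is usually obtained through the Hugging Face `evaluate` library: the loader path and the `predictions`/`references` argument names below are the library's generic conventions used as placeholders, not ParaPLUIE's confirmed interface.

```python
import evaluate

# Placeholder loader path; substitute the actual ParaPLUIE module/Space ID.
ppluie = evaluate.load("ParaPLUIE")

results = ppluie.compute(
    predictions=["A cat was sitting on the mat."],  # candidate paraphrase
    references=["The cat sat on the mat."],         # source sentence
)
print(results)               # a dictionary containing the score
print(results["score"] > 0)  # True would mean the pair is judged a paraphrase
```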
@@ -76,7 +76,7 @@ ppluie.init(
 )
 ```
 
-
+Show the available prompting templates
 ```python
 ppluie.show_templates()
 >>> DIRECT
@@ -90,7 +90,7 @@ ppluie.show_templates()
 >>> NETWORK
 ```
 
-
+Show the LLMs that have already been tested with ParaPLUIE
 ```python
 ppluie.show_available_models()
 >>> HuggingFaceTB/SmolLM2-135M-Instruct
@@ -112,18 +112,18 @@ ppluie.show_available_models()
 >>> CohereForAI/c4ai-command-r-08-2024
 ```
 
-
+Change the prompting template
 ```python
 ppluie.setTemplate("DIRECT")
 ```
 
-
+Show how the prompt is encoded, to ensure that the correct number of special tokens is removed and that the words "Yes" and "No" each fit into a single token
 ```python
 ppluie.check_end_tokens_tmpl()
 ```
 
 ## Limitations and Bias
-This metric is based on an LLM and therefore
+This metric is based on an LLM and is therefore limited by the LLM that is used.
 
 ## Source code
 [GitLab](https://gitlab.inria.fr/expression/paraphrase-generation-evaluation-powered-by-an-llm-a-semantic-metric-not-a-lexical-one-coling-2025)
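The `check_end_tokens_tmpl()` step above verifies, among other things, that "Yes" and "No" each fit into a single token; a generic way to run the same sanity check on any candidate model's tokenizer, independent of ParaPLUIE's own helper:

```python
from transformers import AutoTokenizer

# Uses one of the models listed above; swap in any model you plan to score with.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
for word in ("Yes", "No"):
    ids = tok.encode(word, add_special_tokens=False)
    status = "single token" if len(ids) == 1 else "splits into several tokens"
    print(f"{word!r} -> {ids} ({status})")
```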