qlemesle committed · Commit 3567093 · 1 Parent(s): 53e33f3
Files changed (1): README.md (+16 -16)
README.md CHANGED
@@ -5,11 +5,11 @@ tags:
 - evaluate
 - metric
 description: >-
-  ParaPLUIE is a metric for evaluating the semantic proximity of two sentences.
-  ParaPLUIE use the perplexity of an LLM to compute a confidence score. It has
-  shown the highest correlation with human judgement on paraphrase
-  classification meanwhile reamin the computional cost low as it roughtly equal
-  to one token generation cost.
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
@@ -20,13 +20,13 @@ short_description: ParaPLUIE is a metric for evaluating the semantic proximity

 # Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)

 ## Metric Description
- ParaPLUIE is a metric for evaluating the semantic proximity of two sentences.
- ParaPLUIE use the perplexity of an LLM to compute a confidence score.
- It has shown the highest correlation with human judgement on paraphrase classification meanwhile reamin the computional cost low as it roughtly equal to one token generation cost.

 ## How to Use

- This metric requires a source sentence and it's hypothetical paraphrase.

 ```python
 import evaluate
@@ -46,9 +46,9 @@ print(results)

 ### Output Values

- - **score** (`float`): ParaPLUIE score. Minimum possible value is -inf. Maximum possible value is +inf. A score greater than 0 mean that sentences are paraphrases. A score lower than 0 mean the opposite.

- This metric outputs a dictionary, containing the score.

 ### Examples

@@ -76,7 +76,7 @@ ppluie.init(
 )
 ```

- show available prompting templates
 ```python
 ppluie.show_templates()
 >>> DIRECT
@@ -90,7 +90,7 @@ ppluie.show_templates()
 >>> NETWORK
 ```

- show LLM already tested with ParaPLUIE
 ```python
 ppluie.show_available_models()
 >>> HuggingFaceTB/SmolLM2-135M-Instruct
@@ -112,18 +112,18 @@ ppluie.show_available_models()
 >>> CohereForAI/c4ai-command-r-08-2024
 ```

- change prompting template
 ```python
 ppluie.setTemplate("DIRECT")
 ```

- show how is the prompt encoded, to ensure that the correct numbers of special tokens are removed and Yes / No words fit on one token
 ```python
 ppluie.check_end_tokens_tmpl()
 ```

 ## Limitations and Bias
- This metric is based on an LLM and therefore is limited by the LLM used.

 ## Source code
 [GitLab](https://gitlab.inria.fr/expression/paraphrase-generation-evaluation-powered-by-an-llm-a-semantic-metric-not-a-lexical-one-coling-2025)
 
 - evaluate
 - metric
 description: >-
+  ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
+  ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has
+  shown the highest correlation with human judgment on paraphrase
+  classification while maintaining a low computational cost, as it is roughly
+  equivalent to the cost of generating a single token.
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
 
 # Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)

 ## Metric Description
+ ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
+ ParaPLUIE uses the perplexity of an LLM to compute a confidence score.
+ It has shown the highest correlation with human judgment on paraphrase classification while maintaining a low computational cost, as it is roughly equivalent to the cost of generating a single token.
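In rough terms, a perplexity-based confidence score of this kind can be read off the LLM's next-token distribution after a paraphrase question. The sketch below illustrates that idea only; the prompt wording, the model choice, and the exact scoring are assumptions for illustration, not ParaPLUIE's actual implementation (see the GitLab repository linked at the end for the real code).

```python
# Minimal sketch (assumptions, not ParaPLUIE's implementation): compare the
# next-token log-probabilities of "Yes" vs "No" after a paraphrase question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"  # taken from the model list below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    'Sentence A: "The cat is sleeping on the sofa."\n'
    'Sentence B: "A cat sleeps on the couch."\n'
    "Is Sentence B a paraphrase of Sentence A? Answer Yes or No: "
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
log_probs = next_token_logits.log_softmax(dim=-1)

yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
no_id = tokenizer.encode("No", add_special_tokens=False)[0]
score = (log_probs[yes_id] - log_probs[no_id]).item()  # > 0 leans "paraphrase"
print(score)
```

This also explains the score's range of -inf to +inf described under Output Values: it is a difference of log-probabilities, positive when "Yes" is the more likely continuation.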
 
 ## How to Use

+ This metric requires a source sentence and its hypothetical paraphrase.

 ```python
 import evaluate
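The rest of this usage example is truncated by the diff view; only `print(results)` survives in the next hunk's context line. A minimal sketch of a complete call is given below; the Hub path and the keyword argument names are assumptions based on the generic `evaluate` interface, not confirmed by this page.

```python
import evaluate

# Hypothetical Hub identifier; the real load path is not visible in this diff.
ppluie = evaluate.load("qlemesle/ParaPLUIE")
results = ppluie.compute(
    references=["The cat is sleeping on the sofa."],  # source sentences (assumed name)
    predictions=["A cat sleeps on the couch."],       # candidate paraphrases (assumed name)
)
print(results)  # e.g. {"score": ...}
```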
 
 ### Output Values

+ - **score** (`float`): ParaPLUIE score. Minimum possible value is -inf. Maximum possible value is +inf. A score greater than 0 means that the sentences are paraphrases. A score lower than 0 indicates the opposite.

+ This metric outputs a dictionary containing the score.

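For example, a caller can simply threshold the sign of the returned score (illustrative values; only the key name `score` comes from this card):

```python
# Interpreting the sign of the ParaPLUIE score, per the description above.
results = {"score": 1.7}              # illustrative output of a compute() call
is_paraphrase = results["score"] > 0  # True: the pair is judged a paraphrase
```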
  ### Examples
 
 
 )
 ```

+ Show the available prompting templates
 ```python
 ppluie.show_templates()
 >>> DIRECT
 
 >>> NETWORK
 ```

+ Show the LLMs that have already been tested with ParaPLUIE
 ```python
 ppluie.show_available_models()
 >>> HuggingFaceTB/SmolLM2-135M-Instruct
 
 >>> CohereForAI/c4ai-command-r-08-2024
 ```

+ Change the prompting template
 ```python
 ppluie.setTemplate("DIRECT")
 ```
 
+ Show how the prompt is encoded, to ensure that the correct number of special tokens is removed and that the words "Yes" and "No" each fit into a single token
 ```python
 ppluie.check_end_tokens_tmpl()
 ```
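As a standalone illustration of what this check guards against, one can verify directly with a tokenizer that "Yes" and "No" each encode to exactly one token. The model name below is taken from the list above; this snippet is not the package's internal implementation.

```python
# Illustrative single-token check (not ParaPLUIE's internal code).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
for word in ("Yes", "No"):
    ids = tokenizer.encode(word, add_special_tokens=False)
    print(word, ids, "single token:", len(ids) == 1)
```

If either word splits into several tokens for a given model, a score based on a single next-token probability would be unreliable, which is why the card recommends running this check.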
 
 ## Limitations and Bias
+ This metric is based on an LLM and is therefore limited by the LLM that is used.

 ## Source code
 [GitLab](https://gitlab.inria.fr/expression/paraphrase-generation-evaluation-powered-by-an-llm-a-semantic-metric-not-a-lexical-one-coling-2025)