---
title: ParaPLUIE
emoji: ☂️
tags:
- evaluate
- metric
description: >-
ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has
shown the highest correlation with human judgment on paraphrase
classification while maintaining a low computational cost, as it is roughly
equivalent to the cost of generating a single token.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
short_description: ParaPLUIE is a metric for evaluating the semantic proximity
---
# Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)
## Metric Description
ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
ParaPLUIE uses the perplexity of an LLM to compute a confidence score.
It has shown the highest correlation with human judgment on paraphrase classification while maintaining a low computational cost, as it is roughly equivalent to the cost of generating a single token.
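As an illustration of the underlying idea (a minimal sketch, not the actual implementation): the score can be viewed as the log-odds the LLM assigns to answering "Yes" versus "No" when asked whether the two sentences are paraphrases. The logit values below are made up for illustration.

```python
def log_odds_score(yes_logit: float, no_logit: float) -> float:
    """Log-probability difference between the "Yes" and "No" answer tokens.

    Positive -> the model leans towards "Yes" (paraphrase),
    negative -> the model leans towards "No".
    """
    # log softmax(yes) - log softmax(no): the normalizer cancels,
    # leaving the raw logit difference.
    return yes_logit - no_logit

# Made-up logits for the answer position of a non-paraphrase pair:
print(log_odds_score(yes_logit=2.5, no_logit=7.1))  # negative score
```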
## How to Use
This metric requires a source sentence and its hypothetical paraphrase.
```python
import evaluate
ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")
S = "Have you ever seen a tsunami ?"
H = "Have you ever seen a tiramisu ?"
results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```
### Inputs
- **sources** (`list` of `string`): Source sentences.
- **hypotheses** (`list` of `string`): Hypothetical paraphrases.
### Output Values
- **scores** (`list` of `float`): ParaPLUIE scores, one per sentence pair. Minimum possible value is -inf. Maximum possible value is +inf. A score greater than 0 means that the sentences are paraphrases. A score lower than 0 indicates the opposite.
This metric outputs a dictionary containing the list of scores.
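Since the sign of the score carries the classification, a batch of scores can be turned into binary paraphrase decisions with a threshold at 0 (a usage sketch; `results` is assumed to have the shape shown above):

```python
results = {"scores": [-16.97607421875]}  # as returned by ppluie.compute(...)

# A score > 0 means "paraphrase"; a score <= 0 means "not a paraphrase".
decisions = [score > 0 for score in results["scores"]]
print(decisions)  # [False]
```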
### Examples
Simple example
```python
import evaluate
ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")
S = "Have you ever seen a tsunami ?"
H = "Have you ever seen a tiramisu ?"
results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```
Configure the metric
```python
ppluie.init(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device="cuda:0",
    template="FS-DIRECT",
    use_chat_template=True,
    half_mode=True,
    n_right_specials_tokens=1,
)
```
Show the available prompting templates
```python
ppluie.show_templates()
>>> DIRECT
>>> MEANING
>>> INDIRECT
>>> FS-DIRECT
>>> FS-DIRECT_MAJ
>>> FS-DIRECT_FR
>>> FS-DIRECT_MAJ_FR
>>> FS-DIRECT_FR_MIN
>>> NETWORK
```
Show the LLMs that have already been tested with ParaPLUIE
```python
ppluie.show_available_models()
>>> HuggingFaceTB/SmolLM2-135M-Instruct
>>> HuggingFaceTB/SmolLM2-360M-Instruct
>>> HuggingFaceTB/SmolLM2-1.7B-Instruct
>>> google/gemma-2-2b-it
>>> state-spaces/mamba-2.8b-hf
>>> internlm/internlm2-chat-1_8b
>>> microsoft/Phi-4-mini-instruct
>>> mistralai/Mistral-7B-Instruct-v0.2
>>> tiiuae/falcon-mamba-7b-instruct
>>> Qwen/Qwen2.5-7B-Instruct
>>> CohereForAI/aya-expanse-8b
>>> google/gemma-2-9b-it
>>> meta-llama/Meta-Llama-3-8B-Instruct
>>> microsoft/phi-4
>>> CohereForAI/aya-expanse-32b
>>> Qwen/QwQ-32B
>>> CohereForAI/c4ai-command-r-08-2024
```
Change the prompting template
```python
ppluie.setTemplate("DIRECT")
```
Show how the prompt is encoded, to ensure that the correct number of special tokens is removed and that the words "Yes" and "No" each fit into a single token
```python
ppluie.check_end_tokens_tmpl()
```
## Limitations and Bias
This metric relies on an underlying LLM, so its judgments inherit the limitations and biases of the chosen model.
## Source code
[GitLab](https://gitlab.inria.fr/expression/paraphrase-generation-evaluation-powered-by-an-llm-a-semantic-metric-not-a-lexical-one-coling-2025)
## Citation
```bibtex
@inproceedings{lemesle-etal-2025-paraphrase,
title = "Paraphrase Generation Evaluation Powered by an {LLM}: A Semantic Metric, Not a Lexical One",
author = "Lemesle, Quentin and
Chevelu, Jonathan and
Martin, Philippe and
Lolive, Damien and
Delhay, Arnaud and
Barbot, Nelly",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
year = "2025",
url = "https://aclanthology.org/2025.coling-main.538/"
}
```