---
title: ParaPLUIE
emoji: ☂️
tags:
- evaluate
- metric
description: >-
  ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
  ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has
  shown the highest correlation with human judgment on paraphrase
  classification while maintaining a low computational cost, as it is roughly
  equivalent to the cost of generating a single token.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
short_description: ParaPLUIE, an LLM-based semantic proximity metric
---

# Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)

## Metric Description
ParaPLUIE is a metric for evaluating the semantic proximity between two sentences.
It uses the perplexity of an LLM to compute a confidence score.
ParaPLUIE has shown the highest correlation with human judgment on paraphrase classification while maintaining a low computational cost, as its cost is roughly equivalent to that of generating a single token.
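
The intuition can be sketched as follows: the LLM is prompted to judge whether the hypothesis paraphrases the source, and the score compares the model's likelihood of answering "Yes" versus "No" as a single next token. The snippet below is a simplified, hypothetical illustration of this idea, not ParaPLUIE's actual implementation: the prompt wording is invented here, and the real templates (see `show_templates()` below) and special-token handling differ.

```python
# Hypothetical sketch of an LLM-based paraphrase score: compare
# log P("Yes") vs log P("No") as the next token after a judgment prompt.
# NOT ParaPLUIE's implementation; the prompt below is invented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def sketch_score(source: str, hypothesis: str) -> float:
    prompt = (
        f'Sentence 1: "{source}"\nSentence 2: "{hypothesis}"\n'
        "Is sentence 2 a paraphrase of sentence 1? Answer Yes or No: "
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    log_probs = torch.log_softmax(logits, dim=-1)
    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    # Positive when "Yes" is more likely than "No".
    return (log_probs[yes_id] - log_probs[no_id]).item()
```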

## How to Use

This metric requires a source sentence and its hypothetical paraphrase.

```python
import evaluate
ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")
S = "Have you ever seen a tsunami ?" 
H = "Have you ever seen a tiramisu ?"
results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```

### Inputs

- **sources** (`list` of `string`): Source sentences.
- **hypotheses** (`list` of `string`): Hypothetical paraphrases.

### Output Values

- **scores** (`list` of `float`): ParaPLUIE scores, one per (source, hypothesis) pair. The minimum possible value is -inf and the maximum is +inf. A score greater than 0 means that the sentences are paraphrases; a score lower than 0 indicates the opposite.

This metric outputs a dictionary containing the list of scores.
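
For instance, scores can be turned into binary paraphrase decisions by thresholding at 0 (a minimal illustration reusing the variables from the example above):

```python
# Scores above 0 are classified as paraphrases.
results = ppluie.compute(sources=[S], hypotheses=[H])
labels = [score > 0 for score in results["scores"]]
print(labels)
>>> [False]
```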

### Examples

Simple example
```python
import evaluate
ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")
S = "Have you ever seen a tsunami ?" 
H = "Have you ever seen a tiramisu ?"
results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```

Configure metric
```python
ppluie.init(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device="cuda:0",
    template="FS-DIRECT",
    use_chat_template=True,
    half_mode=True,
    n_right_specials_tokens=1,
)
```
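
Once configured, `compute` accepts several pairs in a single call; the sentence pairs below are illustrative:

```python
# One score is returned per (source, hypothesis) pair, in order.
sources = [
    "Have you ever seen a tsunami ?",
    "Have you ever seen a tsunami ?",
]
hypotheses = [
    "Did you ever witness a tsunami ?",
    "Have you ever seen a tiramisu ?",
]
results = ppluie.compute(sources=sources, hypotheses=hypotheses)
print(results["scores"])
```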

Show the available prompting templates
```python
ppluie.show_templates()
>>> DIRECT
>>> MEANING
>>> INDIRECT
>>> FS-DIRECT
>>> FS-DIRECT_MAJ
>>> FS-DIRECT_FR
>>> FS-DIRECT_MAJ_FR
>>> FS-DIRECT_FR_MIN
>>> NETWORK
```

Show the LLMs that have already been tested with ParaPLUIE
```python
ppluie.show_available_models()
>>> HuggingFaceTB/SmolLM2-135M-Instruct
>>> HuggingFaceTB/SmolLM2-360M-Instruct
>>> HuggingFaceTB/SmolLM2-1.7B-Instruct
>>> google/gemma-2-2b-it
>>> state-spaces/mamba-2.8b-hf
>>> internlm/internlm2-chat-1_8b
>>> microsoft/Phi-4-mini-instruct
>>> mistralai/Mistral-7B-Instruct-v0.2
>>> tiiuae/falcon-mamba-7b-instruct
>>> Qwen/Qwen2.5-7B-Instruct
>>> CohereForAI/aya-expanse-8b
>>> google/gemma-2-9b-it
>>> meta-llama/Meta-Llama-3-8B-Instruct
>>> microsoft/phi-4
>>> CohereForAI/aya-expanse-32b
>>> Qwen/QwQ-32B
>>> CohereForAI/c4ai-command-r-08-2024
```
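
For quick experiments on modest hardware, one of the smaller tested models can be substituted (illustrative choice):

```python
# Smaller tested model: faster and lighter, likely at some cost in
# correlation with human judgment.
ppluie.init(model="HuggingFaceTB/SmolLM2-360M-Instruct", device="cpu")
```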

Change the prompting template
```python
ppluie.setTemplate("DIRECT")
```

Show how the prompt is encoded, to verify that the correct number of special tokens is removed and that the words "Yes" and "No" each fit into a single token
```python
ppluie.check_end_tokens_tmpl()
```
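
To see why this check matters, here is a standalone sketch, independent of ParaPLUIE's API, of verifying the single-token property with a Hugging Face tokenizer (whether "Yes" and "No" map to one token depends on the tokenizer):

```python
# Standalone check, not part of the ParaPLUIE API: verify that "Yes" and
# "No" each encode to a single token for a given tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
for word in ("Yes", "No"):
    ids = tokenizer.encode(word, add_special_tokens=False)
    status = "single token" if len(ids) == 1 else "multiple tokens"
    print(f"{word}: {ids} -> {status}")
```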

## Limitations and Bias
This metric relies on an underlying LLM, so its quality, biases, and supported languages are bounded by those of the chosen model.

## Source code
[GitLab](https://gitlab.inria.fr/expression/paraphrase-generation-evaluation-powered-by-an-llm-a-semantic-metric-not-a-lexical-one-coling-2025)


## Citation
```bibtex
@inproceedings{lemesle-etal-2025-paraphrase,
    title = "Paraphrase Generation Evaluation Powered by an {LLM}: A Semantic Metric, Not a Lexical One",
    author = "Lemesle, Quentin  and
      Chevelu, Jonathan  and
      Martin, Philippe  and
      Lolive, Damien  and
      Delhay, Arnaud  and
      Barbot, Nelly",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    year = "2025",
    url = "https://aclanthology.org/2025.coling-main.538/"
}
```