---
title: Test ParaScore
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  ParaScore is a metric for scoring the performance of paraphrase generation models.
  See the project at https://github.com/shadowkiller33/ParaScore for more information.
---
# Metric Card for ParaScore

## Metric description

ParaScore is a reference-based metric for scoring the performance of paraphrase generation models. As described in the [original paper](https://arxiv.org/abs/2202.08479), it combines semantic similarity with lexical divergence, so a good paraphrase must both preserve the meaning of the input and differ from it in surface form.
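The intuition can be illustrated with a rough sketch. This is not the official implementation: the real metric uses a BERTScore-style embedding similarity for `sim`, and the weight `omega`, the cap `gamma`, and the function names here are illustrative assumptions, not values taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def diversity(source: str, candidate: str, gamma: float = 0.35) -> float:
    """Normalized edit distance between source and candidate, capped at
    gamma so extreme rewrites stop earning extra credit (capping rule is
    a simplification of the paper's diversity term)."""
    ned = levenshtein(source, candidate) / max(len(source), len(candidate), 1)
    return min(ned, gamma)

def parascore_sketch(sim: float, source: str, candidate: str,
                     omega: float = 0.05) -> float:
    """Combine a precomputed semantic similarity `sim` (e.g. BERTScore)
    with the lexical-diversity bonus."""
    return sim + omega * diversity(source, candidate)
```

A candidate identical to its source gets no diversity bonus, so two candidates with equal semantic similarity are ranked by how much they rewrite the input.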
## How to use

```python
from evaluate import load

parascore = load("transZ/test_parascore")
predictions = ["hello there", "general kenobi"]
references = ["hello there", "general kenobi"]
results = parascore.compute(predictions=predictions, references=references, lang="en")
```
## Output values

ParaScore outputs a dictionary with the following field:

- `score`: ranges from 0.0 to 1.0; higher values indicate better paraphrases.
## Limitations and bias

The [original ParaScore paper](https://arxiv.org/abs/2202.08479) showed that ParaScore correlates well with human judgment at both the sentence and system level, but this depends on the model and language pair selected.
## Citation

```bibtex
@article{Shen2022,
  archivePrefix = {arXiv},
  arxivId = {2202.08479},
  author = {Shen, Lingfeng and Liu, Lemao and Jiang, Haiyun and Shi, Shuming},
  journal = {EMNLP 2022 - 2022 Conference on Empirical Methods in Natural Language Processing, Proceedings},
  eprint = {2202.08479},
  month = {feb},
  number = {1},
  pages = {3178--3190},
  title = {{On the Evaluation Metrics for Paraphrase Generation}},
  url = {http://arxiv.org/abs/2202.08479},
  year = {2022}
}
```
## Further References

- [Official implementation](https://github.com/shadowkiller33/parascore_toolkit)