Update README.md
README.md CHANGED
@@ -3,14 +3,14 @@ title: Perplexity
 emoji: 🤗
 colorFrom: blue
 colorTo: red
-sdk:
-sdk_version: 3.19.1
-app_file: app.py
+sdk: static
 pinned: false
 tags:
 - evaluate
 - metric
 description: >-
+This is a fork of the huggingface evaluate library's implementation of perplexity.
+
 Perplexity (PPL) is one of the most common metrics for evaluating language
 models. It is defined as the exponentiated average negative log-likelihood of
 a sequence, calculated with exponent base `e`.
@@ -21,8 +21,10 @@ description: >-
 
 # Metric Card for Perplexity
 
-
-
+> This is a fork of the huggingface evaluate library's implementation of perplexity.
+
+## Metric Description
+Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence.
 
 As a metric, it can be used to evaluate how well the model has learned the distribution of the text it was trained on.
 
@@ -39,7 +41,7 @@ The metric takes a list of text as input, as well as the name of the model used
 
 ```python
 from evaluate import load
-perplexity = load("perplexity"
+perplexity = load("pico-lm/perplexity")
 results = perplexity.compute(predictions=predictions, model_id='gpt2')
 ```
 
@@ -50,6 +52,7 @@ results = perplexity.compute(predictions=predictions, model_id='gpt2')
 - **batch_size** (int): the batch size to run texts through the model. Defaults to 16.
 - **add_start_token** (bool): whether to add the start token to the texts, so the perplexity can include the probability of the first word. Defaults to True.
 - **device** (str): device to run on, defaults to `cuda` when available
+- **trust_remote_code** (bool): enables running metric on custom models
 
 ### Output Values
 This metric outputs a dictionary with the perplexity scores for the text input in the list, and the average perplexity.
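The README text in this diff defines perplexity as the exponentiated average negative log-likelihood of a sequence, with base `e`, and says the metric returns per-text scores plus their average. As a rough illustration of that definition only (not the fork's actual implementation), the sketch below computes it from per-token natural-log probabilities. The helper names `sequence_perplexity` and `summarize` are hypothetical, and the dictionary keys `perplexities` and `mean_perplexity` are assumed for illustration, since the diff does not name the output keys.

```python
import math


def sequence_perplexity(token_logprobs):
    """Perplexity of one sequence: exp of the mean negative log-likelihood.

    `token_logprobs` holds natural-log probabilities, one per token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


def summarize(batch_logprobs):
    """Per-sequence perplexities plus their arithmetic mean.

    The key names are assumptions made for this sketch.
    """
    scores = [sequence_perplexity(lp) for lp in batch_logprobs]
    return {"perplexities": scores, "mean_perplexity": sum(scores) / len(scores)}


# A model that assigns probability 0.25 to every token has perplexity 4;
# one that assigns 0.5 to every token has perplexity 2.
uniform = [math.log(0.25)] * 5
confident = [math.log(0.5)] * 3
report = summarize([uniform, confident])
```

Lower values mean the model found the text more predictable. A model that spreads probability evenly over `k` choices at every step scores a perplexity of `k`.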