perplexity

rdiehlmartinez committed on Dec 11, 2024

Commit e8e3158 (verified) · 1 Parent(s): cf4f54e

Update README.md

README.md +9 -6

@@ -3,14 +3,14 @@ title: Perplexity
 emoji: 🤗
 colorFrom: blue
 colorTo: red
-sdk: gradio
-sdk_version: 3.19.1
-app_file: app.py
 pinned: false
 tags:
 - evaluate
 - metric
 description: >-
   Perplexity (PPL) is one of the most common metrics for evaluating language
   models. It is defined as the exponentiated average negative log-likelihood of
   a sequence, calculated with exponent base `e`.
@@ -21,8 +21,10 @@ description: >-
 # Metric Card for Perplexity
-## Metric Description
-Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence.
 As a metric, it can be used to evaluate how well the model has learned the distribution of the text it was trained on.
@@ -39,7 +41,7 @@ The metric takes a list of text as input, as well as the name of the model used
 ```python
 from evaluate import load
-perplexity = load("perplexity", module_type="metric")
 results = perplexity.compute(predictions=predictions, model_id='gpt2')
 ```
@@ -50,6 +52,7 @@ results = perplexity.compute(predictions=predictions, model_id='gpt2')
 - **batch_size** (int): the batch size to run texts through the model. Defaults to 16.
 - **add_start_token** (bool): whether to add the start token to the texts, so the perplexity can include the probability of the first word. Defaults to True.
 - **device** (str): device to run on, defaults to `cuda` when available
 ### Output Values
 This metric outputs a dictionary with the perplexity scores for the text input in the list, and the average perplexity.

 emoji: 🤗
 colorFrom: blue
 colorTo: red
+sdk: static
 pinned: false
 tags:
 - evaluate
 - metric
 description: >-
+  This is a fork of the huggingface evaluate library's implementation of perplexity.
   Perplexity (PPL) is one of the most common metrics for evaluating language
   models. It is defined as the exponentiated average negative log-likelihood of
   a sequence, calculated with exponent base `e`.
 # Metric Card for Perplexity
+> This is a fork of the huggingface evaluate library's implementation of perplexity.
+## Metric Description
+Given a model and an input text sequence, perplexity measures how likely the model is to generate the input text sequence.
 As a metric, it can be used to evaluate how well the model has learned the distribution of the text it was trained on.
 ```python
 from evaluate import load
+perplexity = load("pico-lm/perplexity")
 results = perplexity.compute(predictions=predictions, model_id='gpt2')
 ```
 - **batch_size** (int): the batch size to run texts through the model. Defaults to 16.
 - **add_start_token** (bool): whether to add the start token to the texts, so the perplexity can include the probability of the first word. Defaults to True.
 - **device** (str): device to run on, defaults to `cuda` when available
+- **trust_remote_code** (bool): enables running metric on custom models
 ### Output Values
 This metric outputs a dictionary with the perplexity scores for the text input in the list, and the average perplexity.
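
The README describes perplexity as the exponentiated average negative log-likelihood of a sequence, with base `e`, and says the metric returns per-text scores plus their average. As a minimal sketch of that arithmetic only (not the fork's actual implementation), the snippet below computes it from per-token natural-log probabilities that are assumed to be already available. The function names `perplexity_from_log_probs` and `summarize` are hypothetical, and the dictionary keys `perplexities` and `mean_perplexity` are an assumption modeled on the upstream `evaluate` metric.

```python
import math


def perplexity_from_log_probs(token_log_probs):
    """Return exp(mean negative log-likelihood) for one sequence.

    `token_log_probs` holds natural-log probabilities, one per token
    (hypothetical input; a real run would obtain these from the model).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def summarize(batch_log_probs):
    """Collect per-text perplexities and their mean in one dictionary.

    The key names are an assumption, chosen to resemble the upstream metric.
    """
    perplexities = [perplexity_from_log_probs(lp) for lp in batch_log_probs]
    return {
        "perplexities": perplexities,
        "mean_perplexity": sum(perplexities) / len(perplexities),
    }


# A model that gives every token probability 0.25 has perplexity close to 4;
# one that gives every token probability 0.5 has perplexity close to 2.
uniform_quarter = [math.log(0.25)] * 5
uniform_half = [math.log(0.5)] * 3
results = summarize([uniform_quarter, uniform_half])
```

A perplexity of `k` can be read as the model being, on average, as uncertain as a uniform choice among `k` tokens, which is why the two toy sequences land near 4 and 2.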