Update docs.
- README.md +125 -17
- vendiscore.py +6 -4
README.md
CHANGED
|
@@ -5,7 +5,7 @@ datasets:
|
|
| 5 |
tags:
|
| 6 |
- evaluate
|
| 7 |
- metric
|
| 8 |
-
description: "
|
| 9 |
sdk: gradio
|
| 10 |
sdk_version: 3.0.2
|
| 11 |
app_file: app.py
|
|
@@ -14,37 +14,145 @@ pinned: false
|
|
| 14 |
|
| 15 |
# Metric Card for VendiScore
|
| 16 |
|
| 17 |
-
|
| 18 |
|
| 19 |
## Metric Description
|
| 20 |
-
|
| 21 |
|
| 22 |
## How to Use
|
| 23 |
-
|
| 24 |
|
| 25 |
-
|
| 26 |
|
| 27 |
### Inputs
|
| 28 |
-
|
| 29 |
-
|
| 30 |
|
| 31 |
### Output Values
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
|
| 36 |
|
| 37 |
-
|
| 38 |
-
|
| 39 |
|
| 40 |
-
|
| 41 |
-
|
| 42 |
|
| 43 |
## Limitations and Bias
|
| 44 |
-
|
| 45 |
|
| 46 |
## Citation
|
| 47 |
-
*Cite the source where this metric was introduced.*
|
| 48 |
|
| 49 |
-
## Further References
|
| 50 |
-
*Add any useful further references.*
|
|
|
|
| 5 |
tags:
|
| 6 |
- evaluate
|
| 7 |
- metric
|
| 8 |
+
description: "The Vendi Score is a metric for evaluating diversity in machine learning. See the project's README at https://github.com/vertaix/Vendi-Score for more information."
|
| 9 |
sdk: gradio
|
| 10 |
sdk_version: 3.0.2
|
| 11 |
app_file: app.py
|
|
|
|
| 14 |
|
| 15 |
# Metric Card for VendiScore
|
| 16 |
|
| 17 |
+
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
|
| 18 |
+
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
|
| 19 |
+
See the project's README at https://github.com/vertaix/Vendi-Score for more information.
|
| 20 |
|
| 21 |
## Metric Description
|
| 22 |
+
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
|
| 23 |
+
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
|
| 24 |
+
Specifically, given a positive semi-definite matrix $K \in \mathbb{R}^{n \times n}$ of similarity scores, the score is defined as:
|
| 25 |
+
$$\mathrm{VS}(K) = \exp\left(-\mathrm{tr}\left(\frac{K}{n} \log \frac{K}{n}\right)\right) = \exp\left(-\sum_{i=1}^n \lambda_i \log \lambda_i\right),$$
|
| 26 |
+
where $\lambda_i$ are the eigenvalues of $K/n$ and $0 \log 0 = 0$.
|
| 27 |
+
That is, the Vendi Score is the exponential of the von Neumann entropy of $K/n$, or equivalently the exponential of the Shannon entropy of the eigenvalues of $K/n$. This quantity is also known as the effective rank.
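As a minimal sketch of the definition above, the score can be computed directly from the eigenvalues of $K/n$ with NumPy. The function name `vendi_score_from_K` is hypothetical and is not part of the `vendi_score` package or the `evaluate` module; the sketch assumes `K` is symmetric and positive semi-definite with ones on the diagonal.

```python
import numpy as np

def vendi_score_from_K(K):
    """Illustrative helper (hypothetical name): exp of the Shannon entropy of eig(K / n)."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # eigvalsh is used because K is assumed to be symmetric.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or numerically negative) eigenvalues, following the convention 0 log 0 = 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
score = vendi_score_from_K(K)
print(round(score, 4))
```

For this matrix the eigenvalues of $K/3$ are $1.9/3$, $1/3$ and $0.1/3$, so the result should agree with the 2.1573 reported for the same matrix in the Examples section.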
|
| 28 |
|
| 29 |
## How to Use
|
| 30 |
+
The Vendi Score is available as a Python package or in HuggingFace `evaluate`.
|
| 31 |
+
To use the Python package, see the instructions at https://github.com/vertaix/Vendi-Score.
|
| 32 |
+
To use the `evaluate` module, pass a list of samples and a similarity function or a string identifying a predefined class of similarity functions (see below).
|
| 33 |
|
| 34 |
+
```
|
| 35 |
+
>>> import evaluate
>>> vendiscore = evaluate.load("danf0/vendiscore")
|
| 36 |
+
>>> samples = ["Look, Jane.",
|
| 37 |
+
"See Spot.",
|
| 38 |
+
"See Spot run.",
|
| 39 |
+
"Run, Spot, run.",
|
| 40 |
+
"Jane sees Spot run."]
|
| 41 |
+
>>> results = vendiscore.compute(samples, k="ngram_overlap", ns=[1, 2])
|
| 42 |
+
>>> print(results)
|
| 43 |
+
{'VS': 3.90657...}
|
| 44 |
+
```
|
| 45 |
|
| 46 |
### Inputs
|
| 47 |
+
- **samples**: an iterable containing $n$ samples to score, an n x n similarity
|
| 48 |
+
matrix K, or an n x d feature matrix X.
|
| 49 |
+
- **k**: a pairwise similarity function, or a string identifying a predefined
|
| 50 |
+
similarity function. If k is a pairwise similarity function, it should
|
| 51 |
+
be symmetric and k(x, x) = 1.
|
| 52 |
+
Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
|
| 53 |
+
- **score_K**: if true, samples is an n x n similarity matrix K.
|
| 54 |
+
- **score_X**: if true, samples is an n x d feature matrix X.
|
| 55 |
+
- **score_dual**: if true, samples is an n x d feature matrix X and we will
|
| 56 |
+
compute the diversity score using the d x d covariance matrix X.T @ X.
|
| 57 |
+
- **normalize**: if true, normalize the similarity scores.
|
| 58 |
+
- **model (optional)**: if k is "text_embeddings", a model mapping sentences to
|
| 59 |
+
embeddings (output should be an object with an attribute called
|
| 60 |
+
`pooler_output` or `last_hidden_state`). If k is "image_embeddings", a
|
| 61 |
+
model mapping images to embeddings.
|
| 62 |
+
- **tokenizer (optional)**: if k is "text_embeddings" or "ngram_overlap", a
|
| 63 |
+
tokenizer mapping strings to lists.
|
| 64 |
+
- **transform (optional)**: if k is "image_embeddings", a torchvision transform
|
| 65 |
+
to apply to the samples.
|
| 66 |
+
- **model_path (optional)**: if k is "text_embeddings", the name of a model on
|
| 67 |
+
the HuggingFace hub.
|
| 68 |
+
- **ns (optional)**: if k is "ngram_overlap", the values of n to calculate.
|
| 69 |
+
- **batch_size (optional)**: batch size to use if k is "text_embeddings" or
|
| 70 |
+
"image_embeddings".
|
| 71 |
+
- **device (optional)**: a string (e.g. "cuda", "cpu") or torch.device
|
| 72 |
+
identifying the device to use if k is "text_embeddings"
|
| 73 |
+
or "image_embeddings".
|
| 74 |
+
|
| 75 |
|
| 76 |
### Output Values
|
| 77 |
|
| 78 |
+
The output is a dictionary with one key, "VS".
|
| 79 |
+
Given n samples, the value of the Vendi Score ranges between 1 and n, with higher numbers indicating that the sample is more diverse.
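The two ends of this range can be checked with a small sketch. It assumes the score is computed from the eigenvalues of $K/n$ as in the Metric Description; `effective_count` is a hypothetical helper written for illustration, not a function of the `evaluate` module.

```python
import numpy as np

def effective_count(K):
    """Hypothetical helper: exp of the Shannon entropy of the eigenvalues of K / n."""
    K = np.asarray(K, dtype=float)
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]  # convention: 0 log 0 = 0
    return float(np.exp(-np.sum(lam * np.log(lam))))

n = 4
identical = np.ones((n, n))  # every pair has similarity 1: one effective element
distinct = np.eye(n)         # every pair of different samples has similarity 0

low = effective_count(identical)
high = effective_count(distinct)
print(f"identical samples: {low:.2f}, fully distinct samples: {high:.2f}")
```

With all similarities equal to 1 the only nonzero eigenvalue of $K/n$ is 1, giving a score of 1; with the identity matrix every eigenvalue is $1/n$, giving a score of $n$.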
|
| 80 |
+
|
| 81 |
+
### Examples
|
| 82 |
|
| 83 |
+
```python
|
| 84 |
+
import evaluate
import numpy as np
|
| 85 |
+
vendiscore = evaluate.load("danf0/vendiscore")
|
| 86 |
|
| 87 |
+
samples = [0, 0, 10, 10, 20, 20]
|
| 88 |
+
k = lambda a, b: np.exp(-np.abs(a - b))
|
| 89 |
|
| 90 |
+
vendiscore.compute(samples, k)
|
| 91 |
+
|
| 92 |
+
# {'VS': 2.9999...}
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
If you have already precomputed a similarity matrix:
|
| 96 |
+
```python
|
| 97 |
+
K = np.array([[1.0, 0.9, 0.0],
|
| 98 |
+
[0.9, 1.0, 0.0],
|
| 99 |
+
[0.0, 0.0, 1.0]])
|
| 100 |
+
vendiscore.compute(K, score_K=True)
|
| 101 |
+
|
| 102 |
+
# {'VS': 2.1573...}
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
If your similarity function is a dot product between normalized
|
| 106 |
+
embeddings $X\in\mathbb{R}^{n\times d}$, and $d < n$, it is faster
|
| 107 |
+
to compute the Vendi Score using the covariance matrix,
|
| 108 |
+
$\frac{1}{n} \sum_i x_i x_i^{\top}$:
|
| 109 |
+
```python
|
| 110 |
+
vendiscore.compute(X, score_dual=True)
|
| 111 |
+
```
|
| 112 |
+
If the rows of $X$ are not normalized, set `normalize=True`.
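The reason the covariance form gives the same score is that $X X^{\top}/n$ and $X^{\top} X/n$ share their nonzero eigenvalues. The sketch below checks this numerically with NumPy on random unit-norm rows; the helper `entropy_exp` and the random data are assumptions made for illustration and do not reproduce the module's internals.

```python
import numpy as np

def entropy_exp(M):
    """Hypothetical helper: exp of the Shannon entropy of the eigenvalues of M."""
    lam = np.linalg.eigvalsh(M)
    lam = lam[lam > 1e-12]  # convention: 0 log 0 = 0
    return float(np.exp(-np.sum(lam * np.log(lam))))

rng = np.random.default_rng(0)
n, d = 50, 8
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalize rows to unit length

vs_similarity = entropy_exp(X @ X.T / n)  # n x n similarity matrix
vs_covariance = entropy_exp(X.T @ X / n)  # d x d covariance matrix
print(abs(vs_similarity - vs_covariance) < 1e-8)
```

When $d < n$ the second eigendecomposition works on a smaller matrix, which is where the speed-up comes from.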
|
| 113 |
+
|
| 114 |
+
Images:
|
| 115 |
+
```python
|
| 116 |
+
from torchvision import datasets
|
| 117 |
+
|
| 118 |
+
mnist = datasets.MNIST("data/mnist", train=False, download=True)
|
| 119 |
+
digits = [[x for x, y in mnist if y == c] for c in range(10)]
|
| 120 |
+
pixel_vs = [vendiscore.compute(imgs, k="pixels") for imgs in digits]
|
| 121 |
+
# The default embeddings are from the pool-2048 layer of the torchvision
|
| 122 |
+
# Inception v3 model.
|
| 123 |
+
inception_vs = [vendiscore.compute(imgs, k="image_embeddings", batch_size=64, device="cuda") for imgs in digits]
|
| 124 |
+
for y, (pvs, ivs) in enumerate(zip(pixel_vs, inception_vs)):
    print(f"{y}\t{pvs['VS']:.02f}\t{ivs['VS']:.02f}")
|
| 125 |
+
|
| 126 |
+
# Output:
|
| 127 |
+
# 0 7.68 3.45
|
| 128 |
+
# 1 5.31 3.50
|
| 129 |
+
# 2 12.18 3.62
|
| 130 |
+
# 3 9.97 2.97
|
| 131 |
+
# 4 11.10 3.75
|
| 132 |
+
# 5 13.51 3.16
|
| 133 |
+
# 6 9.06 3.63
|
| 134 |
+
# 7 9.58 4.07
|
| 135 |
+
# 8 9.69 3.74
|
| 136 |
+
# 9 8.56 3.43
|
| 137 |
+
```
|
| 138 |
+
|
| 139 |
+
Text:
|
| 140 |
+
```python
|
| 141 |
+
sents = ["Look, Jane.",
|
| 142 |
+
"See Spot.",
|
| 143 |
+
"See Spot run.",
|
| 144 |
+
"Run, Spot, run.",
|
| 145 |
+
"Jane sees Spot run."]
|
| 146 |
+
ngram_vs = vendiscore.compute(sents, k="ngram_overlap", ns=[1, 2])
|
| 147 |
+
bert_vs = vendiscore.compute(sents, k="text_embeddings", model_path="bert-base-uncased")
|
| 148 |
+
simcse_vs = vendiscore.compute(sents, k="text_embeddings", model_path="princeton-nlp/unsup-simcse-bert-base-uncased")
|
| 149 |
+
print(f"N-grams: {ngram_vs['VS']:.02f}, BERT: {bert_vs['VS']:.02f}, SimCSE: {simcse_vs['VS']:.02f}")
|
| 150 |
+
|
| 151 |
+
# N-grams: 3.91, BERT: 1.21, SimCSE: 2.81
|
| 152 |
+
```
|
| 153 |
|
| 154 |
## Limitations and Bias
|
| 155 |
+
The Vendi Score depends on the choice of similarity function. Care should be taken to select a similarity function that reflects the features that are relevant for defining diversity in a given application.
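To illustrate this dependence, the following sketch scores the same samples under two similarity functions that differ only in length scale. The helper `score_with_kernel` and the `wide` kernel are hypothetical and chosen for illustration; the `narrow` kernel is the one used in the Examples section.

```python
import numpy as np

def score_with_kernel(samples, k):
    """Hypothetical helper: build K with a pairwise function, then score it."""
    n = len(samples)
    K = np.array([[k(a, b) for b in samples] for a in samples], dtype=float)
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # convention: 0 log 0 = 0
    return float(np.exp(-np.sum(lam * np.log(lam))))

samples = [0, 0, 10, 10, 20, 20]
narrow = lambda a, b: np.exp(-abs(a - b))        # treats 0, 10 and 20 as unrelated
wide = lambda a, b: np.exp(-abs(a - b) / 100.0)  # assumed alternative: treats them as similar

vs_narrow = score_with_kernel(samples, narrow)
vs_wide = score_with_kernel(samples, wide)
print(f"narrow: {vs_narrow:.2f}, wide: {vs_wide:.2f}")
```

Under the narrow kernel the three distinct values count as roughly three effective elements, while the wide kernel yields a lower score for exactly the same data.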
|
| 156 |
|
| 157 |
## Citation
|
|
|
|
| 158 |
|
|
|
|
|
|
vendiscore.py
CHANGED
|
@@ -22,15 +22,17 @@ from vendi_score import vendi, image_utils, text_utils
|
|
| 22 |
# TODO: Add BibTeX citation
|
| 23 |
_CITATION = ""
|
| 24 |
_DESCRIPTION = """\
|
| 25 |
-
|
| 26 |
"""
|
| 27 |
|
| 28 |
|
| 29 |
_KWARGS_DESCRIPTION = """
|
| 30 |
Calculates the Vendi Score given samples and a similarity function.
|
| 31 |
Args:
|
| 32 |
-
samples:
|
| 33 |
-
an n x d feature matrix X.
|
| 34 |
k: a pairwise similarity function, or a string identifying a predefined
|
| 35 |
similarity function.
|
| 36 |
Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
|
|
@@ -56,7 +58,7 @@ Args:
|
|
| 56 |
Returns:
|
| 57 |
VS: The Vendi Score.
|
| 58 |
Examples:
|
| 59 |
-
>>> vendi_score = evaluate.load("
|
| 60 |
>>> samples = ["Look, Jane.",
|
| 61 |
"See Spot.",
|
| 62 |
"See Spot run.",
|
|
|
|
| 22 |
# TODO: Add BibTeX citation
|
| 23 |
_CITATION = ""
|
| 24 |
_DESCRIPTION = """\
|
| 25 |
+
The Vendi Score is a metric for evaluating diversity in machine learning.
|
| 26 |
+
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
|
| 27 |
+
See the project's README at https://github.com/vertaix/Vendi-Score for more information.
|
| 28 |
"""
|
| 29 |
|
| 30 |
|
| 31 |
_KWARGS_DESCRIPTION = """
|
| 32 |
Calculates the Vendi Score given samples and a similarity function.
|
| 33 |
Args:
|
| 34 |
+
samples: an iterable containing n samples to score, an n x n similarity
|
| 35 |
+
matrix K, or an n x d feature matrix X.
|
| 36 |
k: a pairwise similarity function, or a string identifying a predefined
|
| 37 |
similarity function.
|
| 38 |
Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
|
|
|
|
| 58 |
Returns:
|
| 59 |
VS: The Vendi Score.
|
| 60 |
Examples:
|
| 61 |
+
>>> vendi_score = evaluate.load("danf0/vendiscore")
|
| 62 |
>>> samples = ["Look, Jane.",
|
| 63 |
"See Spot.",
|
| 64 |
"See Spot run.",
|