Buckets:

rtrm's picture
|
download
raw
47.4 kB
# Metrics
## Metrics
[//]: # (TODO: aenum.Enum raises error when generating docs: not supported by inspect.signature. See: https://github.com/ethanfurman/aenum/issues/44)
[//]: # (### Metrics)
[//]: # ([[autodoc]] metrics.metrics.Metrics)
### Metric[[lighteval.metrics.Metric]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.Metric</name><anchor>lighteval.metrics.Metric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L33</source><parameters>[{"name": "metric_name", "val": ": str"}, {"name": "higher_is_better", "val": ": bool"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": typing.Union[lighteval.metrics.metrics_corpus.CorpusLevelComputation, typing.Callable]"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring>
</div>
### CorpusLevelMetric[[lighteval.metrics.utils.metric_utils.CorpusLevelMetric]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.utils.metric_utils.CorpusLevelMetric</name><anchor>lighteval.metrics.utils.metric_utils.CorpusLevelMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L114</source><parameters>[{"name": "metric_name", "val": ": str"}, {"name": "higher_is_better", "val": ": bool"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": typing.Union[lighteval.metrics.metrics_corpus.CorpusLevelComputation, typing.Callable]"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring>
Metric computed over the whole corpora, with computations happening at the aggregation phase
</div>
### SampleLevelMetric[[lighteval.metrics.utils.metric_utils.SampleLevelMetric]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.utils.metric_utils.SampleLevelMetric</name><anchor>lighteval.metrics.utils.metric_utils.SampleLevelMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L121</source><parameters>[{"name": "metric_name", "val": ": str"}, {"name": "higher_is_better", "val": ": bool"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": typing.Union[lighteval.metrics.metrics_corpus.CorpusLevelComputation, typing.Callable]"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring>
Metric computed per sample, then aggregated over the corpus
</div>
### MetricGrouping[[lighteval.metrics.utils.metric_utils.MetricGrouping]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.utils.metric_utils.MetricGrouping</name><anchor>lighteval.metrics.utils.metric_utils.MetricGrouping</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L103</source><parameters>[{"name": "metric_name", "val": ": list"}, {"name": "higher_is_better", "val": ": dict"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": dict"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring>
Some metrics are more advantageous to compute together at once.
For example, if a costly preprocessing is the same for all metrics, it makes more sense to compute it once.
</div>
### CorpusLevelMetricGrouping[[lighteval.metrics.utils.metric_utils.CorpusLevelMetricGrouping]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.utils.metric_utils.CorpusLevelMetricGrouping</name><anchor>lighteval.metrics.utils.metric_utils.CorpusLevelMetricGrouping</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L128</source><parameters>[{"name": "metric_name", "val": ": list"}, {"name": "higher_is_better", "val": ": dict"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": dict"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring>
MetricGrouping computed over the whole corpora, with computations happening at the aggregation phase
</div>
### SampleLevelMetricGrouping[[lighteval.metrics.utils.metric_utils.SampleLevelMetricGrouping]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.utils.metric_utils.SampleLevelMetricGrouping</name><anchor>lighteval.metrics.utils.metric_utils.SampleLevelMetricGrouping</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L135</source><parameters>[{"name": "metric_name", "val": ": list"}, {"name": "higher_is_better", "val": ": dict"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": dict"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring>
MetricGrouping are computed per sample, then aggregated over the corpus
</div>
## Corpus Metrics
### CorpusLevelF1Score[[lighteval.metrics.metrics_corpus.CorpusLevelF1Score]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_corpus.CorpusLevelF1Score</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelF1Score</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L81</source><parameters>[{"name": "average", "val": ": str"}, {"name": "num_classes", "val": ": int = 2"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelF1Score.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L96</source><parameters>[{"name": "items", "val": ": list"}]</parameters></docstring>
Computes the metric score over all the corpus generated items, by using the scikit learn implementation.
</div></div>
### CorpusLevelPerplexityMetric[[lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L164</source><parameters>[{"name": "metric_type", "val": ": str"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L182</source><parameters>[{"name": "items", "val": ": list"}]</parameters></docstring>
Computes the metric score over all the corpus generated items.
</div></div>
### CorpusLevelTranslationMetric[[lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L116</source><parameters>[{"name": "metric_type", "val": ": str"}, {"name": "lang", "val": ": typing.Literal['zh', 'ja', 'ko', ''] = ''"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L142</source><parameters>[{"name": "items", "val": ": list"}]</parameters></docstring>
Computes the metric score over all the corpus generated items, by using the sacrebleu implementation.
</div></div>
### MatthewsCorrCoef[[lighteval.metrics.metrics_corpus.MatthewsCorrCoef]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_corpus.MatthewsCorrCoef</name><anchor>lighteval.metrics.metrics_corpus.MatthewsCorrCoef</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L66</source><parameters>[]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.MatthewsCorrCoef.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L67</source><parameters>[{"name": "items", "val": ": list"}]</parameters><paramsdesc>- **items** (list[dict]) -- List of GenerativeCorpusMetricInput</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Score</retdesc></docstring>
Computes the Matthews Correlation Coefficient, using scikit learn ([doc](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html)).
</div></div>
## Sample Metrics
### ExactMatches[[lighteval.metrics.metrics_sample.ExactMatches]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.ExactMatches</name><anchor>lighteval.metrics.metrics_sample.ExactMatches</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L81</source><parameters>[{"name": "aggregation_function", "val": ": typing.Callable[[list[float]], float] = <built-in function max>"}, {"name": "normalize_gold", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "strip_strings", "val": ": bool = False"}, {"name": "type_exact_match", "val": ": str = 'full'"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.ExactMatches.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L118</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Aggregated score over the current sample's items.</retdesc></docstring>
Computes the metric over a list of golds and predictions for one single sample.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute_one_item</name><anchor>lighteval.metrics.metrics_sample.ExactMatches.compute_one_item</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L137</source><parameters>[{"name": "gold", "val": ": str"}, {"name": "pred", "val": ": str"}]</parameters><paramsdesc>- **gold** (str) -- One of the possible references
- **pred** (str) -- One of the possible predictions</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The exact match score. Will be 1 for a match, 0 otherwise.</retdesc></docstring>
Compares two strings only.
</div></div>
### F1_score[[lighteval.metrics.metrics_sample.F1_score]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.F1_score</name><anchor>lighteval.metrics.metrics_sample.F1_score</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L170</source><parameters>[{"name": "aggregation_function", "val": ": typing.Callable[[list[float]], float] = <built-in function max>"}, {"name": "normalize_gold", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "strip_strings", "val": ": bool = False"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.F1_score.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L197</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Aggregated score over the current sample's items.</retdesc></docstring>
Computes the metric over a list of golds and predictions for one single sample.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute_one_item</name><anchor>lighteval.metrics.metrics_sample.F1_score.compute_one_item</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L217</source><parameters>[{"name": "gold", "val": ": str"}, {"name": "pred", "val": ": str"}]</parameters><paramsdesc>- **gold** (str) -- One of the possible references
- **pred** (str) -- One of the possible predictions</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The f1 score over the bag of words, computed using nltk.</retdesc></docstring>
Compares two strings only.
</div></div>
### LoglikelihoodAcc[[lighteval.metrics.metrics_sample.LoglikelihoodAcc]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.LoglikelihoodAcc</name><anchor>lighteval.metrics.metrics_sample.LoglikelihoodAcc</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L243</source><parameters>[{"name": "logprob_normalization", "val": ": lighteval.metrics.normalizations.LogProbCharNorm | lighteval.metrics.normalizations.LogProbTokenNorm | lighteval.metrics.normalizations.LogProbPMINorm | None = None"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.LoglikelihoodAcc.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L254</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices.
- **model_response** (ModelResponse) -- The model's response containing logprobs.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>int</rettype><retdesc>The eval score: 1 if the best log-prob choice is in gold, 0 otherwise.</retdesc></docstring>
Computes the log likelihood accuracy: is the choice with the highest logprob in `choices_logprob` present
in the `gold_ixs`?
</div></div>
### NormalizedMultiChoiceProbability[[lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability</name><anchor>lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L297</source><parameters>[{"name": "log_prob_normalization", "val": ": lighteval.metrics.normalizations.LogProbCharNorm | lighteval.metrics.normalizations.LogProbTokenNorm | lighteval.metrics.normalizations.LogProbPMINorm | None = None"}, {"name": "aggregation_function", "val": ": typing.Callable[[numpy.ndarray], float] = <function max at 0x7f179aceeff0>"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L313</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices.
- **model_response** (ModelResponse) -- The model's response containing logprobs.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The probability of the best log-prob choice being a gold choice.</retdesc></docstring>
Computes the log likelihood probability: chance of choosing the best choice.
</div></div>
### Probability[[lighteval.metrics.metrics_sample.Probability]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.Probability</name><anchor>lighteval.metrics.metrics_sample.Probability</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L357</source><parameters>[{"name": "normalization", "val": ": lighteval.metrics.normalizations.LogProbTokenNorm | None = None"}, {"name": "aggregation_function", "val": ": typing.Callable[[numpy.ndarray], float] = <function max at 0x7f179aceeff0>"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Probability.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L373</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices.
- **model_response** (ModelResponse) -- The model's response containing logprobs.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The probability of the best log-prob choice being a gold choice.</retdesc></docstring>
Computes the log likelihood probability: chance of choosing the best choice.
</div></div>
### Recall[[lighteval.metrics.metrics_sample.Recall]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.Recall</name><anchor>lighteval.metrics.metrics_sample.Recall</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L408</source><parameters>[{"name": "k", "val": ": int"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Recall.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L418</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices.
- **model_response** (ModelResponse) -- The model's response containing logprobs.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>int</rettype><retdesc>Score: 1 if one of the top level predicted choices was correct, 0 otherwise.</retdesc></docstring>
Computes the recall at the requested depth level: looks at the `n` best predicted choices (with the
highest log probabilities) and see if there is an actual gold among them.
</div></div>
### MRR[[lighteval.metrics.metrics_sample.MRR]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.MRR</name><anchor>lighteval.metrics.metrics_sample.MRR</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L438</source><parameters>[{"name": "length_normalization", "val": ": bool = False"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.MRR.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L447</source><parameters>[{"name": "model_response", "val": ": ModelResponse"}, {"name": "doc", "val": ": Doc"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **model_response** (ModelResponse) -- The model's response containing logprobs.
- **doc** (Doc) -- The document containing choices and gold indices.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>MRR score.</retdesc></docstring>
Mean reciprocal rank. Measures the quality of a ranking of choices (ordered by correctness).
</div></div>
### ROUGE[[lighteval.metrics.metrics_sample.ROUGE]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.ROUGE</name><anchor>lighteval.metrics.metrics_sample.ROUGE</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L486</source><parameters>[{"name": "methods", "val": ": str | list[str]"}, {"name": "multiple_golds", "val": ": bool = False"}, {"name": "bootstrap", "val": ": bool = False"}, {"name": "normalize_gold", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "aggregation_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "tokenizer", "val": ": object = None"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.ROUGE.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L533</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float or dict</rettype><retdesc>Aggregated score over the current sample's items.
If several rouge functions have been selected, returns a dict which maps name and scores.</retdesc></docstring>
Computes the metric(s) over a list of golds and predictions for one single sample.
</div></div>
### BertScore[[lighteval.metrics.metrics_sample.BertScore]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.BertScore</name><anchor>lighteval.metrics.metrics_sample.BertScore</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L598</source><parameters>[{"name": "normalize_gold", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable] = None"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.BertScore.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L628</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict</rettype><retdesc>Scores over the current sample's items.</retdesc></docstring>
Computes the prediction, recall and f1 score using the bert scorer.
</div></div>
### Extractiveness[[lighteval.metrics.metrics_sample.Extractiveness]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.Extractiveness</name><anchor>lighteval.metrics.metrics_sample.Extractiveness</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L661</source><parameters>[{"name": "normalize_input", "val": ": <built-in function callable> = <function remove_braces at 0x7f1693a1eb00>"}, {"name": "normalize_pred", "val": ": <built-in function callable> = <function remove_braces_and_strip at 0x7f1693a1eb90>"}, {"name": "input_column", "val": ": str = 'text'"}, {"name": "language", "val": ": typing.Literal['en', 'de', 'fr', 'it'] = 'en'"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Extractiveness.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L685</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing input text.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict[str, float]</rettype><retdesc>The extractiveness scores.</retdesc></docstring>
Compute the extractiveness of the predictions.
This method calculates coverage, density, and compression scores for a single
prediction against the input text.
</div></div>
### Faithfulness[[lighteval.metrics.metrics_sample.Faithfulness]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.Faithfulness</name><anchor>lighteval.metrics.metrics_sample.Faithfulness</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L717</source><parameters>[{"name": "normalize_input", "val": ": typing.Callable = <function remove_braces at 0x7f1693a1eb00>"}, {"name": "normalize_pred", "val": ": typing.Callable = <function remove_braces_and_strip at 0x7f1693a1eb90>"}, {"name": "input_column", "val": ": str = 'text'"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Faithfulness.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L738</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing input text.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict[str, float]</rettype><retdesc>The faithfulness scores.</retdesc></docstring>
Compute the faithfulness of the predictions.
The SummaCZS (Summary Content Zero-Shot) model is used with configurable granularity and model variation.
</div></div>
### BLEURT[[lighteval.metrics.metrics_sample.BLEURT]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.BLEURT</name><anchor>lighteval.metrics.metrics_sample.BLEURT</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L765</source><parameters>[]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.BLEURT.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L786</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Score over the current sample's items.</retdesc></docstring>
Uses the stored BLEURT scorer to compute the score on the current sample.
</div></div>
### BLEU[[lighteval.metrics.metrics_sample.BLEU]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.BLEU</name><anchor>lighteval.metrics.metrics_sample.BLEU</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L805</source><parameters>[{"name": "n_gram", "val": ": int"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.BLEU.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L815</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Score over the current sample's items.</retdesc></docstring>
Computes the sentence level BLEU between the golds and each prediction, then takes the average.
</div></div>
### StringDistance[[lighteval.metrics.metrics_sample.StringDistance]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.StringDistance</name><anchor>lighteval.metrics.metrics_sample.StringDistance</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L847</source><parameters>[{"name": "metric_types", "val": ": list[str] | str"}, {"name": "strip_prediction", "val": ": bool = True"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.StringDistance.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L869</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict</rettype><retdesc>The different scores computed</retdesc></docstring>
Computes all the requested metrics on the golds and prediction.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>edit_similarity</name><anchor>lighteval.metrics.metrics_sample.StringDistance.edit_similarity</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L927</source><parameters>[{"name": "s1", "val": ""}, {"name": "s2", "val": ""}]</parameters><rettype>float</rettype><retdesc>Edit similarity score between 0 and 1</retdesc></docstring>
Compute the edit similarity between two lists of strings.
Edit similarity is also used in the paper
Lee, Katherine, et al.
"Deduplicating training data makes language models better."
arXiv preprint arXiv:2107.06499 (2021).
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>longest_common_prefix_length</name><anchor>lighteval.metrics.metrics_sample.StringDistance.longest_common_prefix_length</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L920</source><parameters>[{"name": "s1", "val": ": ndarray"}, {"name": "s2", "val": ": ndarray"}]</parameters></docstring>
Compute the length of the longest common prefix.
</div></div>
### JudgeLLM[[lighteval.metrics.metrics_sample.JudgeLLM]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.JudgeLLM</name><anchor>lighteval.metrics.metrics_sample.JudgeLLM</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L942</source><parameters>[{"name": "judge_model_name", "val": ": str"}, {"name": "template", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'vllm', 'tgi', 'inference-providers']"}, {"name": "short_judge_name", "val": ": str | None = None"}, {"name": "response_format", "val": ": pydantic.main.BaseModel | None = None"}, {"name": "url", "val": ": str | None = None"}, {"name": "hf_provider", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int | None = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters></docstring>
</div>
### JudgeLLMMTBench[[lighteval.metrics.metrics_sample.JudgeLLMMTBench]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.JudgeLLMMTBench</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMTBench</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1046</source><parameters>[{"name": "judge_model_name", "val": ": str"}, {"name": "template", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'vllm', 'tgi', 'inference-providers']"}, {"name": "short_judge_name", "val": ": str | None = None"}, {"name": "response_format", "val": ": pydantic.main.BaseModel | None = None"}, {"name": "url", "val": ": str | None = None"}, {"name": "hf_provider", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int | None = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMTBench.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1047</source><parameters>[{"name": "model_response", "val": ": list"}, {"name": "docs", "val": ": list"}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Compute the score of a generative task using a llm as a judge.
The generative task can be multiturn with 2 turns max, in that case, we
return scores for turn 1 and 2. Also returns user_prompt and judgement
which are ignored later by the aggregator.
</div></div>
### JudgeLLMMixEval[[lighteval.metrics.metrics_sample.JudgeLLMMixEval]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.JudgeLLMMixEval</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMixEval</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1078</source><parameters>[{"name": "judge_model_name", "val": ": str"}, {"name": "template", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'vllm', 'tgi', 'inference-providers']"}, {"name": "short_judge_name", "val": ": str | None = None"}, {"name": "response_format", "val": ": pydantic.main.BaseModel | None = None"}, {"name": "url", "val": ": str | None = None"}, {"name": "hf_provider", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int | None = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMixEval.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1079</source><parameters>[{"name": "model_responses", "val": ": list"}, {"name": "docs", "val": ": list"}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Compute the score of a generative task using a llm as a judge.
The generative task can be multiturn with 2 turns max, in that case, we
return scores for turn 1 and 2. Also returns user_prompt and judgement
which are ignored later by the aggregator.
</div></div>
### MajAtK[[lighteval.metrics.metrics_sample.MajAtK]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.metrics_sample.MajAtK</name><anchor>lighteval.metrics.metrics_sample.MajAtK</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1216</source><parameters>[{"name": "k", "val": ": int | None = None"}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.MajAtK.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1229</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references.
- **model_response** (ModelResponse) -- The model's response containing predictions.
- ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Aggregated score over the current sample's items.</retdesc></docstring>
Computes the metric over a list of golds and predictions for one single sample.
It applies normalisation (if needed) to model prediction and gold, and takes the most frequent answer of all the available ones, then compares it to the gold.
</div></div>
## LLM-as-a-Judge
### JudgeLM[[lighteval.metrics.utils.llm_as_judge.JudgeLM]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.metrics.utils.llm_as_judge.JudgeLM</name><anchor>lighteval.metrics.utils.llm_as_judge.JudgeLM</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/llm_as_judge.py#L67</source><parameters>[{"name": "model", "val": ": str"}, {"name": "templates", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'tgi', 'vllm', 'inference-providers']"}, {"name": "url", "val": ": str | None = None"}, {"name": "api_key", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int = 512"}, {"name": "response_format", "val": ": BaseModel = None"}, {"name": "hf_provider", "val": ": typing.Optional[typing.Literal['black-forest-labs', 'cerebras', 'cohere', 'fal-ai', 'fireworks-ai', 'inference-providers', 'hyperbolic', 'nebius', 'novita', 'openai', 'replicate', 'sambanova', 'together']] = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters><paramsdesc>- **model** (str) -- The name of the model.
- **templates** (Callable) -- A function taking into account the question, options, answer, and gold and returning the judge prompt.
- **process_judge_response** (Callable) -- A function for processing the judge's response.
- **judge_backend** (Literal["litellm", "openai", "transformers", "tgi", "vllm", "inference-providers"]) -- The backend for the judge.
- **url** (str | None) -- The URL for the OpenAI API.
- **api_key** (str | None) -- The API key for the OpenAI API (either OpenAI or HF key).
- **max_tokens** (int) -- The maximum number of tokens to generate. Defaults to 512.
- **response_format** (BaseModel | None) -- The format of the response from the API, used for the OpenAI and TGI backend.
- **hf_provider** (Literal["black-forest-labs", "cerebras", "cohere", "fal-ai", "fireworks-ai", --
"inference-providers", "hyperbolic", "nebius", "novita", "openai", "replicate", "sambanova", "together"] | None):
The HuggingFace provider when using the inference-providers backend.
- **backend_options** (dict | None) -- Options for the backend. Currently only supported for litellm.</paramsdesc><paramgroups>0</paramgroups></docstring>
A class representing a judge for evaluating answers using either the chosen backend.
Methods:
evaluate_answer: Evaluates an answer using the OpenAI API or Transformers library.
__lazy_load_client: Lazy loads the OpenAI client or Transformers pipeline.
__call_api: Calls the API to get the judge's response.
__call_transformers: Calls the Transformers pipeline to get the judge's response.
__call_vllm: Calls the VLLM pipeline to get the judge's response.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>dict_of_lists_to_list_of_dicts</name><anchor>lighteval.metrics.utils.llm_as_judge.JudgeLM.dict_of_lists_to_list_of_dicts</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/llm_as_judge.py#L204</source><parameters>[{"name": "dict_of_lists", "val": ""}]</parameters><paramsdesc>- **dict_of_lists** -- A dictionary where each value is a list.
All lists are expected to have the same length.</paramsdesc><paramgroups>0</paramgroups><retdesc>A list of dictionaries.</retdesc></docstring>
Transform a dictionary of lists into a list of dictionaries.
Each dictionary in the output list will contain one element from each list in the input dictionary,
with the same keys as the input dictionary.
Example:
>>> dict_of_lists_to_list_of_dicts({'k': [1, 2, 3], 'k2': ['a', 'b', 'c']})
[{'k': 1, 'k2': 'a'}, {'k': 2, 'k2': 'b'}, {'k': 3, 'k2': 'c'}]
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>evaluate_answer</name><anchor>lighteval.metrics.utils.llm_as_judge.JudgeLM.evaluate_answer</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/llm_as_judge.py#L272</source><parameters>[{"name": "question", "val": ": str"}, {"name": "answer", "val": ": str"}, {"name": "options", "val": ": list[str] | None = None"}, {"name": "gold", "val": ": str | None = None"}]</parameters><paramsdesc>- **question** (str) -- The prompt asked to the evaluated model.
- **answer** (str) -- Answer given by the evaluated model.
- **options** (list[str] | None) -- Optional list of answer options.
- **gold** (str | None) -- Optional reference answer.</paramsdesc><paramgroups>0</paramgroups><retdesc>A tuple containing the score, prompts, and judgment.</retdesc></docstring>
Evaluates an answer using either Transformers or OpenAI API.
</div></div>
<EditOnGithub source="https://github.com/huggingface/lighteval/blob/main/docs/source/package_reference/metrics.mdx" />

Xet Storage Details

Size:
47.4 kB
·
Xet hash:
7ea265ad7d26f4b5bb18516f2d33008c3bc2472f9b4f6892d505c68331c4051f

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.