Buckets:
| # Metrics | |
| ## Metrics | |
| [//]: # (TODO: aenum.Enum raises error when generating docs: not supported by inspect.signature. See: https://github.com/ethanfurman/aenum/issues/44) | |
| [//]: # (### Metrics) | |
| [//]: # ([[autodoc]] metrics.metrics.Metrics) | |
| ### Metric[[lighteval.metrics.Metric]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.Metric</name><anchor>lighteval.metrics.Metric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L33</source><parameters>[{"name": "metric_name", "val": ": str"}, {"name": "higher_is_better", "val": ": bool"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": typing.Union[lighteval.metrics.metrics_corpus.CorpusLevelComputation, typing.Callable]"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring> | |
| </div> | |
| ### CorpusLevelMetric[[lighteval.metrics.utils.metric_utils.CorpusLevelMetric]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.utils.metric_utils.CorpusLevelMetric</name><anchor>lighteval.metrics.utils.metric_utils.CorpusLevelMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L114</source><parameters>[{"name": "metric_name", "val": ": str"}, {"name": "higher_is_better", "val": ": bool"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": typing.Union[lighteval.metrics.metrics_corpus.CorpusLevelComputation, typing.Callable]"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring> | |
| Metric computed over the whole corpora, with computations happening at the aggregation phase | |
| </div> | |
| ### SampleLevelMetric[[lighteval.metrics.utils.metric_utils.SampleLevelMetric]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.utils.metric_utils.SampleLevelMetric</name><anchor>lighteval.metrics.utils.metric_utils.SampleLevelMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L121</source><parameters>[{"name": "metric_name", "val": ": str"}, {"name": "higher_is_better", "val": ": bool"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": typing.Union[lighteval.metrics.metrics_corpus.CorpusLevelComputation, typing.Callable]"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring> | |
| Metric computed per sample, then aggregated over the corpus | |
| </div> | |
| ### MetricGrouping[[lighteval.metrics.utils.metric_utils.MetricGrouping]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.utils.metric_utils.MetricGrouping</name><anchor>lighteval.metrics.utils.metric_utils.MetricGrouping</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L103</source><parameters>[{"name": "metric_name", "val": ": list"}, {"name": "higher_is_better", "val": ": dict"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": dict"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring> | |
| Some metrics are more advantageous to compute together at once. | |
| For example, if a costly preprocessing is the same for all metrics, it makes more sense to compute it once. | |
| </div> | |
| ### CorpusLevelMetricGrouping[[lighteval.metrics.utils.metric_utils.CorpusLevelMetricGrouping]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.utils.metric_utils.CorpusLevelMetricGrouping</name><anchor>lighteval.metrics.utils.metric_utils.CorpusLevelMetricGrouping</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L128</source><parameters>[{"name": "metric_name", "val": ": list"}, {"name": "higher_is_better", "val": ": dict"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": dict"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring> | |
| MetricGrouping computed over the whole corpora, with computations happening at the aggregation phase | |
| </div> | |
| ### SampleLevelMetricGrouping[[lighteval.metrics.utils.metric_utils.SampleLevelMetricGrouping]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.utils.metric_utils.SampleLevelMetricGrouping</name><anchor>lighteval.metrics.utils.metric_utils.SampleLevelMetricGrouping</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/metric_utils.py#L135</source><parameters>[{"name": "metric_name", "val": ": list"}, {"name": "higher_is_better", "val": ": dict"}, {"name": "category", "val": ": SamplingMethod"}, {"name": "sample_level_fn", "val": ": lighteval.metrics.metrics_sample.SampleLevelComputation | lighteval.metrics.sample_preparator.Preparator"}, {"name": "corpus_level_fn", "val": ": dict"}, {"name": "batched_compute", "val": ": bool = False"}]</parameters></docstring> | |
| MetricGrouping are computed per sample, then aggregated over the corpus | |
| </div> | |
| ## Corpus Metrics | |
| ### CorpusLevelF1Score[[lighteval.metrics.metrics_corpus.CorpusLevelF1Score]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_corpus.CorpusLevelF1Score</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelF1Score</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L81</source><parameters>[{"name": "average", "val": ": str"}, {"name": "num_classes", "val": ": int = 2"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelF1Score.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L96</source><parameters>[{"name": "items", "val": ": list"}]</parameters></docstring> | |
| Computes the metric score over all the corpus generated items, by using the scikit learn implementation. | |
| </div></div> | |
| ### CorpusLevelPerplexityMetric[[lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L164</source><parameters>[{"name": "metric_type", "val": ": str"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelPerplexityMetric.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L182</source><parameters>[{"name": "items", "val": ": list"}]</parameters></docstring> | |
| Computes the metric score over all the corpus generated items. | |
| </div></div> | |
| ### CorpusLevelTranslationMetric[[lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L116</source><parameters>[{"name": "metric_type", "val": ": str"}, {"name": "lang", "val": ": typing.Literal['zh', 'ja', 'ko', ''] = ''"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.CorpusLevelTranslationMetric.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L142</source><parameters>[{"name": "items", "val": ": list"}]</parameters></docstring> | |
| Computes the metric score over all the corpus generated items, by using the sacrebleu implementation. | |
| </div></div> | |
| ### MatthewsCorrCoef[[lighteval.metrics.metrics_corpus.MatthewsCorrCoef]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_corpus.MatthewsCorrCoef</name><anchor>lighteval.metrics.metrics_corpus.MatthewsCorrCoef</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L66</source><parameters>[]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute_corpus</name><anchor>lighteval.metrics.metrics_corpus.MatthewsCorrCoef.compute_corpus</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_corpus.py#L67</source><parameters>[{"name": "items", "val": ": list"}]</parameters><paramsdesc>- **items** (list[dict]) -- List of GenerativeCorpusMetricInput</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Score</retdesc></docstring> | |
| Computes the Matthews Correlation Coefficient, using scikit learn ([doc](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html)). | |
| </div></div> | |
| ## Sample Metrics | |
| ### ExactMatches[[lighteval.metrics.metrics_sample.ExactMatches]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.ExactMatches</name><anchor>lighteval.metrics.metrics_sample.ExactMatches</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L81</source><parameters>[{"name": "aggregation_function", "val": ": typing.Callable[[list[float]], float] = <built-in function max>"}, {"name": "normalize_gold", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "strip_strings", "val": ": bool = False"}, {"name": "type_exact_match", "val": ": str = 'full'"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.ExactMatches.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L118</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Aggregated score over the current sample's items.</retdesc></docstring> | |
| Computes the metric over a list of golds and predictions for one single sample. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute_one_item</name><anchor>lighteval.metrics.metrics_sample.ExactMatches.compute_one_item</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L137</source><parameters>[{"name": "gold", "val": ": str"}, {"name": "pred", "val": ": str"}]</parameters><paramsdesc>- **gold** (str) -- One of the possible references | |
| - **pred** (str) -- One of the possible predictions</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The exact match score. Will be 1 for a match, 0 otherwise.</retdesc></docstring> | |
| Compares two strings only. | |
| </div></div> | |
| ### F1_score[[lighteval.metrics.metrics_sample.F1_score]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.F1_score</name><anchor>lighteval.metrics.metrics_sample.F1_score</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L170</source><parameters>[{"name": "aggregation_function", "val": ": typing.Callable[[list[float]], float] = <built-in function max>"}, {"name": "normalize_gold", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable[[str], str]] = None"}, {"name": "strip_strings", "val": ": bool = False"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.F1_score.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L197</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Aggregated score over the current sample's items.</retdesc></docstring> | |
| Computes the metric over a list of golds and predictions for one single sample. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute_one_item</name><anchor>lighteval.metrics.metrics_sample.F1_score.compute_one_item</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L217</source><parameters>[{"name": "gold", "val": ": str"}, {"name": "pred", "val": ": str"}]</parameters><paramsdesc>- **gold** (str) -- One of the possible references | |
| - **pred** (str) -- One of the possible predictions</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The f1 score over the bag of words, computed using nltk.</retdesc></docstring> | |
| Compares two strings only. | |
| </div></div> | |
| ### LoglikelihoodAcc[[lighteval.metrics.metrics_sample.LoglikelihoodAcc]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.LoglikelihoodAcc</name><anchor>lighteval.metrics.metrics_sample.LoglikelihoodAcc</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L243</source><parameters>[{"name": "logprob_normalization", "val": ": lighteval.metrics.normalizations.LogProbCharNorm | lighteval.metrics.normalizations.LogProbTokenNorm | lighteval.metrics.normalizations.LogProbPMINorm | None = None"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.LoglikelihoodAcc.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L254</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices. | |
| - **model_response** (ModelResponse) -- The model's response containing logprobs. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>int</rettype><retdesc>The eval score: 1 if the best log-prob choice is in gold, 0 otherwise.</retdesc></docstring> | |
| Computes the log likelihood accuracy: is the choice with the highest logprob in `choices_logprob` present | |
| in the `gold_ixs`? | |
| </div></div> | |
| ### NormalizedMultiChoiceProbability[[lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability</name><anchor>lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L297</source><parameters>[{"name": "log_prob_normalization", "val": ": lighteval.metrics.normalizations.LogProbCharNorm | lighteval.metrics.normalizations.LogProbTokenNorm | lighteval.metrics.normalizations.LogProbPMINorm | None = None"}, {"name": "aggregation_function", "val": ": typing.Callable[[numpy.ndarray], float] = <function max at 0x7f179aceeff0>"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.NormalizedMultiChoiceProbability.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L313</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices. | |
| - **model_response** (ModelResponse) -- The model's response containing logprobs. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The probability of the best log-prob choice being a gold choice.</retdesc></docstring> | |
| Computes the log likelihood probability: chance of choosing the best choice. | |
| </div></div> | |
| ### Probability[[lighteval.metrics.metrics_sample.Probability]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.Probability</name><anchor>lighteval.metrics.metrics_sample.Probability</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L357</source><parameters>[{"name": "normalization", "val": ": lighteval.metrics.normalizations.LogProbTokenNorm | None = None"}, {"name": "aggregation_function", "val": ": typing.Callable[[numpy.ndarray], float] = <function max at 0x7f179aceeff0>"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Probability.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L373</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices. | |
| - **model_response** (ModelResponse) -- The model's response containing logprobs. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>The probability of the best log-prob choice being a gold choice.</retdesc></docstring> | |
| Computes the log likelihood probability: chance of choosing the best choice. | |
| </div></div> | |
| ### Recall[[lighteval.metrics.metrics_sample.Recall]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.Recall</name><anchor>lighteval.metrics.metrics_sample.Recall</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L408</source><parameters>[{"name": "k", "val": ": int"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Recall.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L418</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing choices and gold indices. | |
| - **model_response** (ModelResponse) -- The model's response containing logprobs. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>int</rettype><retdesc>Score: 1 if one of the top level predicted choices was correct, 0 otherwise.</retdesc></docstring> | |
| Computes the recall at the requested depth level: looks at the `n` best predicted choices (with the | |
| highest log probabilities) and see if there is an actual gold among them. | |
| </div></div> | |
| ### MRR[[lighteval.metrics.metrics_sample.MRR]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.MRR</name><anchor>lighteval.metrics.metrics_sample.MRR</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L438</source><parameters>[{"name": "length_normalization", "val": ": bool = False"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.MRR.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L447</source><parameters>[{"name": "model_response", "val": ": ModelResponse"}, {"name": "doc", "val": ": Doc"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **model_response** (ModelResponse) -- The model's response containing logprobs. | |
| - **doc** (Doc) -- The document containing choices and gold indices. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>MRR score.</retdesc></docstring> | |
| Mean reciprocal rank. Measures the quality of a ranking of choices (ordered by correctness). | |
| </div></div> | |
| ### ROUGE[[lighteval.metrics.metrics_sample.ROUGE]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.ROUGE</name><anchor>lighteval.metrics.metrics_sample.ROUGE</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L486</source><parameters>[{"name": "methods", "val": ": str | list[str]"}, {"name": "multiple_golds", "val": ": bool = False"}, {"name": "bootstrap", "val": ": bool = False"}, {"name": "normalize_gold", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "aggregation_function", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "tokenizer", "val": ": object = None"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.ROUGE.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L533</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float or dict</rettype><retdesc>Aggregated score over the current sample's items. | |
| If several rouge functions have been selected, returns a dict which maps name and scores.</retdesc></docstring> | |
| Computes the metric(s) over a list of golds and predictions for one single sample. | |
| </div></div> | |
| ### BertScore[[lighteval.metrics.metrics_sample.BertScore]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.BertScore</name><anchor>lighteval.metrics.metrics_sample.BertScore</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L598</source><parameters>[{"name": "normalize_gold", "val": ": typing.Optional[typing.Callable] = None"}, {"name": "normalize_pred", "val": ": typing.Optional[typing.Callable] = None"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.BertScore.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L628</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict</rettype><retdesc>Scores over the current sample's items.</retdesc></docstring> | |
| Computes the prediction, recall and f1 score using the bert scorer. | |
| </div></div> | |
| ### Extractiveness[[lighteval.metrics.metrics_sample.Extractiveness]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.Extractiveness</name><anchor>lighteval.metrics.metrics_sample.Extractiveness</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L661</source><parameters>[{"name": "normalize_input", "val": ": <built-in function callable> = <function remove_braces at 0x7f1693a1eb00>"}, {"name": "normalize_pred", "val": ": <built-in function callable> = <function remove_braces_and_strip at 0x7f1693a1eb90>"}, {"name": "input_column", "val": ": str = 'text'"}, {"name": "language", "val": ": typing.Literal['en', 'de', 'fr', 'it'] = 'en'"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Extractiveness.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L685</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing input text. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict[str, float]</rettype><retdesc>The extractiveness scores.</retdesc></docstring> | |
| Compute the extractiveness of the predictions. | |
| This method calculates coverage, density, and compression scores for a single | |
| prediction against the input text. | |
| </div></div> | |
| ### Faithfulness[[lighteval.metrics.metrics_sample.Faithfulness]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.Faithfulness</name><anchor>lighteval.metrics.metrics_sample.Faithfulness</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L717</source><parameters>[{"name": "normalize_input", "val": ": typing.Callable = <function remove_braces at 0x7f1693a1eb00>"}, {"name": "normalize_pred", "val": ": typing.Callable = <function remove_braces_and_strip at 0x7f1693a1eb90>"}, {"name": "input_column", "val": ": str = 'text'"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.Faithfulness.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L738</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing input text. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict[str, float]</rettype><retdesc>The faithfulness scores.</retdesc></docstring> | |
| Compute the faithfulness of the predictions. | |
| The SummaCZS (Summary Content Zero-Shot) model is used with configurable granularity and model variation. | |
| </div></div> | |
| ### BLEURT[[lighteval.metrics.metrics_sample.BLEURT]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.BLEURT</name><anchor>lighteval.metrics.metrics_sample.BLEURT</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L765</source><parameters>[]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.BLEURT.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L786</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Score over the current sample's items.</retdesc></docstring> | |
| Uses the stored BLEURT scorer to compute the score on the current sample. | |
| </div></div> | |
| ### BLEU[[lighteval.metrics.metrics_sample.BLEU]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.BLEU</name><anchor>lighteval.metrics.metrics_sample.BLEU</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L805</source><parameters>[{"name": "n_gram", "val": ": int"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.BLEU.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L815</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Score over the current sample's items.</retdesc></docstring> | |
| Computes the sentence level BLEU between the golds and each prediction, then takes the average. | |
| </div></div> | |
| ### StringDistance[[lighteval.metrics.metrics_sample.StringDistance]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.StringDistance</name><anchor>lighteval.metrics.metrics_sample.StringDistance</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L847</source><parameters>[{"name": "metric_types", "val": ": list[str] | str"}, {"name": "strip_prediction", "val": ": bool = True"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.StringDistance.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L869</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>dict</rettype><retdesc>The different scores computed</retdesc></docstring> | |
| Computes all the requested metrics on the golds and prediction. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>edit_similarity</name><anchor>lighteval.metrics.metrics_sample.StringDistance.edit_similarity</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L927</source><parameters>[{"name": "s1", "val": ""}, {"name": "s2", "val": ""}]</parameters><rettype>float</rettype><retdesc>Edit similarity score between 0 and 1</retdesc></docstring> | |
| Compute the edit similarity between two lists of strings. | |
| Edit similarity is also used in the paper | |
| Lee, Katherine, et al. | |
| "Deduplicating training data makes language models better." | |
| arXiv preprint arXiv:2107.06499 (2021). | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>longest_common_prefix_length</name><anchor>lighteval.metrics.metrics_sample.StringDistance.longest_common_prefix_length</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L920</source><parameters>[{"name": "s1", "val": ": ndarray"}, {"name": "s2", "val": ": ndarray"}]</parameters></docstring> | |
| Compute the length of the longest common prefix. | |
| </div></div> | |
| ### JudgeLLM[[lighteval.metrics.metrics_sample.JudgeLLM]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.JudgeLLM</name><anchor>lighteval.metrics.metrics_sample.JudgeLLM</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L942</source><parameters>[{"name": "judge_model_name", "val": ": str"}, {"name": "template", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'vllm', 'tgi', 'inference-providers']"}, {"name": "short_judge_name", "val": ": str | None = None"}, {"name": "response_format", "val": ": pydantic.main.BaseModel | None = None"}, {"name": "url", "val": ": str | None = None"}, {"name": "hf_provider", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int | None = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters></docstring> | |
| </div> | |
| ### JudgeLLMMTBench[[lighteval.metrics.metrics_sample.JudgeLLMMTBench]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.JudgeLLMMTBench</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMTBench</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1046</source><parameters>[{"name": "judge_model_name", "val": ": str"}, {"name": "template", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'vllm', 'tgi', 'inference-providers']"}, {"name": "short_judge_name", "val": ": str | None = None"}, {"name": "response_format", "val": ": pydantic.main.BaseModel | None = None"}, {"name": "url", "val": ": str | None = None"}, {"name": "hf_provider", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int | None = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMTBench.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1047</source><parameters>[{"name": "model_response", "val": ": list"}, {"name": "docs", "val": ": list"}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| Compute the score of a generative task using a llm as a judge. | |
| The generative task can be multiturn with 2 turns max, in that case, we | |
| return scores for turn 1 and 2. Also returns user_prompt and judgement | |
| which are ignored later by the aggregator. | |
| </div></div> | |
| ### JudgeLLMMixEval[[lighteval.metrics.metrics_sample.JudgeLLMMixEval]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.JudgeLLMMixEval</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMixEval</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1078</source><parameters>[{"name": "judge_model_name", "val": ": str"}, {"name": "template", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'vllm', 'tgi', 'inference-providers']"}, {"name": "short_judge_name", "val": ": str | None = None"}, {"name": "response_format", "val": ": pydantic.main.BaseModel | None = None"}, {"name": "url", "val": ": str | None = None"}, {"name": "hf_provider", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int | None = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.JudgeLLMMixEval.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1079</source><parameters>[{"name": "model_responses", "val": ": list"}, {"name": "docs", "val": ": list"}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| Compute the score of a generative task using a llm as a judge. | |
| The generative task can be multiturn with 2 turns max, in that case, we | |
| return scores for turn 1 and 2. Also returns user_prompt and judgement | |
| which are ignored later by the aggregator. | |
| </div></div> | |
| ### MajAtK[[lighteval.metrics.metrics_sample.MajAtK]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.metrics_sample.MajAtK</name><anchor>lighteval.metrics.metrics_sample.MajAtK</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1216</source><parameters>[{"name": "k", "val": ": int | None = None"}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>compute</name><anchor>lighteval.metrics.metrics_sample.MajAtK.compute</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/metrics_sample.py#L1229</source><parameters>[{"name": "doc", "val": ": Doc"}, {"name": "model_response", "val": ": ModelResponse"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **doc** (Doc) -- The document containing gold references. | |
| - **model_response** (ModelResponse) -- The model's response containing predictions. | |
| - ****kwargs** -- Additional keyword arguments.</paramsdesc><paramgroups>0</paramgroups><rettype>float</rettype><retdesc>Aggregated score over the current sample's items.</retdesc></docstring> | |
| Computes the metric over a list of golds and predictions for one single sample. | |
| It applies normalisation (if needed) to model prediction and gold, and takes the most frequent answer of all the available ones, then compares it to the gold. | |
| </div></div> | |
| ## LLM-as-a-Judge | |
| ### JudgeLM[[lighteval.metrics.utils.llm_as_judge.JudgeLM]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class lighteval.metrics.utils.llm_as_judge.JudgeLM</name><anchor>lighteval.metrics.utils.llm_as_judge.JudgeLM</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/llm_as_judge.py#L67</source><parameters>[{"name": "model", "val": ": str"}, {"name": "templates", "val": ": typing.Callable"}, {"name": "process_judge_response", "val": ": typing.Callable"}, {"name": "judge_backend", "val": ": typing.Literal['litellm', 'openai', 'transformers', 'tgi', 'vllm', 'inference-providers']"}, {"name": "url", "val": ": str | None = None"}, {"name": "api_key", "val": ": str | None = None"}, {"name": "max_tokens", "val": ": int = 512"}, {"name": "response_format", "val": ": BaseModel = None"}, {"name": "hf_provider", "val": ": typing.Optional[typing.Literal['black-forest-labs', 'cerebras', 'cohere', 'fal-ai', 'fireworks-ai', 'inference-providers', 'hyperbolic', 'nebius', 'novita', 'openai', 'replicate', 'sambanova', 'together']] = None"}, {"name": "backend_options", "val": ": dict | None = None"}]</parameters><paramsdesc>- **model** (str) -- The name of the model. | |
| - **templates** (Callable) -- A function taking into account the question, options, answer, and gold and returning the judge prompt. | |
| - **process_judge_response** (Callable) -- A function for processing the judge's response. | |
| - **judge_backend** (Literal["litellm", "openai", "transformers", "tgi", "vllm", "inference-providers"]) -- The backend for the judge. | |
| - **url** (str | None) -- The URL for the OpenAI API. | |
| - **api_key** (str | None) -- The API key for the OpenAI API (either OpenAI or HF key). | |
| - **max_tokens** (int) -- The maximum number of tokens to generate. Defaults to 512. | |
| - **response_format** (BaseModel | None) -- The format of the response from the API, used for the OpenAI and TGI backend. | |
| - **hf_provider** (Literal["black-forest-labs", "cerebras", "cohere", "fal-ai", "fireworks-ai", -- | |
| "inference-providers", "hyperbolic", "nebius", "novita", "openai", "replicate", "sambanova", "together"] | None): | |
| The HuggingFace provider when using the inference-providers backend. | |
| - **backend_options** (dict | None) -- Options for the backend. Currently only supported for litellm.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| A class representing a judge for evaluating answers using either the chosen backend. | |
| Methods: | |
| evaluate_answer: Evaluates an answer using the OpenAI API or Transformers library. | |
| __lazy_load_client: Lazy loads the OpenAI client or Transformers pipeline. | |
| __call_api: Calls the API to get the judge's response. | |
| __call_transformers: Calls the Transformers pipeline to get the judge's response. | |
| __call_vllm: Calls the VLLM pipeline to get the judge's response. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>dict_of_lists_to_list_of_dicts</name><anchor>lighteval.metrics.utils.llm_as_judge.JudgeLM.dict_of_lists_to_list_of_dicts</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/llm_as_judge.py#L204</source><parameters>[{"name": "dict_of_lists", "val": ""}]</parameters><paramsdesc>- **dict_of_lists** -- A dictionary where each value is a list. | |
| All lists are expected to have the same length.</paramsdesc><paramgroups>0</paramgroups><retdesc>A list of dictionaries.</retdesc></docstring> | |
| Transform a dictionary of lists into a list of dictionaries. | |
| Each dictionary in the output list will contain one element from each list in the input dictionary, | |
| with the same keys as the input dictionary. | |
| Example: | |
| >>> dict_of_lists_to_list_of_dicts({'k': [1, 2, 3], 'k2': ['a', 'b', 'c']}) | |
| [{'k': 1, 'k2': 'a'}, {'k': 2, 'k2': 'b'}, {'k': 3, 'k2': 'c'}] | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>evaluate_answer</name><anchor>lighteval.metrics.utils.llm_as_judge.JudgeLM.evaluate_answer</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/metrics/utils/llm_as_judge.py#L272</source><parameters>[{"name": "question", "val": ": str"}, {"name": "answer", "val": ": str"}, {"name": "options", "val": ": list[str] | None = None"}, {"name": "gold", "val": ": str | None = None"}]</parameters><paramsdesc>- **question** (str) -- The prompt asked to the evaluated model. | |
| - **answer** (str) -- Answer given by the evaluated model. | |
| - **options** (list[str] | None) -- Optional list of answer options. | |
| - **gold** (str | None) -- Optional reference answer.</paramsdesc><paramgroups>0</paramgroups><retdesc>A tuple containing the score, prompts, and judgment.</retdesc></docstring> | |
| Evaluates an answer using either Transformers or OpenAI API. | |
| </div></div> | |
| <EditOnGithub source="https://github.com/huggingface/lighteval/blob/main/docs/source/package_reference/metrics.mdx" /> |
Xet Storage Details
- Size:
- 47.4 kB
- Xet hash:
- 7ea265ad7d26f4b5bb18516f2d33008c3bc2472f9b4f6892d505c68331c4051f
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.