# Model's Output[[lighteval.models.model_output.ModelResponse]]
All models will generate an output per `Doc` supplied to the `generation` or `loglikelihood` functions.
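As a quick illustration of what such an output looks like in code, here is a minimal, self-contained sketch built only from the fields documented below; the `summarize` helper is hypothetical, and the response is constructed by hand rather than returned by a real model call:

```python
from lighteval.models.model_output import ModelResponse

def summarize(responses: list[ModelResponse]) -> None:
    # Hypothetical helper: print the first generation and token-count metadata.
    for response in responses:
        print(response.text[0] if response.text else "<no text>")
        print(f"truncated={response.truncated_tokens_count}, padded={response.padded_tokens_count}")

summarize([ModelResponse(text=["Paris."], input_tokens=[1, 2], output_tokens=[[3]])])
```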
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class lighteval.models.model_output.ModelResponse</name><anchor>lighteval.models.model_output.ModelResponse</anchor><source>https://github.com/huggingface/lighteval/blob/vr_994/src/lighteval/models/model_output.py#L29</source><parameters>[{"name": "input", "val": ": str | list | None = None"}, {"name": "input_tokens", "val": ": list = <factory>"}, {"name": "text", "val": ": list = <factory>"}, {"name": "output_tokens", "val": ": list = <factory>"}, {"name": "text_post_processed", "val": ": list[str] | None = None"}, {"name": "reasonings", "val": ": list = <factory>"}, {"name": "logprobs", "val": ": list = <factory>"}, {"name": "argmax_logits_eq_gold", "val": ": list = <factory>"}, {"name": "logits", "val": ": list[list[float]] | None = None"}, {"name": "unconditioned_logprobs", "val": ": list[float] | None = None"}, {"name": "truncated_tokens_count", "val": ": int = 0"}, {"name": "padded_tokens_count", "val": ": int = 0"}]</parameters><paramsdesc>- **input** (str | list | None) --
The original input prompt or context that was fed to the model.
Used for debugging and analysis purposes.
- **input_tokens** (list[int]) --
The tokenized representation of the input prompt.
Useful for understanding how the model processes the input.
- **text** (list[str]) --
The generated text responses from the model. Each element represents
one generation (useful when num_samples > 1).
**Required for**: Generative metrics, exact match, LLM-as-a-judge, etc.
- **text_post_processed** (Optional[list[str]]) --
The generated text responses from the model after post-processing.
At the moment, post-processing removes thinking/reasoning steps.
Note that this field is not computed by default; it is populated in a separate step
by calling `post_process` on the `ModelResponse` object (see the sketch after the usage examples below).
**Required for**: Generative metrics that require direct answers.
- **logprobs** (list[float]) --
Log probabilities of the generated tokens or sequences.
**Required for**: loglikelihood and perplexity metrics.
- **argmax_logits_eq_gold** (list[bool]) --
Whether the argmax logits match the gold/expected text.
Used for accuracy calculations in multiple choice and classification tasks.
**Required for**: certain loglikelihood metrics.
- **unconditioned_logprobs** (Optional[list[float]]) --
Log probabilities from an unconditioned model (e.g., without context).
Used for PMI (Pointwise Mutual Information) normalization.
**Required for**: PMI metrics.</paramsdesc><paramgroups>0</paramgroups></docstring>
A class to represent the response from a model during evaluation.
This dataclass contains all the information returned by a model during inference,
including generated text, log probabilities, token information, and metadata.
Different attributes are required for different types of evaluation metrics.
Usage Examples:
**For generative tasks (text completion, summarization):**
<ExampleCodeBlock anchor="lighteval.models.model_output.ModelResponse.example">
```python
response = ModelResponse(
    text=["The capital of France is Paris."],
    input_tokens=[1, 2, 3, 4],
    output_tokens=[[5, 6, 7, 8]]
)
```
</ExampleCodeBlock>
**For multiple choice tasks:**
<ExampleCodeBlock anchor="lighteval.models.model_output.ModelResponse.example-2">
```python
response = ModelResponse(
    logprobs=[-0.5, -1.2, -2.1, -1.8],  # Logprobs for each choice
    argmax_logits_eq_gold=[False, False, False, False],  # Whether the correct choice was selected
    input_tokens=[1, 2, 3, 4],
    output_tokens=[[5], [6], [7], [8]]
)
```
</ExampleCodeBlock>
**For perplexity calculation:**
<ExampleCodeBlock anchor="lighteval.models.model_output.ModelResponse.example-3">
```python
response = ModelResponse(
    text=["The model generated this text."],
    logprobs=[-1.2, -0.8, -1.5, -0.9, -1.1],  # Logprobs for each token
    input_tokens=[1, 2, 3, 4, 5],
    output_tokens=[[6], [7], [8], [9], [10]]
)
```
</ExampleCodeBlock>
**For PMI analysis:**
<ExampleCodeBlock anchor="lighteval.models.model_output.ModelResponse.example-4">
```python
response = ModelResponse(
    text=["The answer is 42."],
    logprobs=[-1.1, -0.9, -1.3, -0.7],  # Conditioned logprobs
    unconditioned_logprobs=[-2.1, -1.8, -2.3, -1.5],  # Unconditioned logprobs
    input_tokens=[1, 2, 3, 4],
    output_tokens=[[5], [6], [7], [8]]
)
```
</ExampleCodeBlock>
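**For post-processed answers (reasoning stripped):** a minimal sketch; the docstring above only states that `post_process` must be called explicitly on the `ModelResponse`, so the no-argument call and the `<think>` tag format shown here are assumptions for illustration:

```python
response = ModelResponse(
    text=["<think>2 x 21 = 42</think>The answer is 42."],
    input_tokens=[1, 2, 3],
    output_tokens=[[4, 5, 6]]
)
response.post_process()  # Assumed no-arg call; fills text_post_processed
print(response.text_post_processed)  # Expected: ["The answer is 42."]
```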
Notes:
- For most evaluation tasks, only a subset of attributes is required
- The `text` attribute is the most commonly used for generative tasks
- `logprobs` are essential for probability-based metrics like perplexity
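To make these notes concrete, here is an illustrative sketch of the quantities the probability fields feed. The formulas are the textbook definitions (PMI as conditioned minus unconditioned log-likelihood; perplexity as the exponentiated mean negative logprob), not lighteval's exact metric code:

```python
import math

from lighteval.models.model_output import ModelResponse

def pmi_score(response: ModelResponse) -> float:
    # PMI normalization: conditioned minus unconditioned log-likelihood.
    assert response.unconditioned_logprobs is not None
    return sum(response.logprobs) - sum(response.unconditioned_logprobs)

def perplexity(response: ModelResponse) -> float:
    # Token-level perplexity from per-token logprobs.
    return math.exp(-sum(response.logprobs) / len(response.logprobs))
```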
</div>
<EditOnGithub source="https://github.com/huggingface/lighteval/blob/main/docs/source/package_reference/models_outputs.mdx" />