# Quick Tour
> [!TIP]
> We recommend using the `--help` flag to get more information about the
> available options for each command.
> `lighteval --help`
Lighteval can be used with several different commands, each optimized for different evaluation scenarios.
## Find your benchmark
<iframe
src="https://openevals-open-benchmark-index.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
## Available Commands
### Evaluation Backends
- `lighteval eval`: Use [inspect-ai](https://inspect.aisi.org.uk/) as a backend to evaluate and inspect your models (preferred way)
- `lighteval accelerate`: Evaluate models on CPU or one or more GPUs using [🤗
  Accelerate](https://github.com/huggingface/accelerate)
- `lighteval nanotron`: Evaluate models in distributed settings using [⚡️
  Nanotron](https://github.com/huggingface/nanotron)
- `lighteval vllm`: Evaluate models on one or more GPUs using [🚀
  vLLM](https://github.com/vllm-project/vllm)
- `lighteval custom`: Evaluate custom models (bring your own implementation)
- `lighteval sglang`: Evaluate models using [SGLang](https://github.com/sgl-project/sglang) as a backend
- `lighteval endpoint`: Evaluate models using various endpoints as a backend
  - `lighteval endpoint inference-endpoint`: Evaluate models using Hugging Face's [Inference Endpoints API](https://huggingface.co/inference-endpoints/dedicated)
  - `lighteval endpoint tgi`: Evaluate models using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index) running locally
  - `lighteval endpoint litellm`: Evaluate models on any compatible API using [LiteLLM](https://www.litellm.ai/)
  - `lighteval endpoint inference-providers`: Evaluate models using Hugging Face's [Inference Providers](https://huggingface.co/docs/inference-providers/en/index) as a backend
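As noted in the tip above, each of these commands accepts `--help`. For example, to see every option of the vLLM backend:
```bash
lighteval vllm --help
```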
### Evaluation Utils
- `lighteval baseline`: Compute baselines for given tasks
### Utils
- `lighteval tasks`: List or inspect tasks
  - `lighteval tasks list`: List all available tasks
  - `lighteval tasks inspect`: Inspect a specific task to see its configuration and samples
  - `lighteval tasks create`: Create a new task from a template
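For example, to browse the registry and then look at one task in detail (the task name below is just an illustration):
```bash
# List every available task
lighteval tasks list

# Show the configuration and a few samples for one task
lighteval tasks inspect truthfulqa:mc
```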
## Basic Usage
To evaluate `GPT-2` on the Truthful QA benchmark with [🤗
Accelerate](https://github.com/huggingface/accelerate), run:
```bash
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    truthfulqa:mc
```
### Task Specification
Tasks have one function applied at the sample level and one at the corpus level. For example:
- an exact match can be computed per sample, then averaged over the corpus to give the final score;
- samples can be left untouched, with Corpus BLEU applied at the corpus level.
If the task you are looking at has a sample-level function (`sample_level_fn`) that can be parametrized, you can pass parameters from the CLI using the following syntax:
```txt
{task}@{parameter_name1}={value1}@{parameter_name2}={value2},...|0
```
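For instance, a task exposing a boolean `strip_whitespace` parameter (both the task and parameter names here are illustrative) would be run as:
```bash
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "mytask@strip_whitespace=True|0"
```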
All officially supported tasks are listed in the [tasks list](available-tasks) and in the
[extended folder](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks/extended).
Moreover, community-provided tasks can be found in the
[community](https://github.com/huggingface/lighteval/tree/main/community_tasks) folder.
For more details on the implementation of the tasks, such as how prompts are constructed or which metrics are used, you can examine the
[implementation file](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/default_tasks.py).
### Running Multiple Tasks
Running multiple tasks is supported, either with a comma-separated list or by specifying a file path.
The file should be structured like [examples/tasks/recommended_set.txt](https://github.com/huggingface/lighteval/blob/main/examples/tasks/recommended_set.txt).
When specifying a path to a file, it should start with `./`.
```bash
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    ./path/to/lighteval/examples/tasks/recommended_set.txt
# or, e.g., "truthfulqa:mc|0,gsm8k|3"
```
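Each line of such a file holds one task specification, in the same format as the CLI argument. A minimal file covering the two tasks above might look like this (contents illustrative; see the linked `recommended_set.txt` for the actual set):
```txt
truthfulqa:mc|0
gsm8k|3
```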
## Backend Configuration
### General Information
The `model-args` argument takes a string representing a comma-separated list of
`key=value` model arguments. The allowed arguments vary depending on the backend
you use and correspond to the fields of the model configurations, which can be
found [here](./package_reference/models).
All backends let you post-process the predictions of reasoning models to remove
the thinking tokens from the trace used to compute the metrics: pass
`--remove-reasoning-tags`, and use `--reasoning-tags` to specify which reasoning
tags to strip (defaults to `<think>` and `</think>`).
Here's an example with `mistralai/Magistral-Small-2507`, which outputs custom
thinking tokens:
```bash
lighteval vllm \
    "model_name=mistralai/Magistral-Small-2507,dtype=float16,data_parallel_size=4" \
    aime24 \
    --remove-reasoning-tags \
    --reasoning-tags="[('[THINK]','[/THINK]')]"
```
### Nanotron
To evaluate a model trained with Nanotron on a single GPU:
> [!WARNING]
> Nanotron models cannot be evaluated without torchrun.
```bash
torchrun --standalone --nnodes=1 --nproc-per-node=1 \
    src/lighteval/__main__.py nanotron \
    --checkpoint-config-path ../nanotron/checkpoints/10/config.yaml \
    --lighteval-config-path examples/nanotron/lighteval_config_override_template.yaml
```
The `nproc-per-node` argument should match the product of the data, tensor, and
pipeline parallelism configured in the
`lighteval_config_override_template.yaml` file, that is:
`nproc-per-node = data_parallelism * tensor_parallelism * pipeline_parallelism`.
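For instance, with `data_parallelism=2`, `tensor_parallelism=2`, and `pipeline_parallelism=1` (an illustrative configuration), you would launch with `--nproc-per-node=4`, since 2 * 2 * 1 = 4.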
