# Quick Tour

> [!TIP]
> We recommend using the `--help` flag to get more information about the
> available options for each command.
> `lighteval --help`

Lighteval can be used with several different commands, each optimized for different evaluation scenarios.
| ## Find your benchmark | |
| <iframe | |
| src="https://openevals-open-benchmark-index.hf.space" | |
| frameborder="0" | |
| width="850" | |
| height="450" | |
| > | |
## Available Commands

### Evaluation Backends

- `lighteval eval`: Use [inspect-ai](https://inspect.aisi.org.uk/) as backend to evaluate and inspect your models (preferred way)
- `lighteval accelerate`: Evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate)
- `lighteval nanotron`: Evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron)
- `lighteval vllm`: Evaluate models on one or more GPUs using [🚀 vLLM](https://github.com/vllm-project/vllm)
- `lighteval custom`: Evaluate custom models (can be anything)
- `lighteval sglang`: Evaluate models using [SGLang](https://github.com/sgl-project/sglang) as backend
- `lighteval endpoint`: Evaluate models using various endpoints as backend
  - `lighteval endpoint inference-endpoint`: Evaluate models using Hugging Face's [Inference Endpoints API](https://huggingface.co/inference-endpoints/dedicated)
  - `lighteval endpoint tgi`: Evaluate models using [🚀 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index) running locally
  - `lighteval endpoint litellm`: Evaluate models on any compatible API using [LiteLLM](https://www.litellm.ai/)
  - `lighteval endpoint inference-providers`: Evaluate models using [Hugging Face's inference providers](https://huggingface.co/docs/inference-providers/en/index) as backend
### Evaluation Utils

- `lighteval baseline`: Compute baselines for given tasks

### Utils

- `lighteval tasks`: List or inspect tasks
  - `lighteval tasks list`: List all available tasks
  - `lighteval tasks inspect`: Inspect a specific task to see its configuration and samples
  - `lighteval tasks create`: Create a new task from a template
## Basic Usage

To evaluate `GPT-2` on the Truthful QA benchmark with [🤗 Accelerate](https://github.com/huggingface/accelerate), run:

```bash
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    truthfulqa:mc
```
### Task Specification

Tasks have a function applied at the sample level and one at the corpus level. For example:

- an exact match can be applied per sample, then averaged over the corpus to give the final score
- samples can be left untouched before applying Corpus BLEU at the corpus level

If the task you are looking at has a sample-level function (`sample_level_fn`) that can be parametrized, you can pass parameters on the CLI using the following syntax:

```txt
{task}@{parameter_name1}={value1}@{parameter_name2}={value2},...|0
```
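To make the shape of this spec concrete, here is an illustrative parser (this is not lighteval's internal code, and the parameter names used below are hypothetical):

```python
# Illustrative only: split a "{task}@{param}={value}...|{fewshot}" spec
# into its parts. The parameter names in the example are hypothetical.

def parse_task_spec(spec: str) -> dict:
    """Split a task spec into its name, parameters, and few-shot count."""
    body, _, fewshot = spec.rpartition("|")
    name, *params = body.split("@")
    return {
        "task": name,
        "params": dict(p.split("=", 1) for p in params),
        "fewshot": int(fewshot),
    }

print(parse_task_spec("gsm8k@k=4|0"))
# {'task': 'gsm8k', 'params': {'k': '4'}, 'fewshot': 0}
```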
All officially supported tasks can be found in the [tasks list](available-tasks) and in the
[extended folder](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks/extended).
Moreover, community-provided tasks can be found in the
[community](https://github.com/huggingface/lighteval/tree/main/community_tasks) folder.

For more details on the implementation of the tasks, such as how prompts are constructed or which metrics are used, you can examine the
[implementation file](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/default_tasks.py).
### Running Multiple Tasks

Running multiple tasks is supported, either with a comma-separated list or by specifying a file path.
The file should be structured like [examples/tasks/recommended_set.txt](https://github.com/huggingface/lighteval/blob/main/examples/tasks/recommended_set.txt).
When specifying a path to a file, it should start with `./`.

```bash
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    ./path/to/lighteval/examples/tasks/recommended_set.txt
# or, e.g., "truthfulqa:mc|0,gsm8k|3"
```
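For instance, a small task file can be put together like this (a sketch assuming one task spec per line, mirroring the linked `recommended_set.txt`):

```shell
# Write a task file with one task spec per line (assumed format,
# mirroring examples/tasks/recommended_set.txt).
cat > ./my_tasks.txt <<'EOF'
truthfulqa:mc|0
gsm8k|3
EOF

# Then point lighteval at it (note the leading ./):
# lighteval accelerate "model_name=openai-community/gpt2" ./my_tasks.txt
cat ./my_tasks.txt
```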
## Backend Configuration

### General Information

The `model-args` argument takes a string representing a list of model
arguments. The allowed arguments vary depending on the backend you use and
correspond to the fields of the model configurations.
The model configurations can be found [here](./package_reference/models).

All models allow you to post-process your reasoning model's predictions
to remove the thinking tokens from the trace used to compute the metrics,
using `--remove-reasoning-tags` and `--reasoning-tags` to specify which
reasoning tags to remove (defaults to `<think>` and `</think>`).
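To picture the shape of that string, here is a sketch of how a comma-separated `key=value` list maps onto configuration fields (an illustrative helper, not lighteval's own parsing code):

```python
# Illustrative only: turn a "key=value,key=value" model-args string
# into a dict, the way it maps onto model configuration fields.
def parse_model_args(args: str) -> dict:
    return dict(item.split("=", 1) for item in args.split(","))

cfg = parse_model_args("model_name=openai-community/gpt2,dtype=float16")
print(cfg)  # {'model_name': 'openai-community/gpt2', 'dtype': 'float16'}
```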
Here's an example with `mistralai/Magistral-Small-2507`, which outputs custom
thinking tokens:

```bash
lighteval vllm \
    "model_name=mistralai/Magistral-Small-2507,dtype=float16,data_parallel_size=4" \
    aime24 \
    --remove-reasoning-tags \
    --reasoning-tags="[('[THINK]','[/THINK]')]"
```
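Conceptually, removing reasoning tags amounts to deleting everything between each opening and closing tag pair before the metrics are computed. A minimal sketch (not lighteval's implementation), using the `[THINK]`/`[/THINK]` pair from the example above:

```python
import re

# Sketch of reasoning-tag removal: drop everything between each
# opening/closing tag pair before scoring the trace.
def remove_reasoning_tags(text: str, pairs: list[tuple[str, str]]) -> str:
    for open_tag, close_tag in pairs:
        pattern = re.escape(open_tag) + r".*?" + re.escape(close_tag)
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text

out = remove_reasoning_tags(
    "[THINK]long chain of thought...[/THINK]The answer is 42.",
    [("[THINK]", "[/THINK]")],
)
print(out)  # The answer is 42.
```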
### Nanotron

To evaluate a model trained with Nanotron on a single GPU:

> [!WARNING]
> Nanotron models cannot be evaluated without torchrun.

```bash
torchrun --standalone --nnodes=1 --nproc-per-node=1 \
    src/lighteval/__main__.py nanotron \
    --checkpoint-config-path ../nanotron/checkpoints/10/config.yaml \
    --lighteval-config-path examples/nanotron/lighteval_config_override_template.yaml
```

The `nproc-per-node` argument should match the data, tensor, and pipeline
parallelism configured in the `lighteval_config_template.yaml` file.
That is: `nproc-per-node = data_parallelism * tensor_parallelism * pipeline_parallelism`.
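For example, a config with data parallelism 2, tensor parallelism 2, and pipeline parallelism 1 (hypothetical values) would need `torchrun --nproc-per-node=4`:

```python
# nproc-per-node must equal the product of the three parallelism degrees
# set in the lighteval config (hypothetical values below).
data_parallelism = 2
tensor_parallelism = 2
pipeline_parallelism = 1

nproc_per_node = data_parallelism * tensor_parallelism * pipeline_parallelism
print(nproc_per_node)  # 4
```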