# Lighteval

🤗 Lighteval is your all-in-one toolkit for evaluating Large Language Models
(LLMs) across multiple backends with ease. Dive deep into your model's
performance by saving and exploring detailed, sample-by-sample results to debug
and see how your models stack up.

## Key Features

### 🚀 **Multi-Backend Support**

Evaluate your models using the most popular and efficient inference backends:

- `transformers` (the `lighteval accelerate` command): Evaluate models on CPU or one or more GPUs using [🤗 Transformers](https://github.com/huggingface/transformers) and [🤗 Accelerate](https://github.com/huggingface/accelerate)
- `nanotron`: Evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron)
- `vllm`: Evaluate models on one or more GPUs using [🚀 vLLM](https://github.com/vllm-project/vllm)
- `custom`: Evaluate custom models (any model implementation you provide)
- `sglang`: Evaluate models using [SGLang](https://github.com/sgl-project/sglang) as the backend
- `inference-endpoint`: Evaluate models using Hugging Face's [Inference Endpoints API](https://huggingface.co/inference-endpoints/dedicated)
- `tgi`: Evaluate models using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index) running locally
- `litellm`: Evaluate models on any API supported by [LiteLLM](https://www.litellm.ai/)
- `inference-providers`: Evaluate models using [Hugging Face's inference providers](https://huggingface.co/docs/inference-providers/en/index) as the backend

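For backends that take a `model_name=...` argument, switching backends is mostly a matter of changing the subcommand. The dry-run sketch below only prints the commands; it assumes a `vllm` subcommand that accepts the same positional arguments as the `accelerate` command used in the Quick Start. Other backends may expect different arguments, so check `lighteval --help` for what your installed version provides.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Model and task arguments taken from the Quick Start examples.
MODEL_ARGS="model_name=openai-community/gpt2"
TASK="leaderboard|truthfulqa:mc|0"

# Print (rather than run) the command for one backend subcommand.
# Wrapping each argument in single quotes keeps '|' literal when pasted into a shell.
print_cmd() {
  local backend="$1"
  printf "lighteval %s '%s' '%s'\n" "$backend" "$MODEL_ARGS" "$TASK"
}

# Assumption: both subcommands accept the same two positional arguments.
for backend in accelerate vllm; do
  print_cmd "$backend"
done
```

Only the subcommand changes between the two printed lines; the model and task arguments are reused verbatim.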
### 📊 **Comprehensive Evaluation**

- **Extensive Task Library**: Thousands of pre-built evaluation tasks
- **Custom Task Creation**: Build your own evaluation tasks
- **Flexible Metrics**: Support for custom metrics and scoring
- **Detailed Analysis**: Sample-by-sample results for debugging and in-depth analysis

### 🔧 **Easy Customization**

Customization at your fingertips: create [new tasks](adding-a-custom-task),
[metrics](adding-a-new-metric), or [models](evaluating-a-custom-model) tailored to your needs, or browse all our existing tasks and metrics.

### ☁️ **Seamless Integration**

Experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

## Quick Start

### Installation

```bash
pip install lighteval
```

### Basic Usage

```bash
# Evaluate a model with the Transformers backend (`accelerate` subcommand)
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0"
```

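The task argument packs several fields into a single `|`-separated string, which is why it is quoted: unquoted, the shell would read each `|` as a pipe between commands. Lighteval parses this string itself; the sketch below only illustrates the layout, assuming the three fields are the suite (`leaderboard`), the task name (`truthfulqa:mc`), and the number of few-shot examples (`0`).

```bash
#!/usr/bin/env bash
set -euo pipefail

# Task spec from the Basic Usage example. Assumed layout: suite|task|num_fewshot
spec="leaderboard|truthfulqa:mc|0"

# Split on '|' without creating a pipeline; the IFS change applies only to this read.
IFS='|' read -r suite task num_fewshot <<< "$spec"

echo "suite=$suite"
echo "task=$task"
echo "num_fewshot=$num_fewshot"
```

Setting `IFS` as a prefix to `read` limits the change to that single command, so the rest of the script keeps the default word splitting.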
### Save Results

```bash
# Save locally
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --output-dir ./results

# Push to the Hugging Face Hub
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --push-to-hub \
    --results-org your-username
```
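The two commands above share the model and task arguments and differ only in their output flags. If you script both runs, one option (a sketch, not something Lighteval requires) is to keep the shared arguments in a Bash array: each element reaches `lighteval` as one argument, so the `|` in the task spec is never treated as a pipe. `your-username` remains a placeholder, and the hypothetical `run` helper only prints the commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Arguments shared by both runs; each array element is passed as one argument.
common=(accelerate "model_name=openai-community/gpt2" "leaderboard|truthfulqa:mc|0")

# Placeholder: replace with your own Hub username or organization.
results_org="your-username"

local_run=(lighteval "${common[@]}" --output-dir ./results)
hub_run=(lighteval "${common[@]}" --push-to-hub --results-org "$results_org")

# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run: %s\n' "$*"
  else
    "$@"
  fi
}

run "${local_run[@]}"
run "${hub_run[@]}"
```

Run the script as-is to review both commands, then again with `DRY_RUN=0` in the environment to execute them.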