# Lighteval
🤗 Lighteval is your all-in-one toolkit for evaluating Large Language Models
(LLMs) across multiple backends with ease. Dive deep into your model's
performance by saving and exploring detailed, sample-by-sample results to debug
and see how your models stack up.
## Key Features
### 🚀 **Multi-Backend Support**
Evaluate your models using the most popular and efficient inference backends:
- `transformers`: Evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/transformers)
- `nanotron`: Evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron)
- `vllm`: Evaluate models on one or more GPUs using [🚀 vLLM](https://github.com/vllm-project/vllm)
- `custom`: Evaluate custom models (can be anything)
- `sglang`: Evaluate models using [SGLang](https://github.com/sgl-project/sglang) as backend
- `inference-endpoint`: Evaluate models using Hugging Face's [Inference Endpoints API](https://huggingface.co/inference-endpoints/dedicated)
- `tgi`: Evaluate models using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index) running locally
- `litellm`: Evaluate models on any compatible API using [LiteLLM](https://www.litellm.ai/)
- `inference-providers`: Evaluate models using [Hugging Face's inference providers](https://huggingface.co/docs/inference-providers/en/index) as backend
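
Switching backends mostly comes down to changing the subcommand. The sketch below is a dry run that only prints the commands it would launch. It assumes that `vllm` is accepted as a subcommand in the same way as `accelerate` (the subcommand used for the `transformers` backend), and `build_cmd` is a hypothetical helper, not part of Lighteval.

```bash
# Shared arguments for every backend in this sketch.
model_args="model_name=openai-community/gpt2"
task_spec="leaderboard|truthfulqa:mc|0"

# Hypothetical helper: print the lighteval command for one backend.
# $1 = backend subcommand (assumed to follow the same argument layout).
build_cmd() {
  printf 'lighteval %s "%s" "%s"\n' "$1" "$model_args" "$task_spec"
}

# Dry run: print the commands instead of executing them.
for backend in accelerate vllm; do
  build_cmd "$backend"
done
```

Other backends may expect different model arguments, so check each backend's options before replacing the dry run with real invocations.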
### 📊 **Comprehensive Evaluation**
- **Extensive Task Library**: Thousands of pre-built evaluation tasks
- **Custom Task Creation**: Build your own evaluation tasks
- **Flexible Metrics**: Support for custom metrics and scoring
- **Detailed Analysis**: Sample-by-sample results for deep insights
### 🔧 **Easy Customization**
Customization at your fingertips: create [new tasks](adding-a-custom-task),
[metrics](adding-a-new-metric), or [models](evaluating-a-custom-model) tailored to your needs, or browse all our existing tasks and metrics.
### โ˜๏ธ **Seamless Integration**
Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
## Quick Start
### Installation
```bash
pip install lighteval
```
### Basic Usage
```bash
# Evaluate a model using Transformers backend
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0"
```
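
The last argument in the command above is a task specification. As a minimal sketch, assuming its three `|`-separated fields are the suite, the task name, and the number of few-shot examples, the string can be assembled from its parts. The variable names are illustrative and not part of Lighteval.

```bash
# Illustrative variable names; only the joined string is passed to lighteval.
suite="leaderboard"
task="truthfulqa:mc"
num_fewshot=0

# Join the fields with "|" (assumed order: suite|task|few-shot count).
task_spec="${suite}|${task}|${num_fewshot}"
echo "$task_spec"

# Uncomment to run the evaluation with the assembled specification:
# lighteval accelerate "model_name=openai-community/gpt2" "$task_spec"
```

Quoting the specification matters: an unquoted `|` would be interpreted by the shell as a pipe.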
### Save Results
```bash
# Save locally
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0" \
--output-dir ./results
# Push to Hugging Face Hub
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0" \
--push-to-hub \
--results-org your-username
```
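
After a local run, you may want to locate the most recent results file. The sketch below assumes results are written as `.json` files somewhere under the output directory. That layout is an assumption, and `latest_result` is a hypothetical helper rather than a Lighteval command.

```bash
# Hypothetical helper: print the most recently modified *.json file
# under the directory given as $1 (prints nothing if none is found).
latest_result() {
  find "$1" -type f -name '*.json' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Only look for results if the output directory from the example exists.
if [ -d ./results ]; then
  latest_result ./results
fi
```

The printed path can then be opened in any JSON viewer to inspect the scores.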