
hf-doc-build/doc-dev / lighteval /pr_994 /en /index.md

rtrm

about 2 months ago


	# Lighteval

	🤗 Lighteval is your all-in-one toolkit for evaluating Large Language Models
	(LLMs) across multiple backends with ease. Dive deep into your model's
	performance by saving and exploring detailed, sample-by-sample results to debug
	and see how your models stack up.

	## Key Features

	### 🚀 Multi-Backend Support
	Evaluate your models using the most popular and efficient inference backends:
	- `transformers`: Evaluate models on CPU or one or more GPUs using [🤗
	Transformers](https://github.com/huggingface/transformers) with 🤗 Accelerate
	- `nanotron`: Evaluate models in distributed settings using [⚡️
	Nanotron](https://github.com/huggingface/nanotron)
	- `vllm`: Evaluate models on one or more GPUs using [🚀
	vLLM](https://github.com/vllm-project/vllm)
	- `custom`: Evaluate custom models (can be anything)
	- `sglang`: Evaluate models using [SGLang](https://github.com/sgl-project/sglang) as the backend
	- `inference-endpoint`: Evaluate models using Hugging Face's [Inference Endpoints API](https://huggingface.co/inference-endpoints/dedicated)
	- `tgi`: Evaluate models using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index) running locally
	- `litellm`: Evaluate models on any compatible API using [LiteLLM](https://www.litellm.ai/)
	- `inference-providers`: Evaluate models using [Hugging Face's Inference Providers](https://huggingface.co/docs/inference-providers/en/index) as the backend

	### 📊 Comprehensive Evaluation
	- Extensive Task Library: Thousands of pre-built evaluation tasks
	- Custom Task Creation: Build your own evaluation tasks
	- Flexible Metrics: Support for custom metrics and scoring
	- Detailed Analysis: Sample-by-sample results for deep insights

	### 🔧 Easy Customization
	Customization at your fingertips: create [new tasks](adding-a-custom-task),
	[metrics](adding-a-new-metric), or [models](evaluating-a-custom-model) tailored to your needs, or browse all our existing tasks and metrics.

	### ☁️ Seamless Integration
	Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

	## Quick Start

	### Installation

	```bash
	pip install lighteval
	```

	### Basic Usage

	```bash
	# Evaluate a model using the Transformers backend
	lighteval accelerate \
	    "model_name=openai-community/gpt2" \
	    "leaderboard|truthfulqa:mc|0"
	```
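The task string passed to the command (written here with plain `|` separators) appears to pack three pipe-separated fields. Here is a minimal shell sketch for pulling them apart, assuming the layout is `suite|task|few-shot count`. That layout is inferred from the example, not stated on this page, and the helper name `describe_task` is hypothetical, not part of Lighteval.

```bash
# Hypothetical helper (not part of Lighteval): split a task string on '|'.
# Assumes the layout suite|task|num_fewshot, inferred from the example.
describe_task() {
  spec="$1"
  suite="${spec%%|*}"    # everything before the first '|'
  rest="${spec#*|}"      # everything after the first '|'
  task="${rest%%|*}"     # middle field
  fewshot="${rest#*|}"   # everything after the second '|'
  printf 'suite=%s task=%s fewshot=%s\n' "$suite" "$task" "$fewshot"
}

describe_task "leaderboard|truthfulqa:mc|0"
# prints: suite=leaderboard task=truthfulqa:mc fewshot=0
```

Keeping the whole task string inside double quotes matters: an unquoted `|` would be read by the shell as a pipeline operator.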

	### Save Results

	```bash
	# Save locally
	lighteval accelerate \
	    "model_name=openai-community/gpt2" \
	    "leaderboard|truthfulqa:mc|0" \
	    --output-dir ./results

	# Push to Hugging Face Hub
	lighteval accelerate \
	    "model_name=openai-community/gpt2" \
	    "leaderboard|truthfulqa:mc|0" \
	    --push-to-hub \
	    --results-org your-username
	```
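To compare several models, the local-save command can be wrapped in a loop. The sketch below only prints the commands (a dry run) and uses only the flags shown on this page. The helper name `print_eval_cmd`, the second model id, and the per-model directory naming are illustrative assumptions. Whether Lighteval already separates results per model inside the output directory is not stated here.

```bash
# Hypothetical helper: print (do not run) the lighteval command for one model,
# giving each model its own output directory under ./results.
print_eval_cmd() {
  model_id="$1"
  # Replace '/' with '_' so the model id becomes a single directory name.
  out_dir="./results/$(printf '%s' "$model_id" | tr '/' '_')"
  printf 'lighteval accelerate "model_name=%s" "leaderboard|truthfulqa:mc|0" --output-dir %s\n' \
    "$model_id" "$out_dir"
}

# Example model list; the second id is an assumption chosen for illustration.
for model in openai-community/gpt2 openai-community/gpt2-medium; do
  print_eval_cmd "$model"
done
```

Reviewing the printed commands before running them makes it easy to catch a quoting mistake in the task string.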


