Lighteval

🤗 Lighteval is your all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Key Features

🚀 Multi-Backend Support

Evaluate your models using the most popular and efficient inference backends:

eval: Use inspect-ai as the backend to evaluate and inspect your models (preferred)
transformers: Evaluate models on CPU or one or more GPUs using 🤗 Accelerate
nanotron: Evaluate models in distributed settings using ⚡️ Nanotron
vllm: Evaluate models on one or more GPUs using 🚀 vLLM
custom: Evaluate custom models (can be anything)
sglang: Evaluate models using SGLang as backend
inference-endpoint: Evaluate models using Hugging Face's Inference Endpoints API
tgi: Evaluate models using 🔗 Text Generation Inference running locally
litellm: Evaluate models on any compatible API using LiteLLM
inference-providers: Evaluate models using Hugging Face's Inference Providers as backend

📊 Comprehensive Evaluation

Extensive Task Library: Thousands of pre-built evaluation tasks
Custom Task Creation: Build your own evaluation tasks
Flexible Metrics: Support for custom metrics and scoring
Detailed Analysis: Sample-by-sample results for deep insights

🔧 Easy Customization

Customization at your fingertips: create new tasks, metrics, or models tailored to your needs, or browse all our existing tasks and metrics.

☁️ Seamless Integration

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

Quick Start

Installation

pip install lighteval

Basic Usage

Find a task

Run your benchmark and push the details to the Hub

lighteval eval "hf-inference-providers/openai/gpt-oss-20b" \
    gpqa:diamond \
    --bundle-dir gpt-oss-bundle \
    --repo-id OpenEvals/evals
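
To reuse the command above across several models, it can help to assemble it in a small helper first. The following is a minimal sketch, not part of Lighteval: `build_eval_cmd` is a hypothetical function name, and it only prints the command (a dry run) using the arguments and flags already shown above, so you can check it before running anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not provided by lighteval): prints the Quick Start
# command for a given model, task, bundle directory, and Hub repo id.
# It only uses the flags shown above and does not execute the evaluation.
build_eval_cmd() {
    local model="$1"
    local task="$2"
    local bundle_dir="$3"
    local repo_id="$4"
    printf 'lighteval eval "%s" %s --bundle-dir %s --repo-id %s\n' \
        "$model" "$task" "$bundle_dir" "$repo_id"
}

# Dry run with the same values as the example above.
build_eval_cmd "hf-inference-providers/openai/gpt-oss-20b" \
    "gpqa:diamond" "gpt-oss-bundle" "OpenEvals/evals"
```

Once the printed command looks right, you can paste it into your terminal or call the helper in a loop over model names.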
