Lighteval

🤗 Lighteval is your all-in-one toolkit for evaluating Large Language Models (LLMs) across multiple backends with ease. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Key Features

🚀 Multi-Backend Support

Evaluate your models using the most popular and efficient inference backends.

📊 Comprehensive Evaluation

  • Extensive Task Library: 1000s of pre-built evaluation tasks
  • Custom Task Creation: Build your own evaluation tasks
  • Flexible Metrics: Support for custom metrics and scoring
  • Detailed Analysis: Sample-by-sample results for deep insights

🔧 Easy Customization

Customization at your fingertips: create new tasks, metrics, or models tailored to your needs, or browse all our existing tasks and metrics.

☁️ Seamless Integration

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

Quick Start

Installation

pip install lighteval

Basic Usage

# Evaluate a model using the Transformers backend
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0"
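The second argument in the command above is a pipe-separated task string. As a rough sketch, the snippet below splits it into its parts with plain POSIX parameter expansion. Reading the three fields as suite, task name, and few-shot count is an assumption based on the shape of the string, and the variable names are purely illustrative.

```shell
# Task string taken from the Basic Usage example
task_spec="leaderboard|truthfulqa:mc|0"

# Assumed layout: <suite>|<task>|<few-shot count>
suite=${task_spec%%|*}    # everything before the first "|"
rest=${task_spec#*|}      # everything after the first "|"
task=${rest%%|*}          # middle field
fewshot=${rest#*|}        # last field

printf 'suite=%s task=%s fewshot=%s\n' "$suite" "$task" "$fewshot"
# prints: suite=leaderboard task=truthfulqa:mc fewshot=0
```

Under that reading, changing the last field would request a different number of few-shot examples for the same task.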

Save Results

# Save locally
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --output-dir ./results

# Push to Hugging Face Hub
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0" \
    --push-to-hub \
    --results-org your-username
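After a local run, it can help to check what was written to the output directory. The helper below is a minimal sketch: `list_results` is a hypothetical function name, and the assumption that results are stored as `.json` files somewhere under the output directory is not confirmed by this page.

```shell
# Hypothetical helper: list JSON files under a results directory.
# Assumes results are written as .json files below the given path.
list_results() {
    results_dir="$1"
    if [ -d "$results_dir" ]; then
        # Recursively find JSON files and print them in sorted order
        find "$results_dir" -type f -name '*.json' | sort
    else
        echo "no results directory at $results_dir"
    fi
}

# Directory passed to --output-dir in the example above
list_results "./results"
```

If the run has not produced any output yet, the function prints a short notice instead of failing.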
