# Evaluate your model with Inspect-AI
Pick the right benchmarks with our benchmark finder:
Search by language, task type, dataset name, or keywords.
> [!WARNING]
> Not all tasks are compatible with inspect-ai's API yet; we are working on converting all of them!
<iframe
src="https://openevals-open-benchmark-index.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
Once you've chosen a benchmark, run it with `lighteval eval`. Below are examples for common setups.
### Examples
1. Evaluate a model via Hugging Face Inference Providers.
```bash
lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond
```
2. Run multiple evals at the same time.
```bash
lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond,aime25
```
3. Compare providers for the same model.
```bash
lighteval eval \
hf-inference-providers/openai/gpt-oss-20b:fireworks-ai \
hf-inference-providers/openai/gpt-oss-20b:together \
hf-inference-providers/openai/gpt-oss-20b:nebius \
gpqa:diamond
```
You can also compare every provider serving a model in a single command:
```bash
lighteval eval \
hf-inference-providers/openai/gpt-oss-20b:all \
"lighteval|gpqa:diamond|0"
```
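To compare a chosen subset of providers without typing each model ID, you can generate the IDs from a list. This is a minimal bash sketch: the provider names are the ones from the comparison example above, and the `command -v` guard only exists so the snippet is safe to paste on a machine without `lighteval` installed.

```bash
# Model served by several Inference Providers (from the example above).
base="hf-inference-providers/openai/gpt-oss-20b"
providers=(fireworks-ai together nebius)

# Build one "<model>:<provider>" ID per provider.
models=()
for provider in "${providers[@]}"; do
  models+=("${base}:${provider}")
done

# Pass every model ID to a single run so they are compared side by side.
if command -v lighteval >/dev/null 2>&1; then
  lighteval eval "${models[@]}" gpqa:diamond
else
  echo "lighteval not installed; models: ${models[*]}" >&2
fi
```

Quoting `"${models[@]}"` keeps each model ID as a separate argument, just like the backslash-continued lines in the original command.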
4. Evaluate a vLLM or SGLang model.
```bash
lighteval eval vllm/HuggingFaceTB/SmolLM-135M-Instruct gpqa:diamond
```
5. See the impact of few-shot prompting on your model.
```bash
lighteval eval hf-inference-providers/openai/gpt-oss-20b "gsm8k|0,gsm8k|5"
```
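To sweep more shot counts than the two shown above, you can build the comma-separated task string in a loop. This is a minimal sketch. The counts 0, 1, and 5 are illustrative, and the `command -v` guard only skips the run when `lighteval` is not installed.

```bash
# Build "gsm8k|0,gsm8k|1,gsm8k|5": the comma is only added after the first entry.
tasks=""
for n in 0 1 5; do
  tasks="${tasks:+$tasks,}gsm8k|$n"
done

# Keep the task string quoted: an unquoted "|" would be parsed as a shell pipe.
if command -v lighteval >/dev/null 2>&1; then
  lighteval eval hf-inference-providers/openai/gpt-oss-20b "$tasks"
else
  echo "lighteval not installed; tasks: $tasks" >&2
fi
```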
6. Optimize custom server connections.
```bash
lighteval eval hf-inference-providers/openai/gpt-oss-20b gsm8k \
--max-connections 50 \
--timeout 30 \
--retry-on-error 1 \
--max-retries 1 \
--max-samples 10
```
7. Use multiple epochs for more reliable results.
```bash
lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --epochs 16 --epochs-reducer "pass_at_4"
```
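With `--epochs 16`, each sample is attempted 16 times, and the reducer turns those attempts into one score per sample. Assuming the `pass_at_4` reducer follows the standard unbiased pass@k estimator, with `n` attempts per sample of which `c` are correct, it computes the formula below. The worked numbers use a hypothetical sample answered correctly in 2 of 16 epochs.

```latex
\text{pass@}k = 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}},
\qquad
n = 16,\; c = 2,\; k = 4:\quad
1 - \frac{\binom{14}{4}}{\binom{16}{4}} = 1 - \frac{1001}{1820} = 0.45
```

Running more epochs than `k` lets the estimate average over many possible groups of 4 attempts instead of depending on a single draw of 4, which is why it gives more reliable results.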
8. Push to the Hub to share results.
```bash
lighteval eval hf-inference-providers/openai/gpt-oss-20b hle \
--bundle-dir gpt-oss-bundle \
--repo-id OpenEvals/evals \
--max-samples 100
```
Resulting Space:
<iframe
src="https://openevals-evals.static.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
9. Change model behaviour.
You can pass any argument defined in inspect-ai's API, for example the sampling temperature:
```bash
lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --temperature 0.1
```
10. Use `--model-args` to pass provider-specific arguments.
```bash
lighteval eval google/gemini-2.5-pro aime25 --model-args location=us-east5
```
```bash
lighteval eval openai/gpt-4o gpqa:diamond --model-args service_tier=flex,client_timeout=1200
```
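When a provider needs several arguments, it can be easier to keep them one per line and join them into the comma-separated `--model-args` value. This is a minimal bash sketch that reuses the OpenAI arguments above. The keys a provider accepts are defined by that provider's inspect-ai model API, not by this snippet.

```bash
# Provider-specific arguments, one key=value pair per entry.
pairs=(
  service_tier=flex
  client_timeout=1200
)

# "${pairs[*]}" joins entries with the first character of IFS (a comma here);
# setting IFS inside $(...) keeps the change local to that subshell.
model_args=$(IFS=,; printf '%s' "${pairs[*]}")

if command -v lighteval >/dev/null 2>&1; then
  lighteval eval openai/gpt-4o gpqa:diamond --model-args "$model_args"
else
  echo "lighteval not installed; --model-args $model_args" >&2
fi
```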
LightEval prints a per-model results table:
```
Completed all tasks in 'lighteval-logs' successfully
| Model |gpqa|gpqa:diamond|
|---------------------------------------|---:|-----------:|
|vllm/HuggingFaceTB/SmolLM-135M-Instruct|0.01| 0.01|
results saved to lighteval-logs
run "inspect view --log-dir lighteval-logs" to view the results
```
