	# Evaluate your model with Inspect-AI

	Pick the right benchmarks with our benchmark finder:
	Search by language, task type, dataset name, or keywords.

	> [!WARNING]
	> Not all tasks are compatible with inspect-ai's API yet; we are working on converting all of them.

	<iframe
	src="https://openevals-open-benchmark-index.hf.space"
	frameborder="0"
	width="850"
	height="450"
	></iframe>

	Once you've chosen a benchmark, run it with `lighteval eval`. Below are examples for common setups.

	### Examples

	1. Evaluate a model via Hugging Face Inference Providers.

	```bash
	lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond
	```

	2. Run multiple evals at the same time.

	```bash
	lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond,aime25
	```

	3. Compare providers for the same model.

	```bash
	lighteval eval \
	hf-inference-providers/openai/gpt-oss-20b:fireworks-ai \
	hf-inference-providers/openai/gpt-oss-20b:together \
	hf-inference-providers/openai/gpt-oss-20b:nebius \
	gpqa:diamond
	```

	You can also compare every provider serving a model in one line:

	```bash
	lighteval eval \
	hf-inference-providers/openai/gpt-oss-20b:all \
	"lighteval|gpqa:diamond|0"
	```
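
When the provider list changes often, the per-provider model arguments can be generated instead of typed out. The following is a minimal sketch, assuming the `model:provider` suffix shown above; the variable names are illustrative, and the command is only printed so that nothing is launched by accident.

```bash
model="hf-inference-providers/openai/gpt-oss-20b"
providers="fireworks-ai together nebius"

# Build one "model:provider" argument per provider in the positional parameters.
set --
for p in $providers; do
  set -- "$@" "${model}:${p}"
done

# Dry run: print the command; remove `echo` to actually launch the evals.
echo lighteval eval "$@" gpqa:diamond
```

Keeping the providers in a single variable makes it easy to add or drop one without touching the rest of the command.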

	4. Evaluate a vLLM or SGLang model.

	```bash
	lighteval eval vllm/HuggingFaceTB/SmolLM-135M-Instruct gpqa:diamond
	```

	5. See the impact of few-shot on your model.

	```bash
	lighteval eval hf-inference-providers/openai/gpt-oss-20b "gsm8k|0,gsm8k|5"
	```
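
To sweep several few-shot settings, the comma-separated task list can be built in the shell rather than by hand. This is a minimal sketch, assuming the `task|n` form used in the example above; `fewshot_spec` is a hypothetical helper, not part of lighteval, and the command is only printed.

```bash
# Hypothetical helper: build "task|n1,task|n2,..." from a task name and shot counts.
# The body runs in a subshell, so its variables do not leak into the caller.
fewshot_spec() (
  task="$1"; shift
  spec=""
  for n in "$@"; do
    spec="${spec:+$spec,}${task}|${n}"
  done
  printf '%s\n' "$spec"
)

tasks="$(fewshot_spec gsm8k 0 5)"

# Dry run: print the command; remove `echo` to run it.
# Keep "$tasks" quoted so the shell does not treat "|" as a pipe.
echo lighteval eval hf-inference-providers/openai/gpt-oss-20b "$tasks"
```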

	6. Optimize custom server connections.

	```bash
	lighteval eval hf-inference-providers/openai/gpt-oss-20b gsm8k \
	--max-connections 50 \
	--timeout 30 \
	--retry-on-error 1 \
	--max-retries 1 \
	--max-samples 10
	```

	7. Use multiple epochs for more reliable results.

	```bash
	lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --epochs 16 --epochs-reducer "pass_at_4"
	```
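
To get a feel for what a `pass_at_4` score over 16 epochs means, the sketch below computes the standard unbiased pass@k estimate, `1 - C(n-c, k) / C(n, k)`, for `c` correct answers out of `n` epochs. That the reducer follows this estimator is an assumption here, not something this page states, and `pass_at_k` is a hypothetical helper that is not part of lighteval.

```bash
# pass_at_k N C K: estimate the probability that at least one of K samples is
# correct, given C correct answers out of N epochs: 1 - C(N-C, K) / C(N, K).
# Assumption: this mirrors the standard pass@k estimator.
pass_at_k() {
  awk -v n="$1" -v c="$2" -v k="$3" 'BEGIN {
    # Fewer than K wrong answers: every draw of K contains a correct one.
    if (n - c < k) { printf "%.4f\n", 1; exit }
    # Product form of C(n-c, k) / C(n, k), which avoids large factorials.
    fail = 1
    for (i = n - c + 1; i <= n; i++) fail *= 1 - k / i
    printf "%.4f\n", 1 - fail
  }'
}

pass_at_k 16 4 4   # 4 correct answers out of 16 epochs, k = 4 -> 0.7280
```

Under this estimator, a question answered correctly in only a quarter of the epochs still contributes about 0.73, which is why pass@k scores sit well above plain accuracy.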

	8. Push to the Hub to share results.

	```bash
	lighteval eval hf-inference-providers/openai/gpt-oss-20b hle \
	--bundle-dir gpt-oss-bundle \
	--repo-id OpenEvals/evals \
	--max-samples 100
	```

	Resulting Space:

	<iframe
	src="https://openevals-evals.static.hf.space"
	frameborder="0"
	width="850"
	height="450"
	></iframe>

	9. Change model behaviour.

	You can use any argument defined in inspect-ai's API.

	```bash
	lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --temperature 0.1
	```

	10. Use `--model-args` to pass any argument specific to an inference provider.

	```bash
	lighteval eval google/gemini-2.5-pro aime25 --model-args location=us-east5
	```

	```bash
	lighteval eval openai/gpt-4o gpqa:diamond --model-args service_tier=flex,client_timeout=1200
	```

	LightEval prints a per-model results table:

	```
	Completed all tasks in 'lighteval-logs' successfully

	|                 Model                 |gpqa|gpqa:diamond|
	|---------------------------------------|---:|-----------:|
	|vllm/HuggingFaceTB/SmolLM-135M-Instruct|0.01|        0.01|

	results saved to lighteval-logs
	run "inspect view --log-dir lighteval-logs" to view the results
	```
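
To compare runs in a script, the printed table can be parsed with standard tools. This is a minimal sketch, assuming the pipe-delimited layout shown above; the sample text is copied from that output, whereas a real script would capture it from the run.

```bash
# Sample rows copied from the output above.
table='|                 Model                 |gpqa|gpqa:diamond|
|---------------------------------------|---:|-----------:|
|vllm/HuggingFaceTB/SmolLM-135M-Instruct|0.01|        0.01|'

# Skip the header and separator rows, then print "model<TAB>last-column score".
scores="$(printf '%s\n' "$table" | awk -F'|' '
  NR > 2 {
    model = $2; score = $(NF - 1)
    gsub(/^ +| +$/, "", model)   # trim the padding around each cell
    gsub(/^ +| +$/, "", score)
    printf "%s\t%s\n", model, score
  }')"
echo "$scores"
```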
