---
title: ClawBench Leaderboard
emoji: 🦀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
- arxiv:2604.08523
- leaderboard
- benchmark
- web-agents
- browser-automation
- agent-evaluation
- llm-evaluation
---

# ClawBench Leaderboard

Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark, backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo. To submit a model, open a PR on the dataset repo.
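
Because the leaderboard is backed by a plain CSV file, the results can also be read outside the Space. The snippet below is a minimal sketch, not the Space's actual `app.py`: it assumes the file is reachable through the Hub's usual `resolve/<revision>/<path>` download URL, and it makes no assumption about column names (the header row is used as-is). The helper names `results_url`, `parse_results`, and `fetch_results` are hypothetical.

```python
import csv
import io
import urllib.request
from typing import Dict, List, Optional

# Repo id and file path are taken from the links in this README.
REPO_ID = "TIGER-Lab/ClawBench"
RESULTS_PATH = "leaderboard/results.csv"


def results_url(repo_id: str = REPO_ID, path: str = RESULTS_PATH, revision: str = "main") -> str:
    """Build a raw-download URL for a file in a Hub dataset repo.

    Assumption: the Hub's `resolve/<revision>/<path>` URL layout.
    """
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{path}"


def parse_results(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by whatever the header row contains."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_results(url: Optional[str] = None, timeout: float = 30.0) -> List[Dict[str, str]]:
    """Download and parse the results file (requires network access)."""
    with urllib.request.urlopen(url or results_url(), timeout=timeout) as resp:
        return parse_results(resp.read().decode("utf-8"))


# Usage (needs network access):
#   rows = fetch_results()
#   print(len(rows), list(rows[0].keys()) if rows else [])
```

Keeping the parsing step separate from the download makes it possible to check a locally edited `results.csv` before opening a PR, for example with `parse_results(open("results.csv", encoding="utf-8").read())`.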

| Resource | Link |
|---|---|
| 📖 Paper | https://arxiv.org/abs/2604.08523 |
| 💻 GitHub | https://github.com/reacher-z/ClawBench |
| 🗂 Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| 🎞 Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| 🎞 Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| 🌐 Website | https://claw-bench.com |