---
title: ClawBench Leaderboard
emoji: 🦞
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
- arxiv:2604.08523
- leaderboard
- benchmark
- web-agents
- browser-automation
- agent-evaluation
- llm-evaluation
---
# ClawBench Leaderboard

Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark. The leaderboard is backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo; to submit your model, open a PR against that repo.

| Resource | Link |
|---|---|
| Paper | https://arxiv.org/abs/2604.08523 |
| GitHub | https://github.com/reacher-z/ClawBench |
| Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| Website | https://claw-bench.com |