metadata
title: ClawBench Leaderboard
emoji: 🦞
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
- arxiv:2604.08523
- leaderboard
- benchmark
- web-agents
- browser-automation
- agent-evaluation
- llm-evaluation
ClawBench Leaderboard
Live results for the ClawBench web-agent benchmark, backed by leaderboard/results.csv in the dataset repo. Submit your model by opening a PR there.
| Resource | Link |
|---|---|
| Paper | https://arxiv.org/abs/2604.08523 |
| GitHub | https://github.com/reacher-z/ClawBench |
| Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| Website | https://claw-bench.com |
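
To inspect the results file locally before opening a PR, something like the following may help. It is a minimal sketch, not the leaderboard's own loading code. It assumes "the dataset repo" means `TIGER-Lab/ClawBench` (the Dataset link above) and that `huggingface_hub` is installed. The helper names `fetch_results_csv` and `load_rows` are hypothetical, and the sketch makes no assumption about the CSV's columns.

```python
import csv
from pathlib import Path


def fetch_results_csv(repo_id: str = "TIGER-Lab/ClawBench",
                      filename: str = "leaderboard/results.csv") -> Path:
    """Download the results file from the Hub and return its local path.

    The default repo_id is an assumption based on the Dataset link above.
    """
    # Imported lazily so load_rows() works without this dependency.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id,
                                filename=filename,
                                repo_type="dataset"))


def load_rows(path: Path) -> list[dict[str, str]]:
    """Read a CSV into a list of dicts keyed by whatever header it has."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example usage (requires network access):
#   rows = load_rows(fetch_results_csv())
#   print(len(rows), "rows; columns:", list(rows[0].keys()) if rows else [])
```

Reading the header from the file itself, rather than hard-coding column names, keeps the sketch valid even if the schema of the results file changes.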