---
title: ClawBench Leaderboard
emoji: 🦞
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
- arxiv:2604.08523
- leaderboard
- benchmark
- web-agents
- browser-automation
- agent-evaluation
- llm-evaluation
---
# ClawBench Leaderboard
Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark, backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo. To submit your model, open a PR there.
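
For anyone who wants to inspect the results file outside the Space, the following is a minimal sketch rather than the code the leaderboard app itself runs. It assumes the raw file is reachable through the Hub's `resolve` URL, derived here from the `blob` link above. The helper names (`fetch_results_csv`, `parse_results`, `rank_rows`) are hypothetical, and no column names are assumed: the caller passes whichever score column the real header row defines.

```python
import csv
import io
import urllib.request

# Assumption: raw-file URL derived from the blob link above by swapping
# "blob" for "resolve", the Hub's usual path for direct file downloads.
RESULTS_URL = (
    "https://huggingface.co/datasets/TIGER-Lab/ClawBench"
    "/resolve/main/leaderboard/results.csv"
)


def fetch_results_csv(url: str = RESULTS_URL) -> str:
    """Download the raw CSV text (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_results(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into row dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rank_rows(rows: list[dict[str, str]], score_column: str) -> list[dict[str, str]]:
    """Sort rows by a numeric column, highest first.

    Rows where the column is missing or not numeric are skipped.
    """
    scored = []
    for row in rows:
        try:
            scored.append((float(row[score_column]), row))
        except (KeyError, TypeError, ValueError):
            continue
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored]


# Example (needs network access):
#   rows = parse_results(fetch_results_csv())
#   print(list(rows[0].keys()))  # check the real column names first
```

Printing the header row before ranking is deliberate: the schema of `results.csv` is not described here, so the score column should be read from the file rather than guessed.
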
| Resource | Link |
|---|---|
| Paper | https://arxiv.org/abs/2604.08523 |
| GitHub | https://github.com/reacher-z/ClawBench |
| Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| Website | https://claw-bench.com |