---
title: ClawBench Leaderboard
emoji: 🦀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
- arxiv:2604.08523
- leaderboard
- benchmark
- web-agents
- browser-automation
- agent-evaluation
- llm-evaluation
---
# ClawBench Leaderboard

Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark, backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo. Submit your model by opening a PR there.

| Resource | Link |
|---|---|
| 📖 Paper | https://arxiv.org/abs/2604.08523 |
| 💻 GitHub | https://github.com/reacher-z/ClawBench |
| 🗂 Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| 🎞 Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| 🎞 Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| 🌐 Website | https://claw-bench.com |
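
To inspect the leaderboard data locally before opening a PR, something like the following may help. It is a minimal sketch, not part of the ClawBench tooling: the helper names `parse_results` and `fetch_results` are hypothetical, the raw-file URL assumes the file stays on the `main` branch at the path linked above, and no column names are assumed because the CSV schema is not described here.

```python
import csv
import io
import urllib.request

# Assumption: raw-file counterpart of the `blob/main/...` link above.
RESULTS_URL = (
    "https://huggingface.co/datasets/TIGER-Lab/ClawBench"
    "/resolve/main/leaderboard/results.csv"
)


def parse_results(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_results(url: str = RESULTS_URL) -> list[dict[str, str]]:
    """Download the results file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_results(response.read().decode("utf-8"))
```

Calling `fetch_results()` returns the rows as dictionaries, so printing the keys of the first row is a quick way to see which columns a new entry would need to fill in.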