---
title: ClawBench Leaderboard
emoji: 🦀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
- arxiv:2604.08523
- leaderboard
- benchmark
- web-agents
- browser-automation
- agent-evaluation
- llm-evaluation
---

# ClawBench — Leaderboard

Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark, backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo. Submit your model by opening a PR there.

| Resource | Link |
|---|---|
| 📖 Paper | https://arxiv.org/abs/2604.08523 |
| 💻 GitHub | https://github.com/reacher-z/ClawBench |
| 🗂 Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| 🎞 Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| 🎞 Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| 🌐 Website | https://claw-bench.com |
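
For the PR-based submission mentioned above, one possible way to prepare the change is to append a row to a local checkout of `leaderboard/results.csv` while reusing the header already in the file. This is a minimal sketch, not the maintainers' tooling: the helper name `append_result_row` is hypothetical, and the column names are deliberately not assumed; they are read from the existing file.

```python
import csv
from pathlib import Path


def append_result_row(csv_path, new_row):
    """Append new_row to a results CSV, reusing the file's existing header.

    Hypothetical helper: raises ValueError when new_row's keys do not match
    the header exactly, so a malformed row is caught before opening a PR.
    Returns the number of data rows after the append.
    """
    path = Path(csv_path)

    # Read the current header and rows from the local checkout.
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames
        rows = list(reader)

    if not header:
        raise ValueError(f"{path} has no header row")

    # The new row must use exactly the columns the file already defines.
    if set(new_row) != set(header):
        missing = sorted(set(header) - set(new_row))
        extra = sorted(set(new_row) - set(header))
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")

    rows.append(new_row)

    # Rewrite the file with the original column order preserved.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)
```

Pass the path to your local copy of the CSV and a dict whose keys match the header in that file; the actual columns and any submission requirements are defined by the dataset repo, not by this sketch.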