ClawBench / README.md

AgPerry

Add arxiv:2604.08523 tag for HF Papers auto-linking

50a75ee verified about 16 hours ago

metadata

title: ClawBench Leaderboard
emoji: 🦀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
  - arxiv:2604.08523
  - leaderboard
  - benchmark
  - web-agents
  - browser-automation
  - agent-evaluation
  - llm-evaluation

ClawBench — Leaderboard

Live results for the ClawBench web-agent benchmark, backed by leaderboard/results.csv in the dataset repo. To submit your model, open a PR against that repo.
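As a rough illustration of reading the results file outside the Space, here is a minimal sketch using only the Python standard library. The URL is an assumption: it combines the dataset repo `TIGER-Lab/ClawBench` with the usual Hugging Face `resolve/main/` download pattern, so the branch name may differ. The helper names `parse_results` and `fetch_results` are hypothetical, and no column names are assumed; rows are keyed by whatever header the CSV actually has.

```python
import csv
import io
import urllib.request

# Assumption: results.csv sits on the `main` branch of the
# TIGER-Lab/ClawBench dataset repo; adjust if the layout differs.
RESULTS_URL = (
    "https://huggingface.co/datasets/TIGER-Lab/ClawBench"
    "/resolve/main/leaderboard/results.csv"
)


def parse_results(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_results(url: str = RESULTS_URL) -> list[dict[str, str]]:
    """Download the results file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_results(response.read().decode("utf-8"))
```

Calling `fetch_results()` and inspecting `rows[0].keys()` is one way to see which columns a new submission row would need to fill in before opening a PR.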

| Resource | Link |
| --- | --- |
| 📖 Paper | https://arxiv.org/abs/2604.08523 |
| 💻 GitHub | https://github.com/reacher-z/ClawBench |
| 🗂 Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| 🎞 Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| 🎞 Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| 🌐 Website | https://claw-bench.com |