---
title: ClawBench Leaderboard
emoji: 🦀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
  - arxiv:2604.08523
  - leaderboard
  - benchmark
  - web-agents
  - browser-automation
  - agent-evaluation
  - llm-evaluation
---

# ClawBench — Leaderboard

Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark, backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo. To submit your model, open a PR against that repo.

| Resource | Link |
|---|---|
| 📖 Paper | https://arxiv.org/abs/2604.08523 |
| 💻 GitHub | https://github.com/reacher-z/ClawBench |
| 🗂 Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| 🎞 Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| 🎞 Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| 🌐 Website | https://claw-bench.com |
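
To inspect the leaderboard data outside this Space, the `leaderboard/results.csv` file linked above can be read directly. The following is a minimal sketch, not the code this Space runs. It assumes the file is publicly readable at the raw-download form of that link (`blob` swapped for `resolve`); the helper names `parse_results` and `fetch_results` are hypothetical. No column names are assumed, so the script only prints whatever header the file declares.

```python
import csv
import io
import urllib.request

# Assumption: raw-download form of the results.csv link above
# ("blob" replaced by "resolve"); adjust if the file moves.
RESULTS_URL = (
    "https://huggingface.co/datasets/TIGER-Lab/ClawBench"
    "/resolve/main/leaderboard/results.csv"
)


def parse_results(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_results(url: str = RESULTS_URL) -> list[dict[str, str]]:
    """Download the results file and parse it (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_results(resp.read().decode("utf-8"))


if __name__ == "__main__":
    rows = fetch_results()
    print(f"{len(rows)} rows")
    if rows:
        # Print the header the file actually declares rather than guessing it.
        print("columns:", list(rows[0].keys()))
```

Parsing is kept separate from downloading so that a locally edited copy of the file can be checked with `parse_results` before opening a PR.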