---
title: ClawBench Leaderboard
emoji: 🦀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: Can AI agents complete everyday online tasks?
tags:
  - arxiv:2604.08523
  - leaderboard
  - benchmark
  - web-agents
  - browser-automation
  - agent-evaluation
  - llm-evaluation
---

# ClawBench Leaderboard

Live results for the [ClawBench](https://huggingface.co/datasets/TIGER-Lab/ClawBench) web-agent benchmark, backed by [`leaderboard/results.csv`](https://huggingface.co/datasets/TIGER-Lab/ClawBench/blob/main/leaderboard/results.csv) in the dataset repo. To submit your model, open a pull request against that dataset repo.
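Because the leaderboard is backed by a plain CSV file, you can also read the same rows programmatically. The following standard-library Python sketch makes several assumptions. It assumes the dataset is public and not gated. It assumes the file stays on the `main` branch, as the link above indicates. It uses Hugging Face's usual `resolve/` raw-file URL in place of the `blob/` page URL. `parse_results` and `fetch_results` are hypothetical helper names, and the sketch expects no particular column layout.

```python
import csv
import io
import urllib.request

# Assumption: raw file contents are served under /resolve/<revision>/ (here: main),
# mirroring the /blob/main/ page link in this README.
RESULTS_URL = (
    "https://huggingface.co/datasets/TIGER-Lab/ClawBench/"
    "resolve/main/leaderboard/results.csv"
)


def parse_results(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_results(url: str = RESULTS_URL, timeout: float = 30.0) -> list[dict[str, str]]:
    """Download the results CSV and parse it.

    Column names are whatever the file's header defines; nothing is hard-coded.
    Assumes the dataset is public (a gated repo would need an auth token).
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

For example, `rows = fetch_results()` would return one dict per leaderboard row, and `list(rows[0])` would list the CSV's actual column names. The download and the parsing are kept separate, so you can test `parse_results` offline on any CSV string.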

| Resource | Link |
|---|---|
| 📖 Paper | https://arxiv.org/abs/2604.08523 |
| 💻 GitHub | https://github.com/reacher-z/ClawBench |
| 🗂 Dataset | https://huggingface.co/datasets/TIGER-Lab/ClawBench |
| 🎞 Traces (V1) | https://huggingface.co/datasets/NAIL-Group/ClawBenchV1Trace |
| 🎞 Traces (V2) | https://huggingface.co/datasets/TIGER-Lab/ClawBenchV2Trace |
| 🌐 Website | https://claw-bench.com |