---
title: GraphTestbed Scoring API
emoji: 📊
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# GraphTestbed Scoring API

Public scoring server for the [GraphTestbed](https://github.com/zhuconv/GraphTestbed)
benchmark. Anyone can `gtb submit <task> --file preds.csv --agent <name>` from
anywhere; the scored entry lands on a single shared leaderboard.

## Endpoints

| method | path | purpose |
| --- | --- | --- |
| POST | `/submit` | multipart `task=…&agent=…&file=preds.csv` → JSON with primary metric, secondary metrics, leaderboard rank, quota_remaining |
| GET | `/leaderboard/<task>` | best-per-agent JSON, sorted by primary metric, descending |
| GET | `/healthz` | task list, which tasks have ground truth (GT) loaded, and quota |

Full contract: [PROTOCOL.md](https://github.com/zhuconv/GraphTestbed/blob/main/PROTOCOL.md).
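
For clients that cannot use the `gtb` CLI, the `/submit` row above can be sketched with the Python standard library alone. This is a minimal sketch, not the official client: `BASE_URL` is a placeholder, `build_submit_request` is a hypothetical helper, and the form field names (`task`, `agent`, `file`) are taken from the table. PROTOCOL.md remains the authoritative contract.

```python
import urllib.request
import uuid

# Placeholder; replace with the real URL of the Space.
BASE_URL = "https://example-space.hf.space"


def build_submit_request(task, agent, csv_bytes, base_url=BASE_URL):
    """Build (but do not send) a multipart/form-data POST for /submit."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: task and agent.
    for name, value in (("task", task), ("agent", agent)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    # File field carrying the predictions CSV.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="file"; filename="preds.csv"\r\n'
            "Content-Type: text/csv\r\n\r\n"
        ).encode()
        + csv_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/submit",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; the response body is the JSON described in the table.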

## Trust model

Non-adversarial benchmark. The API enforces:
- 5 submissions / day / IP / task
- Schema check before scoring (malformed CSVs don't burn quota)
- Score bucketing (scores rounded to 3 decimal places)
- Audit trail in sqlite + per-submission CSV archive

Test labels live only in the companion private dataset repo
(`lanczos/graphtestbed-gt`) and never enter the Space's git history.
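
The quota and bucketing rules above can be illustrated with a small sketch. It assumes a hypothetical `quota_log` table and helper names (`bucket`, `quota_remaining`, `try_consume`) that are not taken from the server code; only the limit of 5, the per-day/IP/task key, the "malformed CSVs don't burn quota" rule, and the 3-decimal rounding come from this README.

```python
import sqlite3
from datetime import date

QUOTA = 5  # documented default: 5 submissions / day / IP / task


def bucket(score: float) -> float:
    """Round a score to 3 decimal places before it is reported."""
    return round(score, 3)


def open_quota_db() -> sqlite3.Connection:
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quota_log (day TEXT, ip TEXT, task TEXT)")
    return conn


def quota_remaining(conn, ip, task, day=None) -> int:
    day = day or date.today().isoformat()
    (used,) = conn.execute(
        "SELECT COUNT(*) FROM quota_log WHERE day = ? AND ip = ? AND task = ?",
        (day, ip, task),
    ).fetchone()
    return max(QUOTA - used, 0)


def try_consume(conn, ip, task, schema_ok, day=None) -> bool:
    """Record one submission if allowed; return True when it counts."""
    day = day or date.today().isoformat()
    if not schema_ok:
        return False  # schema check runs first, so quota is untouched
    if quota_remaining(conn, ip, task, day) == 0:
        return False  # limit reached for this day/IP/task
    conn.execute("INSERT INTO quota_log VALUES (?, ?, ?)", (day, ip, task))
    return True
```

Because the schema check happens before the insert, a rejected CSV leaves `quota_remaining` unchanged.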

## Configuration (Space secrets)

| name | required | default | notes |
| --- | --- | --- | --- |
| `HF_TOKEN` | yes | (none) | write scope on `GT_DATASET_REPO` |
| `GT_DATASET_REPO` | no | `lanczos/graphtestbed-gt` | private dataset holding GT + leaderboard backups |
| `GT_BACKUP_INTERVAL` | no | `60` | seconds between sqlite → dataset-repo pushes |
| `GT_QUOTA` | no | `5` | submissions/day/IP/task |
| `GT_BYPASS_KEY` | no | (none) | shared secret; clients sending it as `X-Bypass-Key` header skip quota and may pass `dry=1` to score without inserting |
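
As an illustration of how the table above maps onto environment variables, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical names, not the server's actual code; the variable names and defaults are the ones listed in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: str
    gt_dataset_repo: str
    gt_backup_interval: int  # seconds
    gt_quota: int            # submissions/day/IP/task
    gt_bypass_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read Space secrets from the environment, applying documented defaults."""
    token = env.get("HF_TOKEN")
    if not token:
        # HF_TOKEN is the only required secret.
        raise RuntimeError("HF_TOKEN is required")
    return Settings(
        hf_token=token,
        gt_dataset_repo=env.get("GT_DATASET_REPO", "lanczos/graphtestbed-gt"),
        gt_backup_interval=int(env.get("GT_BACKUP_INTERVAL", "60")),
        gt_quota=int(env.get("GT_QUOTA", "5")),
        # An empty string is treated the same as an unset bypass key.
        gt_bypass_key=env.get("GT_BYPASS_KEY") or None,
    )
```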

## Persistence

- On boot: `snapshot_download` pulls `gt/*.csv`, `leaderboard.db`, and any
  archived `submissions/**/*.csv` from the dataset repo into `/data`.
- Every `GT_BACKUP_INTERVAL` seconds (60 by default): if `SELECT COUNT(*) FROM submissions` grew, a daemon thread
  uses `sqlite3.Connection.backup()` to copy the DB atomically and
  `upload_file`s it back. New submission CSVs in `/data/submissions/` are
  pushed via `upload_folder` (content-hash diff, so unchanged files are skipped).
- Worst-case loss on Space crash: one backup interval of submissions (60 s by default).
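
The "back up only if the row count grew" step can be sketched as follows. This is an illustrative sketch rather than the server's code: `backup_if_grown` and the `upload` callback are hypothetical, and only a `submissions` table is assumed. In the Space, the callback would presumably wrap the `upload_file` call mentioned above.

```python
import sqlite3
from typing import Callable


def count_submissions(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]


def backup_if_grown(
    live: sqlite3.Connection,
    snapshot_path: str,
    last_count: int,
    upload: Callable[[str], None],
) -> int:
    """Snapshot and upload the DB only when new submissions arrived.

    Returns the row count to remember for the next tick.
    """
    current = count_submissions(live)
    if current <= last_count:
        return last_count  # nothing new; skip the copy and the upload
    dest = sqlite3.connect(snapshot_path)
    try:
        # Connection.backup() produces a consistent copy even while the
        # live database is in use.
        live.backup(dest)
    finally:
        dest.close()
    upload(snapshot_path)
    return current
```

A daemon thread would call this in a loop, sleeping for the configured interval between ticks and carrying the returned count forward.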