Zhu Jiajun (jz28583)
deploy: overlay server/space/{README,Dockerfile} at root
0425205
metadata
title: GraphTestbed Scoring API
emoji: 📊
colorFrom: indigo
colorTo: green
sdk: docker
app_port: 7860
pinned: false

GraphTestbed Scoring API

Public scoring server for the GraphTestbed benchmark. Anyone can run `gtb submit <task> --file preds.csv --agent <name>` from anywhere, and the scored entry lands on a single shared leaderboard.

Endpoints

| method | path | purpose |
|---|---|---|
| `POST` | `/submit` | multipart form `task=…&agent=…&file=preds.csv` → JSON with the primary metric, secondary metrics, leaderboard rank, and `quota_remaining` |
| `GET` | `/leaderboard/<task>` | best entry per agent as JSON, sorted by primary metric (descending) |
| `GET` | `/healthz` | list of tasks, which of them have ground truth (GT) loaded, and quota |
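
The "best per agent, sorted by primary metric" rule behind `GET /leaderboard/<task>` can be sketched as a pure function. This is a minimal illustration, not the server's code. The `Submission` record is hypothetical. The tie-break, where the earlier submission wins on equal scores, is an assumption that the table above does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical record shape; the real rows live in the server's SQLite DB.
    agent: str
    primary: float
    submitted_at: float  # UNIX timestamp


def best_per_agent(submissions: list[Submission]) -> list[Submission]:
    """Keep each agent's highest primary score, then sort descending."""
    best: dict[str, Submission] = {}
    for sub in submissions:
        current = best.get(sub.agent)
        # Assumed tie-break: on equal primary scores, the earlier submission wins.
        if current is None or sub.primary > current.primary or (
            sub.primary == current.primary and sub.submitted_at < current.submitted_at
        ):
            best[sub.agent] = sub
    # Highest primary first; earlier submission first among equal scores.
    return sorted(best.values(), key=lambda s: (-s.primary, s.submitted_at))


if __name__ == "__main__":
    rows = [
        Submission("alice", 0.812, 1.0),
        Submission("bob", 0.790, 2.0),
        Submission("alice", 0.845, 3.0),
        Submission("carol", 0.845, 4.0),
    ]
    for rank, sub in enumerate(best_per_agent(rows), start=1):
        print(rank, sub.agent, sub.primary)
```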

Full contract: PROTOCOL.md.
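
For clients that don't use the `gtb` CLI, the following sketch builds the `POST /submit` request with only the standard library. It assumes the form field names `task`, `agent` and `file` from the endpoint table. PROTOCOL.md remains the authoritative contract. The function name is hypothetical, and the optional `X-Bypass-Key` header follows the `GT_BYPASS_KEY` description under Configuration. The request is built but not sent.

```python
import urllib.request
import uuid
from typing import Optional


def build_submit_request(base_url: str, task: str, agent: str, csv_bytes: bytes,
                         filename: str = "preds.csv",
                         bypass_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /submit."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    # Plain text form fields: task and agent.
    for name, value in (("task", task), ("agent", agent)):
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()
    # File field carrying the predictions CSV.
    body += f"--{boundary}\r\n".encode()
    body += f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    body += b"Content-Type: text/csv\r\n\r\n"
    body += csv_bytes + b"\r\n"
    # Closing boundary terminates the multipart body.
    body += f"--{boundary}--\r\n".encode()

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if bypass_key is not None:
        headers["X-Bypass-Key"] = bypass_key  # skips quota when it matches GT_BYPASS_KEY
    return urllib.request.Request(base_url.rstrip("/") + "/submit",
                                  data=bytes(body), headers=headers, method="POST")
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON response. The base URL is whatever address the Space is served at. When running the container locally, that would be port 7860 per `app_port`.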

Trust model

GraphTestbed is a non-adversarial benchmark. The API enforces:

  • 5 submissions per day per IP per task (configurable via `GT_QUOTA`)
  • Schema check before scoring, so malformed CSVs don't burn quota
  • Score bucketing: scores are rounded to 3 decimal places
  • Audit trail: submissions are logged in SQLite, and every submitted CSV is archived
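
The order of these checks matters: validating the CSV before touching the counter is what keeps malformed files from burning quota. The following minimal in-memory sketch illustrates that order. `QuotaTracker`, `validate_schema`, `handle_submission`, the required-column list, `score_fn` and the simplified response dict are all hypothetical. The real server presumably derives its counts from the SQLite audit trail rather than a dict.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Tuple


class QuotaTracker:
    """Hypothetical in-memory counter keyed by (day, IP, task)."""

    def __init__(self, limit: int = 5) -> None:  # 5 mirrors the GT_QUOTA default
        self.limit = limit
        self._counts: Dict[Tuple[date, str, str], int] = defaultdict(int)

    def remaining(self, day: date, ip: str, task: str) -> int:
        return self.limit - self._counts[(day, ip, task)]

    def consume(self, day: date, ip: str, task: str) -> bool:
        """Count one submission; return False if the quota is already used up."""
        key = (day, ip, task)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


def validate_schema(csv_text: str, required: List[str]) -> bool:
    """Assumed rule: all required columns present and at least one data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or not set(required) <= set(reader.fieldnames):
        return False
    return any(True for _ in reader)


def handle_submission(tracker: QuotaTracker, day: date, ip: str, task: str,
                      csv_text: str, required: List[str],
                      score_fn: Callable[[str], float]) -> dict:
    # Schema check comes first, so a malformed CSV never consumes quota.
    if not validate_schema(csv_text, required):
        return {"error": "schema"}
    if not tracker.consume(day, ip, task):
        return {"error": "quota"}
    # Score bucketing: round the primary metric to 3 decimal places.
    return {"primary": round(score_fn(csv_text), 3),
            "quota_remaining": tracker.remaining(day, ip, task)}
```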

Test labels live only in the companion private dataset repo (lanczos/graphtestbed-gt) and never enter the Space's git history.

Configuration (Space secrets)

| name | required | default | notes |
|---|---|---|---|
| `HF_TOKEN` | yes | (none) | needs write scope on `GT_DATASET_REPO` |
| `GT_DATASET_REPO` | no | `lanczos/graphtestbed-gt` | private dataset holding GT and leaderboard backups |
| `GT_BACKUP_INTERVAL` | no | `60` | seconds between pushes of the SQLite DB to the dataset repo |
| `GT_QUOTA` | no | `5` | submissions per day per IP per task |
| `GT_BYPASS_KEY` | no | (none) | shared secret; clients that send it in the `X-Bypass-Key` header skip the quota and may pass `dry=1` to get a score without the entry being inserted into the leaderboard |
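
Hugging Face Spaces exposes secrets to the container as environment variables. The sketch below shows one way to read this table at startup and check the bypass header. `Settings`, `load_settings` and `may_bypass` are hypothetical names. The constant-time comparison via `hmac.compare_digest` is a design choice of this sketch, not something the README specifies. Headers are modeled as a plain case-sensitive dict here, whereas web frameworks normally match header names case-insensitively.

```python
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: str
    gt_dataset_repo: str = "lanczos/graphtestbed-gt"
    backup_interval_s: int = 60
    quota: int = 5
    bypass_key: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the Space secrets, applying the defaults from the table."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required (write scope on GT_DATASET_REPO)")
    return Settings(
        hf_token=token,
        gt_dataset_repo=env.get("GT_DATASET_REPO", "lanczos/graphtestbed-gt"),
        backup_interval_s=int(env.get("GT_BACKUP_INTERVAL", "60")),
        quota=int(env.get("GT_QUOTA", "5")),
        bypass_key=env.get("GT_BYPASS_KEY") or None,  # empty string means unset
    )


def may_bypass(settings: Settings, headers: Mapping[str, str]) -> bool:
    """True only if a bypass key is configured and the request sent it."""
    if settings.bypass_key is None:
        return False
    sent = headers.get("X-Bypass-Key", "")
    return hmac.compare_digest(sent.encode(), settings.bypass_key.encode())
```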

Persistence

  • On boot: `snapshot_download` pulls `gt/*.csv`, `leaderboard.db`, and any archived `submissions/**/*.csv` from the dataset repo into `/data`.
  • Every `GT_BACKUP_INTERVAL` seconds (60 by default): if the row count from `SELECT COUNT(*) FROM submissions` has grown since the last push, a daemon thread copies the DB atomically with `sqlite3.Connection.backup()` and uploads the copy back with `upload_file`. New submission CSVs in `/data/submissions/` are pushed with `upload_folder`, which diffs by content hash so unchanged files are skipped.
  • Worst-case loss if the Space crashes: one backup interval of submissions (60 s by default).
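
The backup step in the second bullet can be sketched with the standard library alone. This is a hedged reconstruction, not the Space's code. The function names are hypothetical. `upload` is an injected callable so the sketch runs offline; in the Space it would presumably wrap `huggingface_hub`'s `HfApi().upload_file(path_or_fileobj=..., path_in_repo="leaderboard.db", repo_id=..., repo_type="dataset")`. The `upload_folder` push of submission CSVs is not shown.

```python
import sqlite3
import tempfile
import threading
from contextlib import closing
from pathlib import Path
from typing import Callable, Optional, Tuple

Uploader = Callable[[Path], None]  # hypothetical hook, e.g. a wrapper around upload_file


def count_rows(db_path: Path) -> int:
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]


def snapshot_if_grew(db_path: Path, snapshot_path: Path,
                     last_count: int, upload: Uploader) -> int:
    """Copy and upload the DB only if the submissions table grew; return the new count."""
    with closing(sqlite3.connect(db_path)) as src:
        count = src.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]
        if count <= last_count:
            return last_count
        with closing(sqlite3.connect(snapshot_path)) as dst:
            # Online backup API: a consistent copy even while writers are active.
            src.backup(dst)
    upload(snapshot_path)
    return count


def start_backup_thread(db_path: Path, snapshot_path: Path, upload: Uploader,
                        interval_s: float = 60.0,
                        stop: Optional[threading.Event] = None
                        ) -> Tuple[threading.Thread, threading.Event]:
    if stop is None:
        stop = threading.Event()

    def loop() -> None:
        last = count_rows(db_path)  # the DB restored on boot needs no push
        while not stop.wait(interval_s):
            try:
                last = snapshot_if_grew(db_path, snapshot_path, last, upload)
            except Exception as exc:  # keep the daemon alive on transient errors
                print(f"backup failed: {exc!r}")

    thread = threading.Thread(target=loop, name="gt-backup", daemon=True)
    thread.start()
    return thread, stop


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    db = tmp / "leaderboard.db"
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, agent TEXT)")
        conn.execute("INSERT INTO submissions (agent) VALUES ('demo')")
        conn.commit()
    print(snapshot_if_grew(db, tmp / "snapshot.db", 0, lambda p: print("upload", p)))
```

In this sketch, the uploader reads a separate snapshot file, never a DB that a request handler is writing to mid-transaction. Comparing row counts keeps idle periods from producing empty commits on the dataset repo.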