Zhu Jiajun (jz28583)
Add agents/ harness integrations and HF Space scoring deployment
d094faf

Deploying the GraphTestbed scoring server to HF Spaces

All commands assume HF_TOKEN is exported and has write scope on the lanczos namespace.

1. Seed the GT dataset repo

```bash
HF_TOKEN=$HF_TOKEN python server/space/push_gt.py \
    --repo lanczos/graphtestbed-gt \
    --gt-dir ~/graphtestbed-gt
```

This creates the private dataset repo if it doesn't already exist and uploads each `<task>.csv` from `--gt-dir` to `gt/<task>.csv`. Verify at:

https://huggingface.co/datasets/lanczos/graphtestbed-gt
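For reference, the seeding step boils down to two `huggingface_hub` calls. The sketch below approximates what `push_gt.py` does; it is not its actual source. It assumes every `*.csv` directly under the GT directory is one task, and the helper names `plan_gt_uploads` and `push_gt` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def plan_gt_uploads(gt_dir: str) -> dict[str, str]:
    """Map each local <task>.csv to its destination gt/<task>.csv.

    Assumption: every *.csv directly under gt_dir is one task's ground truth.
    """
    root = Path(gt_dir).expanduser()
    return {str(p): f"gt/{p.name}" for p in sorted(root.glob("*.csv"))}


def push_gt(repo_id: str, gt_dir: str, token: str | None = None) -> None:
    """Hypothetical re-implementation of the seeding step (not push_gt.py itself)."""
    # Imported lazily so plan_gt_uploads works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # Private dataset repo; exist_ok=True makes re-runs idempotent.
    api.create_repo(repo_id, repo_type="dataset", private=True, exist_ok=True)
    for local_path, path_in_repo in plan_gt_uploads(gt_dir).items():
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="dataset",
        )
```

Usage: `push_gt("lanczos/graphtestbed-gt", "~/graphtestbed-gt", token=os.environ["HF_TOKEN"])`. Keeping the path planning separate from the upload makes it easy to dry-run which files would land where.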

2. Create the Space

```bash
huggingface-cli repo create graphtestbed --type space --space_sdk docker
```

Or in the web UI: New Space → name graphtestbed → SDK: Docker.
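If you prefer to script this step, `huggingface_hub` exposes the same operation through `HfApi.create_repo`. A minimal sketch, assuming `huggingface_hub` is installed; `create_space` is a hypothetical helper that takes an `HfApi`-like object so it can be exercised without network access.

```python
from __future__ import annotations

from typing import Any


def create_space(api: Any, repo_id: str = "lanczos/graphtestbed") -> Any:
    """Create (or reuse) the Docker Space; same intent as the CLI call above."""
    return api.create_repo(
        repo_id,
        repo_type="space",
        space_sdk="docker",  # matches --space_sdk docker in the CLI call
        exist_ok=True,       # re-running the deploy should not fail
    )
```

Usage: `create_space(HfApi(token=os.environ["HF_TOKEN"]))` with `from huggingface_hub import HfApi`. Passing the API object in, rather than constructing it inside, is only a testability choice.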

3. Set the Space secret

In Space Settings → Variables and secrets, add:

| name | value |
|---|---|
| `HF_TOKEN` | same token (write scope on `lanczos/graphtestbed-gt`) |
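The secret can also be set from a script. A minimal sketch, assuming a recent `huggingface_hub` that provides `HfApi.add_space_secret(repo_id, key, value)`; `set_space_token` is a hypothetical helper, and the API object is passed in so the logic can be checked offline.

```python
from __future__ import annotations

from typing import Any


def set_space_token(api: Any, token: str, space_id: str = "lanczos/graphtestbed") -> None:
    """Store the write-scoped token as the Space secret named HF_TOKEN."""
    # Guard against an unset shell variable silently producing an empty secret.
    if not token:
        raise ValueError("refusing to store an empty HF_TOKEN secret")
    api.add_space_secret(space_id, "HF_TOKEN", token)
```

Usage: `set_space_token(HfApi(), os.environ["HF_TOKEN"])`.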

Optional overrides (set as variables, not secrets):

| name | default | when to override |
|---|---|---|
| `GT_DATASET_REPO` | `lanczos/graphtestbed-gt` | running multiple Spaces against different GT repos |
| `GT_BACKUP_INTERVAL` | `60` | lower it for tighter durability; raise it for fewer commits |
| `GT_QUOTA` | `5` | raise it during a benchmark sprint |
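On a Docker Space, variables and secrets reach the running container as environment variables, so the server only has to fall back to the defaults above when one is unset. The sketch below is hypothetical (the real server may parse these differently): `load_settings` and the `Settings` fields are illustrative names, the positivity check is a design choice of this sketch, and because this guide does not state the unit of `GT_BACKUP_INTERVAL` it is kept as a plain integer.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    gt_dataset_repo: str
    gt_backup_interval: int  # unit not specified in this guide
    gt_quota: int            # presumably what /healthz reports as quota_per_day


def _positive_int(env: Mapping[str, str], key: str, default: int) -> int:
    raw = env.get(key)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input such as "sixty"
    if value <= 0:
        raise ValueError(f"{key} must be positive, got {value}")
    return value


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Resolve the optional overrides, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        gt_dataset_repo=env.get("GT_DATASET_REPO") or "lanczos/graphtestbed-gt",
        gt_backup_interval=_positive_int(env, "GT_BACKUP_INTERVAL", 60),
        gt_quota=_positive_int(env, "GT_QUOTA", 5),
    )
```

Failing fast on a malformed value surfaces the mistake in the Space logs at boot instead of at the first submission.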

4. Push the code to the Space

```bash
# One-time
git remote add space https://huggingface.co/spaces/lanczos/graphtestbed

# Each deploy (HF prompts for credentials: user=lanczos, password=$HF_TOKEN)
./server/space/push_to_space.sh
```

The script copies `server/space/README.md` over the root `README.md` on a temporary branch and force-pushes that branch to `space/main`, because HF reads the Space's front matter from the root README. Your GitHub root README is left untouched.
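Because HF configures the Space from that README's YAML front matter, a quick pre-push check can catch a missing or wrong `sdk` key. This is a hypothetical helper, not part of `push_to_space.sh`: it does a deliberately naive parse (top-level `key: value` lines only, no PyYAML dependency) and only checks for `sdk: docker`, the SDK chosen in step 2.

```python
from __future__ import annotations

from pathlib import Path


def read_front_matter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading `---` ... `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        # Skip indented keys, list items, comments and blank lines: not a YAML parser.
        if line[:1] in (" ", "\t", "-", "#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter: treat as no front matter


def check_space_readme(path: str = "server/space/README.md") -> None:
    meta = read_front_matter(Path(path).read_text(encoding="utf-8"))
    if meta.get("sdk") != "docker":
        raise SystemExit(f"{path}: expected 'sdk: docker', got {meta.get('sdk')!r}")
```

Run `check_space_readme()` from the repo root before `./server/space/push_to_space.sh`.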

The first build takes about 3 min (installing the pandas and scikit-learn wheels); subsequent builds take about 30 s.

5. Smoke-test

```bash
curl -s https://lanczos-graphtestbed.hf.space/healthz | jq
```

Expect:

```json
{
  "status": "ok",
  "tasks": ["arxiv-citation", "figraph", "ibm-aml", "ieee-fraud-detection"],
  "gt_present": ["figraph", "..."],
  "quota_per_day": 5,
  "uptime_unix": 1776633751
}
```

If `gt_present` is empty, the startup bootstrap could not read from the dataset repo. Check the Space logs and verify that `HF_TOKEN` has read scope on `GT_DATASET_REPO`.
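To script this smoke test (for example in CI after a deploy), the same request works with the standard library. A sketch under assumptions: it relies only on the `status` and `gt_present` fields from the sample response, treats an empty `gt_present` as a failure as described above, and the names `fetch_health` and `health_problems` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

SPACE_URL = "https://lanczos-graphtestbed.hf.space"


def fetch_health(base_url: str = SPACE_URL, timeout: float = 10.0) -> dict[str, Any]:
    """GET /healthz and decode the JSON body (network call)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/healthz", timeout=timeout) as resp:
        return json.load(resp)


def health_problems(health: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the Space looks healthy."""
    problems: list[str] = []
    if health.get("status") != "ok":
        problems.append(f"status is {health.get('status')!r}, expected 'ok'")
    if not health.get("gt_present"):
        problems.append("gt_present is empty: check Space logs and HF_TOKEN scope")
    return problems
```

Usage: `problems = health_problems(fetch_health())`, then fail the job if `problems` is non-empty. Separating the fetch from the checks keeps the checks testable against a saved response.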

6. Hand out the URL

```bash
export GRAPHTESTBED_API=https://lanczos-graphtestbed.hf.space
gtb submit figraph --file preds.csv --agent my-agent-v1
```

Reading the leaderboard back as a maintainer

```bash
huggingface-cli download lanczos/graphtestbed-gt \
    leaderboard.db \
    --repo-type dataset \
    --local-dir ./backup

sqlite3 backup/leaderboard.db \
    "SELECT task, agent, primary_metric, n_rows, submitted_at
     FROM submissions ORDER BY submitted_at DESC LIMIT 20"
```
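The same query can run through Python's built-in `sqlite3` module when you want the rows as data rather than terminal output. A sketch: it assumes only the `submissions` columns named in the query above, makes no assumption about their types, and `latest_submissions` is a hypothetical name. Opening the file through a `mode=ro` URI guards against accidentally modifying the backup.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path

QUERY = """
    SELECT task, agent, primary_metric, n_rows, submitted_at
    FROM submissions
    ORDER BY submitted_at DESC
    LIMIT ?
"""


def latest_submissions(db_path: str = "backup/leaderboard.db", limit: int = 20) -> list[dict]:
    # Read-only URI: inspecting a backup can never write to it,
    # and a missing file raises instead of creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.row_factory = sqlite3.Row  # rows addressable by column name
        return [dict(row) for row in conn.execute(QUERY, (limit,))]
    finally:
        conn.close()
```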

The full per-submission CSV archive lives under `submissions/<task>/<agent>-<run_id>.csv` in the same dataset repo.
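To fetch the archived CSVs for one task or agent, list the dataset repo and filter on that path layout. A sketch, assuming `huggingface_hub` for the listing (`HfApi.list_repo_files`); `archived_submissions` and `list_archive` are hypothetical helpers. Agent names may themselves contain hyphens (e.g. `my-agent-v1`), so the filter matches on the `<agent>-` prefix instead of splitting `<agent>-<run_id>` apart.

```python
from __future__ import annotations

from collections.abc import Iterable


def archived_submissions(files: Iterable[str], task: str, agent: str | None = None) -> list[str]:
    """Filter repo paths down to submissions/<task>/<agent>-<run_id>.csv entries."""
    prefix = f"submissions/{task}/"
    out = []
    for path in files:
        if not (path.startswith(prefix) and path.endswith(".csv")):
            continue
        name = path[len(prefix):]
        if "/" in name:  # ignore anything nested deeper than the documented layout
            continue
        if agent is None or name.startswith(f"{agent}-"):
            out.append(path)
    return sorted(out)


def list_archive(task: str, agent: str | None = None,
                 repo_id: str = "lanczos/graphtestbed-gt") -> list[str]:
    from huggingface_hub import HfApi  # lazy: only needed for the network call

    files = HfApi().list_repo_files(repo_id, repo_type="dataset")
    return archived_submissions(files, task, agent)
```

One caveat of prefix matching: asking for agent `my-agent` also matches files from `my-agent-v1`, so pass the full agent name. Each returned path can then be downloaded with `huggingface-cli download ... --repo-type dataset` as above.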