Zhu Jiajun (jz28583)
# Deploying the GraphTestbed scoring server to HF Spaces
All commands assume `HF_TOKEN` is exported and has **write** scope on the
`lanczos` namespace.
## 1. Seed the GT dataset repo
```bash
HF_TOKEN=$HF_TOKEN python server/space/push_gt.py \
    --repo lanczos/graphtestbed-gt \
    --gt-dir ~/graphtestbed-gt
```
This creates the **private** dataset repo if it doesn't exist and uploads
each `<task>.csv` to `gt/<task>.csv`. Verify at:
<https://huggingface.co/datasets/lanczos/graphtestbed-gt>
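
To make the file-to-path mapping concrete, here is a minimal sketch of how the upload plan could be computed. It is not the code in `push_gt.py`; the function name `plan_uploads` is hypothetical, and it assumes every `*.csv` directly inside the GT directory is a task file.

```python
from pathlib import Path


def plan_uploads(gt_dir: str) -> list[tuple[Path, str]]:
    """Pair each local <task>.csv with its destination gt/<task>.csv.

    Hypothetical helper: assumes task files sit directly in gt_dir.
    """
    root = Path(gt_dir).expanduser()
    # Sorted so the plan is deterministic across runs.
    return [(path, f"gt/{path.name}") for path in sorted(root.glob("*.csv"))]
```

Each pair could then be passed to an uploader such as `huggingface_hub.HfApi.upload_file` with `repo_type="dataset"`; whether `push_gt.py` does exactly that is an assumption.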
## 2. Create the Space
```bash
huggingface-cli repo create graphtestbed --type space --space_sdk docker
```
Or in the web UI: New Space → name `graphtestbed` → SDK: **Docker**.
## 3. Set the Space secret
In Space Settings → Variables and secrets, add:
| name | value |
| --- | --- |
| `HF_TOKEN` | same token (write scope on `lanczos/graphtestbed-gt`) |
Optional overrides (set as **variables**, not secrets):
| name | default | when to override |
| --- | --- | --- |
| `GT_DATASET_REPO` | `lanczos/graphtestbed-gt` | running multiple Spaces against different GT |
| `GT_BACKUP_INTERVAL` | `60` | tighter durability vs. fewer commits |
| `GT_QUOTA` | `5` | bumping during a benchmark sprint |
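
For reference, a minimal sketch of how a server could read these overrides with the defaults from the table. The `Settings` class and `load_settings` function are hypothetical names for illustration; the actual server code may parse these differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gt_dataset_repo: str
    gt_backup_interval: int
    gt_quota: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the optional overrides, falling back to the documented defaults."""
    return Settings(
        gt_dataset_repo=env.get("GT_DATASET_REPO", "lanczos/graphtestbed-gt"),
        gt_backup_interval=int(env.get("GT_BACKUP_INTERVAL", "60")),
        gt_quota=int(env.get("GT_QUOTA", "5")),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to check an override locally, for example `load_settings({"GT_QUOTA": "20"})`.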
## 4. Push the code to the Space
```bash
# One-time
git remote add space https://huggingface.co/spaces/lanczos/graphtestbed
# Each deploy (HF prompts for credentials: user=lanczos, password=$HF_TOKEN)
./server/space/push_to_space.sh
```
The script overlays `server/space/README.md` at repo root on a temp branch
and force-pushes to `space/main` (HF reads its frontmatter from root
README). Your GitHub root README is untouched.
The first build takes about 3 min (pandas and sklearn wheels); subsequent builds take about 30 s.
## 5. Smoke-test
```bash
curl -s https://lanczos-graphtestbed.hf.space/healthz | jq
```
Expect:
```json
{
  "status": "ok",
  "tasks": ["arxiv-citation", "figraph", "ibm-aml", "ieee-fraud-detection"],
  "gt_present": ["figraph", "..."],
  "quota_per_day": 5,
  "uptime_unix": 1776633751
}
```
If `gt_present` is empty, the startup bootstrap couldn't read from the dataset
repo. Check the Space logs and verify that `HF_TOKEN` has read scope on
`GT_DATASET_REPO`.
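
The same check can be scripted. The sketch below assumes only the `/healthz` fields shown in the example response (`tasks` and `gt_present`); `fetch_health` and `missing_gt` are hypothetical helper names, not part of the server or the `gtb` CLI.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/healthz and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
        return json.load(resp)


def missing_gt(health: dict) -> list[str]:
    """Return the tasks listed in `tasks` that are absent from `gt_present`."""
    present = set(health.get("gt_present", []))
    return [task for task in health.get("tasks", []) if task not in present]
```

Calling `missing_gt(fetch_health("https://lanczos-graphtestbed.hf.space"))` should return an empty list once every task has its ground truth loaded.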
## 6. Hand out the URL
```bash
export GRAPHTESTBED_API=https://lanczos-graphtestbed.hf.space
gtb submit figraph --file preds.csv --agent my-agent-v1
```
## Reading the leaderboard back as a maintainer
```bash
huggingface-cli download lanczos/graphtestbed-gt \
    leaderboard.db \
    --repo-type dataset \
    --local-dir ./backup
sqlite3 backup/leaderboard.db \
    "SELECT task, agent, primary_metric, n_rows, submitted_at
     FROM submissions ORDER BY submitted_at DESC LIMIT 20"
```
```
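
If you prefer to inspect the downloaded database from Python, the query above can be wrapped as in the sketch below. It uses only the table and column names that appear in that query; `latest_submissions` is a hypothetical helper, and the column types are not assumed.

```python
import sqlite3

QUERY = """
SELECT task, agent, primary_metric, n_rows, submitted_at
FROM submissions
ORDER BY submitted_at DESC
LIMIT ?
"""


def latest_submissions(db_path: str, limit: int = 20) -> list[tuple]:
    """Return the most recent submissions, newest first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(QUERY, (limit,)).fetchall()
    finally:
        # Close explicitly; the sqlite3 context manager only handles transactions.
        conn.close()
```

For example, `latest_submissions("backup/leaderboard.db")` returns up to 20 rows as tuples in the column order of the `SELECT`.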
The full per-submission CSV archive lives under `submissions/<task>/<agent>-<run_id>.csv`
in the same dataset repo.