# Deploying the GraphTestbed scoring API

The scoring server is a single Flask app (`api.py`). Pick any host; the canonical setup below uses a small VM, but the app is deliberately thin, so HuggingFace Spaces, fly.io, or render.com all work.

## Prerequisites on the host

- Python ≥ 3.10
- ~50 MB for code + the sqlite leaderboard
- ~5 GB if hosting all 4 ground-truth CSVs locally
- A public HTTPS endpoint (a reverse proxy with TLS, or a managed service)

## Layout on the host

```
/opt/graphtestbed/
├── server/                  # this directory, deployed from `server` branch
│   ├── api.py
│   ├── requirements.txt
│   └── deploy.md
├── datasets/manifest.yaml   # pulled from `main` branch (read-only by api.py)
└── .venv/

/var/graphtestbed/
├── gt/                      # NOT IN GIT — copied here separately
│   ├── ieee-fraud-detection.csv
│   ├── arxiv-citation.csv
│   ├── figraph.csv
│   └── ibm-aml.csv
└── leaderboard.db           # sqlite, created by api.py on first run
```

## Branch deployment pattern

```bash
# On the host, clone twice into adjacent dirs
# (<repo-url> is your GraphTestbed remote):
git clone <repo-url> /opt/graphtestbed/_main && \
  cd /opt/graphtestbed/_main && \
  cp -r datasets /opt/graphtestbed/

git clone -b server <repo-url> /opt/graphtestbed/_server && \
  cp -r /opt/graphtestbed/_server/server /opt/graphtestbed/

# On the host, create the ground-truth directory:
sudo mkdir -p /var/graphtestbed/gt

# From the machine holding the GT files (NOT in git), copy them over:
scp ieee-fraud-detection.csv \
    arxiv-citation.csv \
    figraph.csv \
    ibm-aml.csv \
    host:/var/graphtestbed/gt/
```

## Run

```bash
cd /opt/graphtestbed/server
python -m venv ../.venv && source ../.venv/bin/activate
pip install -r requirements.txt

export GT_DIR=/var/graphtestbed/gt
export GT_DB=/var/graphtestbed/leaderboard.db
export GT_MANIFEST=/opt/graphtestbed/datasets/manifest.yaml
export GT_QUOTA=5
export PORT=8080

# Dev mode:
python api.py

# Production:
gunicorn --bind 0.0.0.0:8080 --workers 2 api:app
```

Front it with nginx (or use a managed proxy such as Cloudflare Tunnel or fly.io's built-in TLS). The app speaks plain HTTP on `$PORT`.
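The five environment variables above are the app's whole configuration surface. As a minimal sketch of how they might be consumed (the `Config` dataclass and `load_config` helper are illustrative assumptions, not `api.py`'s actual code; only the variable names and the defaults matching this guide come from the exports above):

```python
import os
from dataclasses import dataclass


@dataclass
class Config:
    gt_dir: str     # directory of ground-truth CSVs (GT_DIR)
    db_path: str    # sqlite leaderboard file (GT_DB)
    manifest: str   # task manifest from the main branch (GT_MANIFEST)
    quota: int      # submissions per agent per day (GT_QUOTA)
    port: int       # HTTP port the app listens on (PORT)


def load_config(env=os.environ) -> Config:
    # Defaults mirror the paths used in this guide; a real api.py
    # may choose to fail hard instead of defaulting.
    return Config(
        gt_dir=env.get("GT_DIR", "/var/graphtestbed/gt"),
        db_path=env.get("GT_DB", "/var/graphtestbed/leaderboard.db"),
        manifest=env.get("GT_MANIFEST",
                         "/opt/graphtestbed/datasets/manifest.yaml"),
        quota=int(env.get("GT_QUOTA", "5")),
        port=int(env.get("PORT", "8080")),
    )
```

Passing a plain dict as `env` keeps the loader testable without touching the real process environment.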
## Updating ground truth

GT files are append-only: never edit, never delete. To version a dataset, add a new task entry such as `arxiv-citation-v2` in `datasets/manifest.yaml` (on the `main` branch) and place a new GT file `arxiv-citation-v2.csv` on the host. The v1 leaderboard stays valid; new submissions go to v2.

## Healthcheck

```bash
curl https://<your-host>/healthz
# {
#   "status": "ok",
#   "tasks": ["ieee-fraud-detection", "arxiv-citation", "figraph", "ibm-aml"],
#   "gt_present": ["figraph", "arxiv-citation"],  # only those uploaded so far
#   "quota_per_day": 5,
#   "uptime_unix": 1745081234
# }
```

If a task appears in `tasks` but is missing from `gt_present`, the server rejects submissions for it with a 503.

## Costs

- HuggingFace Space (free, sleeps when idle, ~30 s cold start): $0
- fly.io (always-on shared-cpu-1x, 256 MB): ~$2/month
- Self-hosted VM (1 vCPU, 1 GB): ~$5/month

The sqlite leaderboard handles thousands of submissions on commodity hardware. If you outgrow it, swap `_db()` for postgres without touching the rest of `api.py`.

## Backups

The leaderboard sqlite at `$GT_DB` is a single file — copy it for backup. Submission CSVs themselves are not persisted by the server (only their sha256 + agent + timestamp). If you want full submission archival, set up your own object store and have `api.py` write to it before scoring.
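Two of the points above can be sketched in Python: the fingerprint-only record the server keeps (sha256 + agent + timestamp) and a consistent copy of a live sqlite file. The `submissions` table name and schema here are illustrative assumptions; only the sha256/agent/timestamp fields come from the text above. One caveat worth noting: a plain `cp` of a sqlite file while the server is writing can produce a torn snapshot, so the stdlib `Connection.backup` API is the safer copy.

```python
import hashlib
import sqlite3
import time


def record_submission(db_path: str, task: str, agent: str,
                      csv_bytes: bytes) -> str:
    """Store only the submission's fingerprint, never the CSV itself.
    Table name and schema are illustrative, not api.py's actual ones."""
    digest = hashlib.sha256(csv_bytes).hexdigest()
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS submissions "
        "(task TEXT, agent TEXT, sha256 TEXT, ts INTEGER)"
    )
    con.execute(
        "INSERT INTO submissions VALUES (?, ?, ?, ?)",
        (task, agent, digest, int(time.time())),
    )
    con.commit()
    con.close()
    return digest


def backup_db(src_path: str, dst_path: str) -> None:
    """Consistent snapshot of a possibly-live sqlite DB via the
    stdlib backup API (safe even while the server is writing)."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        src.close()
        dst.close()
```

A cron job calling `backup_db("$GT_DB", "/backups/leaderboard-$(date).db")`-style paths is enough for this workload.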