# Deploying the GraphTestbed scoring API

The scoring server is a single Flask app (`api.py`). Any host works: the
canonical setup below uses a small VM, but the app is deliberately thin, so
HuggingFace Spaces, fly.io, or render.com work as well.

## Prerequisites on the host

- Python ≥ 3.10
- `~50 MB` for code + sqlite leaderboard
- `~5 GB` if hosting all 4 ground-truth CSVs locally
- Public HTTPS endpoint (a reverse proxy with TLS or a managed service)

## Layout on the host

```
/opt/graphtestbed/
├── server/                 # this directory, deployed from `server` branch
│   ├── api.py
│   ├── requirements.txt
│   └── deploy.md
├── datasets/manifest.yaml  # pulled from `main` branch (read-only by api.py)
└── .venv/

/var/graphtestbed/
├── gt/                     # NOT IN GIT, copied here separately
│   ├── ieee-fraud-detection.csv
│   ├── arxiv-citation.csv
│   ├── figraph.csv
│   └── ibm-aml.csv
└── leaderboard.db          # sqlite, created by api.py on first run
```

## Branch deployment pattern

```bash
# On the host, clone twice into adjacent dirs:
git clone <repo> /opt/graphtestbed/_main && \
  cp -r /opt/graphtestbed/_main/datasets /opt/graphtestbed/

git clone -b server <repo> /opt/graphtestbed/_server && \
  cp -r /opt/graphtestbed/_server/server /opt/graphtestbed/

# Place ground-truth files (NOT in git).
# On the host: create the directory and make it writable by your login user
# (assumes the same user runs the server and copies the files).
sudo mkdir -p /var/graphtestbed/gt
sudo chown -R "$USER" /var/graphtestbed

# From the machine that holds the CSVs (not the host):
scp ieee-fraud-detection.csv \
    arxiv-citation.csv \
    figraph.csv \
    ibm-aml.csv \
    host:/var/graphtestbed/gt/
```

## Run

```bash
cd /opt/graphtestbed/server
python -m venv ../.venv && source ../.venv/bin/activate
pip install -r requirements.txt

export GT_DIR=/var/graphtestbed/gt
export GT_DB=/var/graphtestbed/leaderboard.db
export GT_MANIFEST=/opt/graphtestbed/datasets/manifest.yaml
export GT_QUOTA=5
export PORT=8080

# Dev mode:
python api.py

# Production:
gunicorn --bind "0.0.0.0:${PORT}" --workers 2 api:app
```
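
Before starting the server, it can help to confirm that the environment above is consistent. The following is a minimal sketch and not part of `api.py`: the `preflight` helper and the specific checks are hypothetical, and only the variable names come from the export lines above.

```python
import os
from pathlib import Path

# Variable names come from the export lines above; the checks themselves are
# assumptions about what a sensible setup looks like, not api.py's behavior.
REQUIRED = ("GT_DIR", "GT_DB", "GT_MANIFEST", "GT_QUOTA", "PORT")


def preflight(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if problems:
        return problems

    if not Path(env["GT_DIR"]).is_dir():
        problems.append(f"GT_DIR is not a directory: {env['GT_DIR']}")
    if not Path(env["GT_MANIFEST"]).is_file():
        problems.append(f"GT_MANIFEST not found: {env['GT_MANIFEST']}")
    # The database file may not exist yet, but its parent directory must.
    if not Path(env["GT_DB"]).parent.is_dir():
        problems.append(f"parent directory of GT_DB is missing: {env['GT_DB']}")
    for name in ("GT_QUOTA", "PORT"):
        if not env[name].isdigit():
            problems.append(f"{name} must be a non-negative integer, got {env[name]!r}")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("preflight:", problem)
```

Run it in the same shell as the exports; no output means every check passed.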

Front it with nginx (or use a managed proxy like Cloudflare Tunnel / fly.io's
built-in TLS). The app speaks plain HTTP on `$PORT`, so TLS must terminate at
the proxy.

## Updating ground truth

GT files are append-only: never edit, never delete. To version a dataset, add
a new task entry such as `arxiv-citation-v2` to `datasets/manifest.yaml` (on
the `main` branch) and place the matching GT file, `arxiv-citation-v2.csv`, in
`$GT_DIR` on the host. The v1 leaderboard stays valid; new submissions go to v2.
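
One way to enforce the append-only rule is to keep a record of file hashes and compare it against the GT directory after each change. This is a sketch under assumptions: `check_append_only` and the `{filename: sha256}` record format are hypothetical and not something `api.py` is known to provide.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large ground-truth CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_append_only(gt_dir, recorded):
    """Compare gt_dir against a {filename: sha256} record from an earlier run.

    Returns (violations, updated_record). New files are accepted and added to
    the record; changed or missing files are reported and keep their old hash.
    """
    current = {p.name: sha256_of(p) for p in sorted(Path(gt_dir).glob("*.csv"))}
    violations = []
    for name, old_hash in recorded.items():
        if name not in current:
            violations.append(f"deleted: {name}")
        elif current[name] != old_hash:
            violations.append(f"edited: {name}")
    updated = dict(recorded)
    for name, new_hash in current.items():
        updated.setdefault(name, new_hash)  # only genuinely new files are added
    return violations, updated
```

Adding `arxiv-citation-v2.csv` next to the v1 file yields no violations, while overwriting or removing the v1 file does.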

## Healthcheck

```bash
curl https://<host>/healthz
# {
#   "status": "ok",
#   "tasks": ["ieee-fraud-detection", "arxiv-citation", "figraph", "ibm-aml"],
#   "gt_present": ["figraph", "arxiv-citation"],   # only those uploaded so far
#   "quota_per_day": 5,
#   "uptime_unix": 1745081234
# }
```

If a task is in `tasks` but missing from `gt_present`, the server rejects
submissions for it with HTTP 503.
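
The comparison between `tasks` and `gt_present` can be automated, for example in a monitoring script. The sketch below assumes the response has the shape shown in the sample above; the helper name `tasks_missing_gt` is hypothetical.

```python
import json


def tasks_missing_gt(healthz_body):
    """Return tasks listed in `tasks` but absent from `gt_present`, in order."""
    payload = json.loads(healthz_body)
    present = set(payload.get("gt_present", []))
    return [task for task in payload.get("tasks", []) if task not in present]


if __name__ == "__main__":
    # Same payload as the sample response above, without the comment.
    sample = """{
      "status": "ok",
      "tasks": ["ieee-fraud-detection", "arxiv-citation", "figraph", "ibm-aml"],
      "gt_present": ["figraph", "arxiv-citation"],
      "quota_per_day": 5,
      "uptime_unix": 1745081234
    }"""
    print(tasks_missing_gt(sample))  # ['ieee-fraud-detection', 'ibm-aml']
```

An empty list means every task in the manifest has its ground-truth file in place.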

## Costs

- HuggingFace Space (free, sleeps when idle, ~30s cold start): $0
- fly.io (always-on shared-cpu-1x, 256MB): ~$2/month
- self-hosted VM (1 vCPU, 1GB): ~$5/month

The sqlite leaderboard handles thousands of submissions on commodity hardware.
If you outgrow it, swap `_db()` for postgres without touching the rest of
`api.py`.

## Backups

The leaderboard sqlite at `$GT_DB` is a single file; copy it to back it up.
The server does not persist submission CSVs themselves (only their sha256,
agent, and timestamp). For full submission archival, set up your own object
store and have `api.py` write to it before scoring.
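
A plain file copy taken while the server is writing may capture an inconsistent snapshot. As an alternative, the sketch below uses the online backup API from Python's standard `sqlite3` module; the helper name `backup_leaderboard` and the destination path are illustrative assumptions.

```python
import sqlite3
from pathlib import Path


def backup_leaderboard(src_path, dest_path):
    """Copy a SQLite database using SQLite's online backup API."""
    if not Path(src_path).is_file():
        # sqlite3.connect() would otherwise silently create an empty database.
        raise FileNotFoundError(src_path)
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies every page of the source into dest
    finally:
        dest.close()
        src.close()


if __name__ == "__main__":
    import os
    import sys

    # Hypothetical usage: python backup.py /path/to/leaderboard-backup.db
    if len(sys.argv) == 2 and "GT_DB" in os.environ:
        backup_leaderboard(os.environ["GT_DB"], sys.argv[1])
```

The result is an ordinary sqlite file that can be shipped off the host like any other backup.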