# Hugging Face Space Deployment
This document describes how to deploy the HAKARI-Bench leaderboard viewer to
the Docker Space at `hakari-bench/leaderboard` and how to publish the DuckDB
database consumed by that Space.
## Repositories
- Space: `hakari-bench/leaderboard`
- DuckDB dataset: `hakari-bench/leaderboard_database`
- DuckDB path in dataset: `duckdb/hakari_bench.duckdb`
- Public Space URL: `https://hakari-bench-leaderboard.hf.space/`
The Space runs the FastAPI viewer from `hakari_bench.viewer.space:create_space_app`
and downloads the DuckDB file from the dataset repo at runtime. The database is
cached inside the Space at `/data/viewer/hakari_bench.duckdb`.
## Required Permissions
The Hugging Face token used for deployment must be able to:
- Read and write repos under the `hakari-bench` organization.
- Create or update a dataset repo for `hakari-bench/leaderboard_database`.
- Create or update a Docker Space for `hakari-bench/leaderboard`.
- Change Space visibility if the Space needs to be public.
- Read the DuckDB dataset from the running Space. Public datasets work without a
Space secret; private datasets require an `HF_TOKEN` Space secret.
The current deployment was performed with an authenticated Hugging Face account
that is an admin of the `hakari-bench` organization and has fine-grained
permissions including organization read/write, repo access read, repo content
read, and repo write for `hakari-bench`.
Verify authentication locally:
```bash
uv run --group viewer python - <<'PY'
from huggingface_hub import HfApi
print(HfApi().whoami())
PY
```
## Publish the DuckDB Database
Choose the viewer DuckDB file to publish. For example:
```bash
export DUCKDB_PATH=/home/hotchpotch/src/github.com/hakari-bench/hakari-bench-wt/recreate_nano_datasets/output/viewer_combined_20260510_1340/hakari_bench.duckdb
```
Upload it to the dataset repo:
```bash
uv run --group viewer python - <<'PY'
import os
from pathlib import Path
from huggingface_hub import HfApi
duckdb_path = Path(os.environ["DUCKDB_PATH"])
api = HfApi()
api.create_repo(
    repo_id="hakari-bench/leaderboard_database",
    repo_type="dataset",
    exist_ok=True,
    private=False,
)
info = api.upload_file(
    repo_id="hakari-bench/leaderboard_database",
    repo_type="dataset",
    path_or_fileobj=str(duckdb_path),
    path_in_repo="duckdb/hakari_bench.duckdb",
    commit_message="Update leaderboard DuckDB",
)
print(info)
PY
```
Verify the uploaded file can be downloaded:
```bash
uv run --group viewer python - <<'PY'
from hakari_bench.viewer.store import HuggingFaceDuckDbSource, _download_hf_duckdb
path = _download_hf_duckdb(
    HuggingFaceDuckDbSource(
        repo_id="hakari-bench/leaderboard_database",
        filename="duckdb/hakari_bench.duckdb",
    )
)
print(path)
print(path.exists(), path.stat().st_size)
PY
```
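
The size printed above only confirms that a file arrived. To compare the downloaded copy with the local file byte for byte, a checksum helps. The following is a minimal sketch using only the standard library; `file_fingerprint` is a hypothetical helper, not part of `hakari_bench`.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1024 * 1024) -> tuple[int, str]:
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        # Read in chunks so a large DuckDB file is never loaded into memory at once.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

Calling `file_fingerprint(Path(os.environ["DUCKDB_PATH"]))` and `file_fingerprint(path)` on the downloaded path should return the same pair if the upload and download round-tripped cleanly.
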
## Deploy the Docker Space
The Space uses:
- `Dockerfile`
- `.dockerignore`
- `README.md` Space metadata with `sdk: docker`
- `hakari_bench/viewer/space.py`
- `hakari_bench/viewer/assets/` for local CSS, viewer JavaScript, HTMX, and
favicon assets
- `docs/benchmark_tasks/` for benchmark group and task documentation rendered
by the viewer
The Docker image installs only viewer runtime dependencies and starts:
```bash
uvicorn hakari_bench.viewer.space:create_space_app --factory --host 0.0.0.0 --port ${PORT:-7860}
```
The Space Dockerfile pins the Python and `uv` image references by digest and
runs the FastAPI process as the non-root `hakari` user. If those base images are
intentionally updated, refresh the digest pins in `Dockerfile`, rebuild locally,
and keep this deployment note in sync.
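
When refreshing the pins, it can be useful to confirm that no `FROM` line slipped back to a mutable tag. The sketch below is one way to check that, assuming the pins use the usual `image@sha256:<digest>` form. The helper name is hypothetical, and the check only covers `FROM` lines: it does not inspect `COPY --from=` references and would also flag a `FROM` that names an earlier build stage.

```python
import re

# Matches "FROM [--platform=...] <image>@sha256:<64 hex chars> [AS <name>]".
PINNED_FROM = re.compile(
    r"^FROM\s+(?:--platform=\S+\s+)?\S+@sha256:[0-9a-f]{64}(?:\s+AS\s+\S+)?\s*$",
    re.IGNORECASE,
)


def unpinned_from_lines(dockerfile_text: str) -> list[str]:
    """Return FROM lines that do not reference an image by sha256 digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("FROM ") and not PINNED_FROM.match(stripped):
            unpinned.append(stripped)
    return unpinned
```

For example, `unpinned_from_lines(Path("Dockerfile").read_text())` returning an empty list suggests every base image is still pinned.
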
The viewer also adds response security headers at runtime, including a
Content-Security-Policy. The default `frame-ancestors` allows Hugging Face
embedding hosts:
```text
https://huggingface.co https://*.huggingface.co
```
If Hugging Face introduces another parent origin or a private deployment needs a
different embedding origin, set `HAKARI_BENCH_VIEWER_FRAME_ANCESTORS` to a
space-separated list of allowed origins.
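
As an illustration of how a space-separated origin list maps to the CSP directive, here is a minimal sketch. It is not the viewer's actual implementation: the function name is hypothetical, and falling back to the default when the variable is unset or empty is an assumption.

```python
import os
from collections.abc import Mapping

# Default origins as documented above.
DEFAULT_FRAME_ANCESTORS = "https://huggingface.co https://*.huggingface.co"


def frame_ancestors_directive(environ: Mapping[str, str] = os.environ) -> str:
    """Build a CSP frame-ancestors directive from a space-separated origin list."""
    raw = environ.get("HAKARI_BENCH_VIEWER_FRAME_ANCESTORS", "").strip()
    # Assumption: an unset or empty variable falls back to the default origins.
    origins = (raw or DEFAULT_FRAME_ANCESTORS).split()
    return "frame-ancestors " + " ".join(origins)
```

With `HAKARI_BENCH_VIEWER_FRAME_ANCESTORS="https://a.example https://b.example"`, the sketch yields `frame-ancestors https://a.example https://b.example`.
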
The viewer does not depend on CDN-hosted browser assets. Regenerate the local
Tailwind CSS before deploying when viewer templates or styles change:
```bash
npx --yes tailwindcss@3.4.17 \
  -i hakari_bench/viewer/assets/app.tailwind.css \
  -o hakari_bench/viewer/assets/app.css \
  --content 'hakari_bench/viewer/**/*.py,tests/**/*.py' \
  --minify
```
Upload the current workspace to the Space:
```bash
uv run --group viewer python - <<'PY'
from pathlib import Path
from huggingface_hub import HfApi
api = HfApi()
api.create_repo(
    repo_id="hakari-bench/leaderboard",
    repo_type="space",
    space_sdk="docker",
    exist_ok=True,
    private=False,
)
info = api.upload_folder(
    repo_id="hakari-bench/leaderboard",
    repo_type="space",
    folder_path=Path("."),
    commit_message="Deploy Docker leaderboard viewer",
    ignore_patterns=[
        ".git",
        ".git/**",
        ".venv/**",
        "__pycache__/**",
        ".pytest_cache/**",
        ".ruff_cache/**",
        ".tox/**",
        "output/**",
        "tmp/**",
    ],
    delete_patterns=[
        "output/**",
        "tmp/**",
    ],
)
print(info)
PY
```
If the Space was created private or needs public access, update visibility:
```bash
uv run hf repos settings hakari-bench/leaderboard --repo-type space --public
```
## Runtime Configuration
The Dockerfile sets these defaults:
```text
HAKARI_BENCH_VIEWER_DATA_DIR=/data/viewer
HAKARI_BENCH_VIEWER_HF_DATASET_REPO_ID=hakari-bench/leaderboard_database
HAKARI_BENCH_VIEWER_HF_DATASET_PATH=duckdb/hakari_bench.duckdb
```
When the viewer uses the Hugging Face dataset source, it checks/downloads the
DuckDB source at startup and then caches the source check for 10 minutes per
running process. Requests served within that window use the already-local
`/data/viewer/hakari_bench.duckdb` file and do not call `hf_hub_download()`.
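
The 10-minute window can be pictured as a small per-process time-to-live cache around the source check. The sketch below illustrates that idea only; the class name, the `check` callable, and the injectable clock are assumptions and do not describe the viewer's internals.

```python
import time
from typing import Callable, Optional

CHECK_TTL_SECONDS = 10 * 60  # 10 minutes, as described above


class SourceCheckCache:
    """Run an expensive source check at most once per TTL window."""

    def __init__(
        self,
        check: Callable[[], str],
        ttl: float = CHECK_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check  # hypothetical: checks or downloads, returns a local path
        self._ttl = ttl
        self._clock = clock
        self._checked_at: Optional[float] = None
        self._path: Optional[str] = None

    def path(self) -> str:
        """Return the local path, re-running the check only after the TTL expires."""
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._path = self._check()
            self._checked_at = now
        return self._path
```

Because the state lives in the process, a restarted Space or a second worker process would perform its own first check.
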
The same viewer can be pointed at a different source locally or in a Space with:
- `HAKARI_BENCH_VIEWER_DUCKDB_PATH`
- `HAKARI_BENCH_VIEWER_SOURCE_DUCKDB_PATH`
- `HAKARI_BENCH_VIEWER_SOURCE_RESULTS_DIR`
- `HAKARI_BENCH_VIEWER_HF_DATASET_REPO_ID`
- `HAKARI_BENCH_VIEWER_HF_DATASET_PATH`
- `HAKARI_BENCH_VIEWER_HF_DATASET_REVISION`
- `HAKARI_BENCH_VIEWER_FRAME_ANCESTORS`
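
When debugging which source a given environment will use, it can help to list the variables that are actually set. This is a small sketch with a hypothetical helper name; it reports values only and makes no claim about the precedence the viewer applies between them.

```python
import os
from collections.abc import Mapping

# Variable names copied from the list above.
VIEWER_ENV_VARS = (
    "HAKARI_BENCH_VIEWER_DUCKDB_PATH",
    "HAKARI_BENCH_VIEWER_SOURCE_DUCKDB_PATH",
    "HAKARI_BENCH_VIEWER_SOURCE_RESULTS_DIR",
    "HAKARI_BENCH_VIEWER_HF_DATASET_REPO_ID",
    "HAKARI_BENCH_VIEWER_HF_DATASET_PATH",
    "HAKARI_BENCH_VIEWER_HF_DATASET_REVISION",
    "HAKARI_BENCH_VIEWER_FRAME_ANCESTORS",
)


def active_viewer_overrides(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the viewer variables that are set to a non-empty value."""
    return {name: environ[name] for name in VIEWER_ENV_VARS if environ.get(name)}
```

Printing `active_viewer_overrides()` inside the container shows the Dockerfile defaults plus any Space variables layered on top.
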
## URL State in Embedded Spaces
Hugging Face propagates the parent Space page query string and hash to the
embedded `*.hf.space` application on initial load. The viewer supports both URL
forms:
- Query strings such as `?view=All&target=reranking` are handled by the
server as before.
- Hash parameters such as `#view=All&target=reranking` are merged into the
first HTMX leaderboard request before it loads, which keeps deep links working
inside embedded Space URLs.
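
The merge itself happens in the viewer's browser JavaScript. The Python sketch below only illustrates the logic of folding hash parameters into the query. The function name is hypothetical, and letting hash values override query values for the same key is an assumption, since the text above does not state a precedence.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def merged_leaderboard_query(url: str) -> str:
    """Merge hash parameters into the query string of a URL.

    Assumption: a key present in both places takes its value from the hash.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(parse_qsl(parts.fragment, keep_blank_values=True))
    return urlencode(params)
```

Under these assumptions, a URL ending in `#view=All&target=reranking` and a URL ending in `?view=All&target=reranking` produce the same request parameters.
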
After HTMX changes the leaderboard state, the viewer sends a best-effort
`window.parent.postMessage` update to `https://huggingface.co` with the state in
the hash and an empty query string. This keeps the parent Hugging Face Space URL
shareable while avoiding duplicate query and hash state.
## Verification
Wait for the deployed Space commit to become `RUNNING`:
```bash
uv run --group viewer python - <<'PY'
import time
from huggingface_hub import HfApi
api = HfApi()
target_sha = "REPLACE_WITH_SPACE_COMMIT_SHA"
for _ in range(30):
    runtime = api.get_space_runtime("hakari-bench/leaderboard")
    sha = runtime.raw.get("sha") if runtime.raw else None
    print(runtime.stage, sha, flush=True)
    if runtime.stage == "RUNNING" and sha == target_sha:
        break
    time.sleep(5)
PY
```
Check the public endpoints:
```bash
curl -L -sS https://hakari-bench-leaderboard.hf.space/ | rg "HAKARI-bench leaderboard|/assets/app.css|/assets/viewer.js|/assets/favicon.png|/assets/htmx.min.js"
curl -L -sS 'https://hakari-bench-leaderboard.hf.space/leaderboard?view=All' | rg "Core benchmarks|NanoMMTEB-v2|Language pages"
curl -L -sS -D - https://hakari-bench-leaderboard.hf.space/assets/favicon.png -o /tmp/hakari_favicon.png
curl -L -sS -D - https://hakari-bench-leaderboard.hf.space/assets/app.css -o /tmp/hakari_app.css
curl -L -sS -D - https://hakari-bench-leaderboard.hf.space/assets/viewer.js -o /tmp/hakari_viewer.js
curl -L -sS -D - https://hakari-bench-leaderboard.hf.space/assets/htmx.min.js -o /tmp/hakari_htmx.min.js
```
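
The first `curl | rg` check can also be scripted so that a missing asset reference fails loudly. The sketch below uses only the standard library; both helper names are hypothetical, and the marker list simply mirrors the asset paths from the first command above.

```python
import urllib.request

# Asset paths the landing page is expected to reference (from the curl check above).
EXPECTED_ASSETS = (
    "/assets/app.css",
    "/assets/viewer.js",
    "/assets/favicon.png",
    "/assets/htmx.min.js",
)


def missing_markers(html: str, markers: tuple[str, ...] = EXPECTED_ASSETS) -> list[str]:
    """Return the expected markers that do not appear in the page HTML."""
    return [marker for marker in markers if marker not in html]


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a page as text; urlopen follows redirects by default."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `missing_markers(fetch_text("https://hakari-bench-leaderboard.hf.space/"))` should return an empty list once the Space is serving the current build.
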
Check logs if the Space is not serving traffic:
```bash
uv run hf spaces logs hakari-bench/leaderboard -n 100
uv run hf spaces logs hakari-bench/leaderboard --build -n 100
```
Before considering a deploy complete, run local validation:
```bash
uv run tox
```
The viewer also has a Playwright-backed browser smoke test for the static
JavaScript, HTMX load path, tooltip layer, and model details modal. Install the
Chromium browser once for the local environment, then run the test through `uv`:
```bash
uv run playwright install chromium
uv run --group all pytest -q tests/test_viewer_browser.py
```
CI is configured to run only on pull requests or explicit manual dispatches.
The browser smoke test has its own lightweight dependency group so the browser
job does not need to install the full benchmark/runtime stack:
```bash
uv sync --only-group viewer-browser-test --frozen
uv run --only-group viewer-browser-test playwright install --with-deps chromium
uv run --only-group viewer-browser-test pytest -q -m browser tests/test_viewer_browser.py
```