Andrej Janchevski committed 23 days ago

Commit 5ed6f37 (1 parent: 5375c2e)

docs(deploy): refresh for the post-launch deployment iteration

Captures the architecture and operational decisions made during the
local Docker bring-up: the unified RESEARCH_ROOT / CHECKPOINTS_ROOT
layout, the single preloaded gunicorn worker, the AppConfig skip-list
for management commands, the share_memory and TransE-init dim-
expansion monkey-patches around experiment.prepare(), the MultiProx
symmetry safeguard, the bundled Loader caches that keep cold-start
RAM under control, and the WSL2 / shm_size requirements for running
the container locally.

Also corrects every HF Hub reference to the case-sensitive Bani57
namespace (account is registered as "Bani57", not "bani57"), and
clarifies the HF_TOKEN env var as recommended (read-scope token
lifts anonymous rate limits and roughly triples cold-start download
throughput; required only for private repos).

Files changed (11)

.claude/plans/deploy_huggingface_spaces.md +16 -16
CLAUDE.md +3 -3
README.md +1 -1
docs/README.md +1 -1
docs/explanation/architecture.md +18 -14
docs/explanation/inference-lifecycle.md +33 -5
docs/glossary.md +2 -2
docs/guides/deploy.md +7 -7
docs/guides/local-development.md +10 -2
docs/reference/backend-services.md +10 -1
src/backend/README.md +10 -9

.claude/plans/deploy_huggingface_spaces.md CHANGED

@@ -12,8 +12,8 @@ We pivot to **Hugging Face Spaces with the Docker SDK**:
 - Free, no credit card. 16 GB RAM, 2 vCPU, 50 GB ephemeral disk, 7860 default port.
 - Native Docker — we provide a `Dockerfile`, HF builds and runs it.
 - Purpose-built for ML demos (the very thing this site is).
-- Checkpoints move from Google Drive to a free HF Hub model repo (`bani57/checkpoints`), served by HF's CDN — much faster than `gdown`, no quota games, resumable uploads/downloads.
-- Space is `bani57/website` → public URL `https://bani57-website.hf.space`. (HF Spaces URLs are always `<owner>-<spacename>.hf.space`; a bare `bani57.hf.space` isn't possible.)
 - A single container serves both the API (Django/Gunicorn) and the built Vue SPA (via WhiteNoise) on the same origin → no CORS, one URL.
 Trade-off accepted: free Spaces sleep after 48 h idle (first hit wakes it, ~2 min cold start because checkpoints are re-pulled from HF Hub on a fresh ephemeral disk). \$5/mo Persistent Storage upgrade eliminates this if it ever bothers us.
@@ -22,7 +22,7 @@ Trade-off accepted: free Spaces sleep after 48 h idle (first hit wakes it, ~2 mi
 ```
                   ┌────────────────────────────────────┐
-                  │  HF Space:  bani57/website         │
   Browser ─HTTPS─▶│  https://bani57-website.hf.space   │
                   │  (Docker container, port 7860)     │
                   │                                    │
@@ -38,7 +38,7 @@ Trade-off accepted: free Spaces sleep after 48 h idle (first hit wakes it, ~2 mi
                   │   ▲ snapshot_download() at boot    │
                   └───┼────────────────────────────────┘
                       │
-                  HF Hub model repo: bani57/checkpoints
                   (free, unlimited, CDN-backed, resumable)
 ```
@@ -50,11 +50,11 @@ Trade-off accepted: free Spaces sleep after 48 h idle (first hit wakes it, ~2 mi
 - `docker-compose.yml` (repo root) — local dev convenience only; HF Spaces ignores it and consumes only the Dockerfile.
 - `.dockerignore` — keep `node_modules`, `.git`, `src/research/**/checkpoints`, `data.zip`, `docs/`, screenshots, `*.npz`, `*.gz` out of the build context.
 - `entrypoint.sh` — runs `huggingface_hub.snapshot_download` (idempotent), then `exec gunicorn …` on `0.0.0.0:7860`.
-- `scripts/upload_checkpoints.py` (NEW) — one-shot script that walks the local research checkpoint dirs and pushes everything to `bani57/checkpoints` on HF Hub, preserving directory structure and resumable on partial failure.
 - Top-level `README.md` for the Space (a thin file with YAML front-matter HF requires; rendered as the Space card). Either commit it at repo root and let the Space repo serve as the deploy target, or maintain a small Space-only fork — see Deployment Steps for the chosen flow.
 **Modified:**
-- `src/backend/api/services/registry.py` — replace the `gdown` Google-Drive call with `huggingface_hub.snapshot_download(repo_id="bani57/checkpoints", local_dir=…)`. Keep the registry's lazy-load pattern intact. Mirror directory layout exactly so existing scan logic still finds files.
 - `src/backend/api/apps.py` — *remove* the eager checkpoint download from `AppConfig.ready()`. Downloads happen in `entrypoint.sh` *before* gunicorn starts, so workers never block.
 - `src/backend/research_api/settings.py` — add WhiteNoise to `MIDDLEWARE` (right after `SecurityMiddleware`); set `STATIC_ROOT = BASE_DIR / "staticfiles"`, `STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"`; environment-drive `ALLOWED_HOSTS` and `CORS_ALLOWED_ORIGINS` (replace hardcoded `bani57.pythonanywhere.com` with `bani57-website.hf.space`); read `CHECKPOINTS_DIR` from env (default `/tmp/checkpoints` in container, repo path in dev).
 - `src/backend/research_api/urls.py` — add SPA shell view: `re_path(r"^(?!api/|static/).*$", spa_index_view)` returning `dist/index.html` for any non-API, non-static path. **This does not replace the Lara Croft 404** — Vue Router's existing `/:pathMatch(.*)*` route still catches unknown paths and renders `NotFoundView.vue` (Lara) client-side. Django's job ends at handing the SPA shell to the browser.
@@ -131,7 +131,7 @@ ENTRYPOINT ["/usr/local/bin/_entrypoint.sh", "/entrypoint.sh"]
 #!/bin/sh
 set -e
 python -c "from huggingface_hub import snapshot_download; \
-  snapshot_download(repo_id='bani57/checkpoints', \
                     local_dir='/tmp/checkpoints', \
                     local_dir_use_symlinks=False, \
                     max_workers=4)"
@@ -183,25 +183,25 @@ huggingface-cli login    # paste the hf_… token; it's stored in ~/.cache/huggi
 **3. Create the repo (one-time, public).**
 ```bash
 huggingface-cli repo create checkpoints --type model --yes
-# Repo URL: https://huggingface.co/bani57/checkpoints
 ```
 **4. Upload, preserving directory layout.** The remote layout matches the on-disk one exactly so the Django registry's existing scan logic finds files unchanged.
 ```bash
 # COINs graph completion (~2.4 GB)
-huggingface-cli upload bani57/checkpoints \
   src/research/COINs-KGGeneration/graph_completion/checkpoints \
   COINs-KGGeneration/graph_completion/checkpoints \
   --repo-type model
 # COINs graph generation / DIGRESS (~2.6 GB)
-huggingface-cli upload bani57/checkpoints \
   src/research/COINs-KGGeneration/graph_generation/checkpoints \
   COINs-KGGeneration/graph_generation/checkpoints \
   --repo-type model
 # MultiProxAn (~364 MB)
-huggingface-cli upload bani57/checkpoints \
   src/research/MultiProxAn/checkpoints \
   MultiProxAn/checkpoints \
   --repo-type model
@@ -209,10 +209,10 @@ huggingface-cli upload bani57/checkpoints \
 `huggingface-cli upload` chunks files and is resumable on retry. Total upload at typical home upstream (~10 MB/s) ≈ 15–20 min.
-**5. Verify.** Open <https://huggingface.co/bani57/checkpoints/tree/main>, confirm the three top-level folders. Then locally:
 ```bash
 python -c "from huggingface_hub import snapshot_download; \
-  snapshot_download(repo_id='bani57/checkpoints', local_dir='/tmp/ck-test')"
 ls -lah /tmp/ck-test
 ```
 Expect ~5.4 GB total. This is the same call the container makes at boot.
@@ -224,10 +224,10 @@ A `scripts/upload_checkpoints.py` wrapper (idempotent — uses `HfApi.upload_fol
 1. **Upload checkpoints to HF Hub** — as above.
 2. **Land the code changes** above on `master` (one PR is fine — single deployment unit).
 3. **Local smoke test**: `docker compose up --build`, hit `http://localhost:7860`, exercise each demo (KG completion, KG anomaly, MultiProxAn graph generation), verify SSE streaming and the Postman collection in `docs/postman/`. Confirm the Lara Croft page renders at `http://localhost:7860/this/is/a/wrong/path`.
-4. **Create the Space**: on huggingface.co/new-space, name `bani57/website`, SDK = Docker, hardware = CPU basic (free), visibility = public.
 5. **Push to the Space**:
    ```bash
-   git remote add hf https://huggingface.co/spaces/bani57/website
    git push hf master:main
    ```
    HF Spaces uses git on a `main` branch. First build takes ~20–30 min (mamba env solve + GPU torch + apt deps + frontend build + first checkpoint download).
@@ -235,7 +235,7 @@ A `scripts/upload_checkpoints.py` wrapper (idempotent — uses `HfApi.upload_fol
    - `DJANGO_SECRET_KEY` — generate fresh (`python -c "import secrets; print(secrets.token_urlsafe(50))"`).
    - `DJANGO_DEBUG=False`.
    - `DJANGO_ALLOWED_HOSTS=bani57-website.hf.space`.
-   - `HF_TOKEN` — only if `bani57/checkpoints` is private; for a public model repo no token is needed at runtime.
 7. **Verification on the live Space**:
    - Open `https://bani57-website.hf.space` — landing page renders, navigation works, hard-refresh on a deep route still resolves.
    - Visit `https://bani57-website.hf.space/foo/bar/quux` — Lara 404 page appears (confirms catch-all + Vue Router still cooperate in prod).

 - Free, no credit card. 16 GB RAM, 2 vCPU, 50 GB ephemeral disk, 7860 default port.
 - Native Docker — we provide a `Dockerfile`, HF builds and runs it.
 - Purpose-built for ML demos (the very thing this site is).
+- Checkpoints move from Google Drive to a free HF Hub model repo (`Bani57/checkpoints`), served by HF's CDN — much faster than `gdown`, no quota games, resumable uploads/downloads.
+- Space is `Bani57/website` → public URL `https://bani57-website.hf.space`. (HF Spaces URLs are always `<owner>-<spacename>.hf.space`; a bare `bani57.hf.space` isn't possible.)
 - A single container serves both the API (Django/Gunicorn) and the built Vue SPA (via WhiteNoise) on the same origin → no CORS, one URL.
 Trade-off accepted: free Spaces sleep after 48 h idle (first hit wakes it, ~2 min cold start because checkpoints are re-pulled from HF Hub on a fresh ephemeral disk). \$5/mo Persistent Storage upgrade eliminates this if it ever bothers us.
 ```
                   ┌────────────────────────────────────┐
+                  │  HF Space:  Bani57/website         │
   Browser ─HTTPS─▶│  https://bani57-website.hf.space   │
                   │  (Docker container, port 7860)     │
                   │                                    │
                   │   ▲ snapshot_download() at boot    │
                   └───┼────────────────────────────────┘
                       │
+                  HF Hub model repo: Bani57/checkpoints
                   (free, unlimited, CDN-backed, resumable)
 ```
 - `docker-compose.yml` (repo root) — local dev convenience only; HF Spaces ignores it and consumes only the Dockerfile.
 - `.dockerignore` — keep `node_modules`, `.git`, `src/research/**/checkpoints`, `data.zip`, `docs/`, screenshots, `*.npz`, `*.gz` out of the build context.
 - `entrypoint.sh` — runs `huggingface_hub.snapshot_download` (idempotent), then `exec gunicorn …` on `0.0.0.0:7860`.
+- `scripts/upload_checkpoints.py` (NEW) — one-shot script that walks the local research checkpoint dirs and pushes everything to `Bani57/checkpoints` on HF Hub, preserving directory structure and resumable on partial failure.
 - Top-level `README.md` for the Space (a thin file with YAML front-matter HF requires; rendered as the Space card). Either commit it at repo root and let the Space repo serve as the deploy target, or maintain a small Space-only fork — see Deployment Steps for the chosen flow.
 **Modified:**
+- `src/backend/api/services/registry.py` — replace the `gdown` Google-Drive call with `huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", local_dir=…)`. Keep the registry's lazy-load pattern intact. Mirror directory layout exactly so existing scan logic still finds files.
 - `src/backend/api/apps.py` — *remove* the eager checkpoint download from `AppConfig.ready()`. Downloads happen in `entrypoint.sh` *before* gunicorn starts, so workers never block.
 - `src/backend/research_api/settings.py` — add WhiteNoise to `MIDDLEWARE` (right after `SecurityMiddleware`); set `STATIC_ROOT = BASE_DIR / "staticfiles"`, `STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"`; environment-drive `ALLOWED_HOSTS` and `CORS_ALLOWED_ORIGINS` (replace hardcoded `bani57.pythonanywhere.com` with `bani57-website.hf.space`); read `CHECKPOINTS_DIR` from env (default `/tmp/checkpoints` in container, repo path in dev).
 - `src/backend/research_api/urls.py` — add SPA shell view: `re_path(r"^(?!api/|static/).*$", spa_index_view)` returning `dist/index.html` for any non-API, non-static path. **This does not replace the Lara Croft 404** — Vue Router's existing `/:pathMatch(.*)*` route still catches unknown paths and renders `NotFoundView.vue` (Lara) client-side. Django's job ends at handing the SPA shell to the browser.
 #!/bin/sh
 set -e
 python -c "from huggingface_hub import snapshot_download; \
+  snapshot_download(repo_id='Bani57/checkpoints', \
                     local_dir='/tmp/checkpoints', \
                     local_dir_use_symlinks=False, \
                     max_workers=4)"
 **3. Create the repo (one-time, public).**
 ```bash
 huggingface-cli repo create checkpoints --type model --yes
+# Repo URL: https://huggingface.co/Bani57/checkpoints
 ```
 **4. Upload, preserving directory layout.** The remote layout matches the on-disk one exactly so the Django registry's existing scan logic finds files unchanged.
 ```bash
 # COINs graph completion (~2.4 GB)
+huggingface-cli upload Bani57/checkpoints \
   src/research/COINs-KGGeneration/graph_completion/checkpoints \
   COINs-KGGeneration/graph_completion/checkpoints \
   --repo-type model
 # COINs graph generation / DIGRESS (~2.6 GB)
+huggingface-cli upload Bani57/checkpoints \
   src/research/COINs-KGGeneration/graph_generation/checkpoints \
   COINs-KGGeneration/graph_generation/checkpoints \
   --repo-type model
 # MultiProxAn (~364 MB)
+huggingface-cli upload Bani57/checkpoints \
   src/research/MultiProxAn/checkpoints \
   MultiProxAn/checkpoints \
   --repo-type model
 `huggingface-cli upload` chunks files and is resumable on retry. Total upload at typical home upstream (~10 MB/s) ≈ 15–20 min.
+**5. Verify.** Open <https://huggingface.co/Bani57/checkpoints/tree/main>, confirm the three top-level folders. Then locally:
 ```bash
 python -c "from huggingface_hub import snapshot_download; \
+  snapshot_download(repo_id='Bani57/checkpoints', local_dir='/tmp/ck-test')"
 ls -lah /tmp/ck-test
 ```
 Expect ~5.4 GB total. This is the same call the container makes at boot.
 1. **Upload checkpoints to HF Hub** — as above.
 2. **Land the code changes** above on `master` (one PR is fine — single deployment unit).
 3. **Local smoke test**: `docker compose up --build`, hit `http://localhost:7860`, exercise each demo (KG completion, KG anomaly, MultiProxAn graph generation), verify SSE streaming and the Postman collection in `docs/postman/`. Confirm the Lara Croft page renders at `http://localhost:7860/this/is/a/wrong/path`.
+4. **Create the Space**: on huggingface.co/new-space, name `Bani57/website`, SDK = Docker, hardware = CPU basic (free), visibility = public.
 5. **Push to the Space**:
    ```bash
+   git remote add hf https://huggingface.co/spaces/Bani57/website
    git push hf master:main
    ```
    HF Spaces uses git on a `main` branch. First build takes ~20–30 min (mamba env solve + GPU torch + apt deps + frontend build + first checkpoint download).
    - `DJANGO_SECRET_KEY` — generate fresh (`python -c "import secrets; print(secrets.token_urlsafe(50))"`).
    - `DJANGO_DEBUG=False`.
    - `DJANGO_ALLOWED_HOSTS=bani57-website.hf.space`.
+   - `HF_TOKEN` — only if `Bani57/checkpoints` is private; for a public model repo no token is needed at runtime.
 7. **Verification on the live Space**:
    - Open `https://bani57-website.hf.space` — landing page renders, navigation works, hard-refresh on a deep route still resolves.
    - Visit `https://bani57-website.hf.space/foo/bar/quux` — Lara 404 page appears (confirms catch-all + Vue Router still cooperate in prod).
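
The catch-all pattern quoted in the plan for `urls.py` can be sanity-checked outside Django. This is a minimal sketch, not the project's code: the helper name `served_by_spa_shell` and the sample paths are illustrative assumptions; only the regex itself comes from the plan.

```python
import re

# Regex copied from the plan: anything that does not start with "api/" or "static/".
SPA_CATCH_ALL = re.compile(r"^(?!api/|static/).*$")


def served_by_spa_shell(request_path: str) -> bool:
    """Return True when the catch-all would hand this path to the SPA shell.

    Django matches URL patterns against the path without its leading slash,
    so the slash is stripped before matching. Hypothetical helper.
    """
    return SPA_CATCH_ALL.match(request_path.lstrip("/")) is not None


if __name__ == "__main__":
    for path in ("/", "/cv", "/foo/bar/quux", "/api/v1/health", "/static/app.js"):
        print(path, served_by_spa_shell(path))
```

Unknown deep paths such as `/foo/bar/quux` still reach the SPA shell, which is why the 404 page has to be rendered client-side by Vue Router.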

CLAUDE.md CHANGED

@@ -21,7 +21,7 @@ allowed (HTTPS and other security).
 ## Architecture
-- `https://bani57-website.hf.space`: Live website URL (HF Space `bani57/website`, Docker SDK)
 - `docs/cvAndrejJanchevski.pdf`: My CV, perfectly mapped to a subpage `/cv`, titled "CV".
 - `docs/janchevski_scalable_2025.pdf`: My PhD thesis, not served on website (only the
   link https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c given), but used as
@@ -31,12 +31,12 @@ allowed (HTTPS and other security).
 - `src/research/COINs-KGGeneration`: repo dump for the COINs KG reasoning and KG generation experiments (3.1 and 4.4 of
   PhD thesis)
 - `src/research/MultiProxAn`: repo dump for the MultiProxAn graph generation experiments (4.3 of PhD thesis)
-- `https://huggingface.co/bani57/checkpoints`: HF Hub model repo holding all PyTorch checkpoints. The container's
   entrypoint pre-warms a `snapshot_download` into `/app/checkpoints` (mirrors the on-disk `src/research/...`
   layout) before gunicorn starts, and `ModelRegistry._download_checkpoints` is idempotent on warm starts.
 - `Dockerfile`, `environment.yml`, `entrypoint.sh`, `docker-compose.yml`, `.dockerignore`: deployment assets at
   the repo root. `docker compose up --build` reproduces the production container locally on `:7860`.
-- `scripts/upload_checkpoints.py`: one-shot helper to (re-)publish local checkpoints to `bani57/checkpoints`.
 - `src/backend`: Django backend and endpoint files location
 - `src/frontend`: Vue.js and Semantic UI frontend files location

 ## Architecture
+- `https://bani57-website.hf.space`: Live website URL (HF Space `Bani57/website`, Docker SDK)
 - `docs/cvAndrejJanchevski.pdf`: My CV, perfectly mapped to a subpage `/cv`, titled "CV".
 - `docs/janchevski_scalable_2025.pdf`: My PhD thesis, not served on website (only the
   link https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c given), but used as
 - `src/research/COINs-KGGeneration`: repo dump for the COINs KG reasoning and KG generation experiments (3.1 and 4.4 of
   PhD thesis)
 - `src/research/MultiProxAn`: repo dump for the MultiProxAn graph generation experiments (4.3 of PhD thesis)
+- `https://huggingface.co/Bani57/checkpoints`: HF Hub model repo holding all PyTorch checkpoints. The container's
   entrypoint pre-warms a `snapshot_download` into `/app/checkpoints` (mirrors the on-disk `src/research/...`
   layout) before gunicorn starts, and `ModelRegistry._download_checkpoints` is idempotent on warm starts.
 - `Dockerfile`, `environment.yml`, `entrypoint.sh`, `docker-compose.yml`, `.dockerignore`: deployment assets at
   the repo root. `docker compose up --build` reproduces the production container locally on `:7860`.
+- `scripts/upload_checkpoints.py`: one-shot helper to (re-)publish local checkpoints to `Bani57/checkpoints`.
 - `src/backend`: Django backend and endpoint files location
 - `src/frontend`: Vue.js and Semantic UI frontend files location
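
The repo id is case-sensitive (`Bani57/website`) while the public hostname stays lowercase (`bani57-website.hf.space`). A small sketch of that mapping, assuming lowercasing and joining with a hyphen are the only transformations involved; the function name `space_hostname` is hypothetical and this is not an official Hugging Face API.

```python
def space_hostname(space_id: str) -> str:
    """Derive the `<owner>-<spacename>.hf.space` hostname from a Space id.

    Assumption: the hostname is just the lowercased owner and name joined
    by a hyphen, as in the one example this repo documents.
    """
    owner, name = space_id.split("/", 1)
    return f"{owner}-{name}".lower() + ".hf.space"


if __name__ == "__main__":
    print("https://" + space_hostname("Bani57/website"))
```

Keeping the two forms apart matters in practice: Hub calls such as `snapshot_download` take the case-sensitive repo id, while `DJANGO_ALLOWED_HOSTS` takes the lowercase hostname.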

README.md CHANGED

@@ -13,7 +13,7 @@ license: mit
 Live demos for the PhD thesis _Scalable Methods for Knowledge Graph Reasoning and Generation_ (Andrej Janchevski, EPFL, 2025). The thesis is mirrored at [infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c); a CV is rendered at `/cv` from `docs/cvAndrejJanchevski.pdf`.
-The site is a single Django + Vue Docker container. The Django backend serves a stateless REST API at `/api/v1/*` and WhiteNoise serves the built Vue SPA on every other path — same origin, no CORS in production. Deployment target: Hugging Face [Space](docs/glossary.md#hf-space) `bani57/website` → <https://bani57-website.hf.space>.
 ## What's in the demos

 Live demos for the PhD thesis _Scalable Methods for Knowledge Graph Reasoning and Generation_ (Andrej Janchevski, EPFL, 2025). The thesis is mirrored at [infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c); a CV is rendered at `/cv` from `docs/cvAndrejJanchevski.pdf`.
+The site is a single Django + Vue Docker container. The Django backend serves a stateless REST API at `/api/v1/*` and WhiteNoise serves the built Vue SPA on every other path — same origin, no CORS in production. Deployment target: Hugging Face [Space](docs/glossary.md#hf-space) `Bani57/website` → <https://bani57-website.hf.space>.
 ## What's in the demos

docs/README.md CHANGED

@@ -1,6 +1,6 @@
 # Documentation
-Technical documentation for the PhD-research demo website. The site is a single Django + Vue Docker container deployed to a Hugging Face [Space](glossary.md#hf-space) (`bani57/website` → <https://bani57-website.hf.space>) and showcases three research methods from the thesis _Scalable Methods for Knowledge Graph Reasoning and Generation_ (Andrej Janchevski, EPFL, 2025): [COINs](glossary.md#coins), [MultiProxAn](glossary.md#multiproxan) and [KG anomaly correction](glossary.md#task-kg-anomaly).
 This document is a routing layer. Pick the document type that matches your goal — definitions, understanding, lookup, or task.

 # Documentation
+Technical documentation for the PhD-research demo website. The site is a single Django + Vue Docker container deployed to a Hugging Face [Space](glossary.md#hf-space) (`Bani57/website` → <https://bani57-website.hf.space>) and showcases three research methods from the thesis _Scalable Methods for Knowledge Graph Reasoning and Generation_ (Andrej Janchevski, EPFL, 2025): [COINs](glossary.md#coins), [MultiProxAn](glossary.md#multiproxan) and [KG anomaly correction](glossary.md#task-kg-anomaly).
 This document is a routing layer. Pick the document type that matches your goal — definitions, understanding, lookup, or task.

docs/explanation/architecture.md CHANGED

@@ -9,15 +9,14 @@ The site runs as a single Docker container on a Hugging Face [Space](../glossary
 flowchart LR
     Browser([Browser])
     subgraph Container["HF Space container :7860"]
-        Gunicorn[gunicorn<br/>2 workers]
         WN[WhiteNoise<br/>middleware]
         DRF[DRF views<br/>api/views/]
         SPA[(SPA dist/<br/>index.html and assets)]
         Stat[(staticfiles/<br/>collectstatic)]
-        CK[(/app/checkpoints<br/>PyTorch weights)]
-        Research[(/app/research<br/>research code)]
     end
-    Hub[(HF Hub<br/>bani57/checkpoints)]
     Browser -->|HTTPS| Gunicorn
     Gunicorn --> WN
@@ -26,9 +25,8 @@ flowchart LR
     WN -->|miss| URLs{Django<br/>URL router}
     URLs -->|/api/v1/*| DRF
     URLs -->|non-API path| SPA
-    DRF -.->|imports| Research
-    DRF -.->|loads weights| CK
-    CK -.->|cold-start populate| Hub
 ```
 Routing precedence inside the container:
@@ -53,7 +51,13 @@ Long-running inference is exposed as [SSE](../glossary.md#sse-server-sent-events
 ## Concurrency
-Free-tier HF Spaces run on 2 vCPU with no GPU. The container starts gunicorn with two workers and two threads, and `ModelRegistry._inference_lock` serializes inference globally. A second concurrent inference request gets HTTP 429 with `INFERENCE_BUSY`. Discovery endpoints (datasets, entities, methods, health) acquire no lock and run freely in parallel.
 ## Container lifecycle
@@ -65,9 +69,9 @@ sequenceDiagram
     participant G as gunicorn
     participant D as Django (AppConfig.ready)
     HF->>E: container start
-    E->>Hub: snapshot_download(bani57/checkpoints)
-    Hub-->>E: 5.4 GB into /app/checkpoints
-    E->>G: exec gunicorn :7860
     G->>D: import research_api.wsgi
     D->>D: ModelRegistry.initialize()
     D->>D: scan checkpoint dirs
@@ -84,8 +88,8 @@ Detail on each step lives in [inference-lifecycle.md](inference-lifecycle.md).
 .
 ├── Dockerfile               # multi-stage: Node SPA build → micromamba runtime
 ├── environment.yml          # conda half (rdkit 2023.03.2, boost, cairo, …)
-├── docker-compose.yml       # local-dev parity, named volume for checkpoints
-├── entrypoint.sh            # snapshot_download then exec gunicorn
 ├── README.md                # HF Space landing card (YAML front-matter)
 ├── src/
 │   ├── backend/
@@ -96,7 +100,7 @@ Detail on each step lives in [inference-lifecycle.md](inference-lifecycle.md).
 │   ├── frontend/            # Vue 3 SPA (built into dist/ at image time)
 │   └── research/            # research code, imported via sys.path tweaks
 └── scripts/
-    └── upload_checkpoints.py  # one-shot publisher to bani57/checkpoints
 ```
 ## Cross-references

 flowchart LR
     Browser([Browser])
     subgraph Container["HF Space container :7860"]
+        Gunicorn[gunicorn master<br/>1 worker, 4 threads<br/>--preload]
         WN[WhiteNoise<br/>middleware]
         DRF[DRF views<br/>api/views/]
         SPA[(SPA dist/<br/>index.html and assets)]
         Stat[(staticfiles/<br/>collectstatic)]
+        Research[(/app/research<br/>code, configs,<br/>Loader caches,<br/>HF-Hub weights)]
     end
+    Hub[(HF Hub<br/>Bani57/checkpoints)]
     Browser -->|HTTPS| Gunicorn
     Gunicorn --> WN
     WN -->|miss| URLs{Django<br/>URL router}
     URLs -->|/api/v1/*| DRF
     URLs -->|non-API path| SPA
+    DRF -.->|imports + loads| Research
+    Research -.->|cold-start populate| Hub
 ```
 Routing precedence inside the container:
 ## Concurrency
+Free-tier HF Spaces run on 2 vCPU with no GPU. The container starts gunicorn with **one preloaded worker** and four threads. The single-worker choice is deliberate:
+- The `ModelRegistry` holds multi-GB of COINs Loaders + lazily-loaded model weights in process RAM. A second worker would duplicate every byte.
+- `ModelRegistry._inference_lock` serializes inference globally. A second worker would only ever queue on the same lock — no throughput gain.
+- `--preload` runs Django setup (and `ModelRegistry.initialize`) once in the master before forking, so the worker inherits a ready state via copy-on-write and gunicorn's silent-time timeout doesn't fire mid-init.
+A second concurrent inference request gets HTTP 429 with `INFERENCE_BUSY`. Discovery endpoints (datasets, entities, methods, health) acquire no lock and run freely in parallel via the worker's threads.
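
For illustration, the single-worker setup described above could be written as a gunicorn config file. This is a sketch under assumptions: the repo may pass these values as CLI flags in `entrypoint.sh` instead, and the file name `gunicorn.conf.py` is not confirmed by the source. The setting names themselves are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file name; values taken from the text above)

# Listen on the port HF Spaces expects.
bind = "0.0.0.0:7860"

# One worker: a second one would duplicate the registry's RAM and only
# queue on the same global inference lock.
workers = 1

# Four threads let lock-free discovery endpoints run in parallel.
# With threads > 1, gunicorn switches the sync worker to gthread.
threads = 4

# Equivalent of --preload: import the app (and run Django setup)
# in the master before forking the worker.
preload_app = True
```

It would be picked up with `gunicorn -c gunicorn.conf.py research_api.wsgi`, where `research_api.wsgi` is the module named in the lifecycle diagram.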
 ## Container lifecycle
     participant G as gunicorn
     participant D as Django (AppConfig.ready)
     HF->>E: container start
+    E->>Hub: snapshot_download(Bani57/checkpoints)
+    Hub-->>E: 5.4 GB into /app/research
+    E->>G: exec gunicorn --preload :7860
     G->>D: import research_api.wsgi
     D->>D: ModelRegistry.initialize()
     D->>D: scan checkpoint dirs
 .
 ├── Dockerfile               # multi-stage: Node SPA build → micromamba runtime
 ├── environment.yml          # conda half (rdkit 2023.03.2, boost, cairo, …)
+├── docker-compose.yml       # local-dev parity (no volume — checkpoints re-pull on each up)
+├── entrypoint.sh            # snapshot_download then exec gunicorn --preload
 ├── README.md                # HF Space landing card (YAML front-matter)
 ├── src/
 │   ├── backend/
 │   ├── frontend/            # Vue 3 SPA (built into dist/ at image time)
 │   └── research/            # research code, imported via sys.path tweaks
 └── scripts/
+    └── upload_checkpoints.py  # one-shot publisher to Bani57/checkpoints
 ```
 ## Cross-references

docs/explanation/inference-lifecycle.md CHANGED

@@ -4,22 +4,26 @@ How models get from disk to RAM to a response. The site optimizes for slow cold
 ## Boot sequence
-`api/apps.ApiConfig.ready()` runs once when Django imports the app — before gunicorn accepts traffic. It calls `ModelRegistry.initialize()`, which executes four steps in order:
 ```mermaid
 flowchart TB
-    Start([AppConfig.ready]) --> Skip{outer auto-reloader?}
-    Skip -->|yes| Done([return])
     Skip -->|no| Init[ModelRegistry.initialize]
     Init --> DL[_download_checkpoints<br/>HF Hub snapshot_download]
     DL --> Scan[_scan_checkpoints<br/>list available files]
     Scan --> Loaders[_load_all_loaders<br/>3 lightweight Loaders]
     Loaders --> Sub[_generate_sample_subgraphs<br/>per-dataset DFS partitions]
-    Sub --> Ready([gunicorn accepts traffic])
 ```
 ### 1. Download checkpoints (idempotent)
-`_download_checkpoints` checks every expected subdir under `CHECKPOINTS_ROOT`; if any is empty it calls `huggingface_hub.snapshot_download(repo_id="bani57/checkpoints")` to pull the missing files. The repo's directory layout mirrors the on-disk one, so files land in their final location and the scan code doesn't need a redistribution step.
 In production the `entrypoint.sh` script also calls `snapshot_download` *before* gunicorn starts. That makes Django's call a no-op on a normal cold start; it only does real work when run outside the container or when the entrypoint was bypassed.
@@ -52,6 +56,13 @@ First request for a `(dataset, algorithm)` or `(dataset, model_type)` combinatio
 The COINs registry also reuses Loaders across algorithms via `_coins_loaders[(dataset_id, seed, leiden_resolution)]`: all four `transe / distmult / complex / rotate` checkpoints on a dataset share the same seed and Leiden resolution, so they share one Loader and don't reload the graph four times.
 ## Concurrency and the inference lock
 `ModelRegistry._inference_lock` is a single `threading.Lock`. Every endpoint that runs PyTorch inference acquires it non-blocking; if it can't, it raises `InferenceBusy` (HTTP 429). The lock is released in a `finally` after the response is fully streamed:
@@ -75,6 +86,23 @@ The free-tier HF Space gives 16 GB RAM. Approximate usage:
 Worst case (everything ever requested in one Space lifetime): well under 16 GB. The current code does not evict — caches grow monotonically until the container restarts.
 ## Cross-references
 - [explanation/architecture.md](architecture.md) — where this lifecycle sits in the request flow.

 ## Boot sequence
+`api/apps.ApiConfig.ready()` runs once when Django imports the app — typically inside the gunicorn master under `--preload`, before any worker forks. It calls `ModelRegistry.initialize()`, which executes four steps in order:
 ```mermaid
 flowchart TB
+    Start([AppConfig.ready]) --> Mgmt{argv in<br/>SKIP_REGISTRY_INIT?}
+    Mgmt -->|yes: collectstatic,<br/>migrate, check, …| Done([return])
+    Mgmt -->|no| Skip{outer auto-reloader?}
+    Skip -->|yes| Done
     Skip -->|no| Init[ModelRegistry.initialize]
     Init --> DL[_download_checkpoints<br/>HF Hub snapshot_download]
     DL --> Scan[_scan_checkpoints<br/>list available files]
     Scan --> Loaders[_load_all_loaders<br/>3 lightweight Loaders]
     Loaders --> Sub[_generate_sample_subgraphs<br/>per-dataset DFS partitions]
+    Sub --> Ready([gunicorn worker forks via copy-on-write])
 ```
+The `SKIP_REGISTRY_INIT` set covers `collectstatic`, `migrate`, `makemigrations`, `check`, `shell`, `test`, etc. Without that guard, `python manage.py collectstatic --noinput` at image build time would trigger the full ~6 GB checkpoint download into a throwaway layer.
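
A minimal sketch of what such a guard might look like. The set contents below are only the commands named above (the real list continues past the "etc."), and the helper name `should_skip_registry_init` is an assumption, not the repo's actual function.

```python
import sys

# Assumed contents: only the commands the text names explicitly.
SKIP_REGISTRY_INIT = frozenset(
    {"collectstatic", "migrate", "makemigrations", "check", "shell", "test"}
)


def should_skip_registry_init(argv=None):
    """Return True when the process is a management command that needs no models.

    Only argv[1] is inspected (the `manage.py <command>` position), so a later
    argument that happens to equal a command name cannot trigger a false skip.
    """
    argv = sys.argv if argv is None else argv
    return len(argv) > 1 and argv[1] in SKIP_REGISTRY_INIT


if __name__ == "__main__":
    print(should_skip_registry_init(["manage.py", "collectstatic", "--noinput"]))
    print(should_skip_registry_init(["gunicorn", "--preload", "research_api.wsgi"]))
```

Under gunicorn, `argv[1]` is a flag or the WSGI module rather than a management command, so initialization proceeds as usual.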
 ### 1. Download checkpoints (idempotent)
+`_download_checkpoints` checks every expected subdir under `CHECKPOINTS_ROOT`; if any is empty it calls `huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints")` to pull the missing files. The repo's directory layout mirrors the on-disk one, so files land in their final location and the scan code doesn't need a redistribution step.
 In production the `entrypoint.sh` script also calls `snapshot_download` *before* gunicorn starts. That makes Django's call a no-op on a normal cold start; it only does real work when run outside the container or when the entrypoint was bypassed.
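
The idempotency check can be sketched as below. The function name `needs_download` and its signature are assumptions for illustration; the real `_download_checkpoints` may differ. The subdirectory names follow the upload layout documented in the deploy plan.

```python
from pathlib import Path

# Layout taken from the upload commands in the deploy plan.
EXPECTED_SUBDIRS = (
    "COINs-KGGeneration/graph_completion/checkpoints",
    "COINs-KGGeneration/graph_generation/checkpoints",
    "MultiProxAn/checkpoints",
)


def needs_download(checkpoints_root, expected_subdirs=EXPECTED_SUBDIRS):
    """Return True if any expected subdir is missing or has no entries."""
    for rel in expected_subdirs:
        subdir = Path(checkpoints_root) / rel
        if not subdir.is_dir() or not any(subdir.iterdir()):
            return True
    return False


# Only when needs_download(...) is True would the registry go on to call
# huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", ...).
```

When the entrypoint has already populated every subdir, the check returns `False` and Django's call does no network work.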
 The COINs registry also reuses Loaders across algorithms via `_coins_loaders[(dataset_id, seed, leiden_resolution)]`: all four `transe / distmult / complex / rotate` checkpoints on a dataset share the same seed and Leiden resolution, so they share one Loader and don't reload the graph four times.
+### Monkey-patches around `experiment.prepare()`
+Two patches wrap each call to `experiment.prepare()` in `_load_coins_experiment`, restored in a `finally`:
+- **`Module.share_memory` → no-op.** The research code's `prepare()` calls `embedder.share_memory()` to share weights across multi-process training workers. Inference is single-process, so the call is unnecessary, and on Linux containers with a small `/dev/shm` (Docker defaults to 64 MB, and the tmpfs on free HF Spaces is similarly small) it crashes the process with a `Bus error` mid-prepare. The no-op makes `prepare()` return cleanly.
+- **`torch.load` → TransE-init dim expansion.** `prepare()` loads `transe_model.tar` to seed the embedder's `entity_embeddings_initial` buffers. The KBGAT embedder's `__init__` then assigns `weight.data = init`, which silently re-shapes the YAML-declared embedding layer to the init's shape. For wordnet KBGAT this is fatal: the trained checkpoint was 200d but the wordnet TransE init is 100d, so the embedder ends up at 100d and the trained `load_state_dict` blows up on the dim mismatch. The patch detects TransE state dicts being loaded and repeats them along the embedding axis (e.g. 100d → 200d via `cat([init, init])`) when the YAML's `embedding_dim` is an integer multiple of the init's dim — same trick `_adapt_kbgat_state_dict` already uses for the GATConv multi-head expansion.
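Both patches follow the same pattern: swap an attribute, run `prepare()`, restore in a `finally`. The sketch below shows that pattern with the standard library only. `patched`, `expand_rows`, `FakeModule`, and `prepare` are hypothetical stand-ins: plain lists replace tensors, and list repetition plays the role of `cat([init, init])` along the embedding axis.

```python
from contextlib import contextmanager


@contextmanager
def patched(obj, name, replacement):
    """Temporarily replace `obj.<name>`, restoring the original even on error."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield original
    finally:
        setattr(obj, name, original)


def expand_rows(rows, target_dim):
    """Repeat each embedding row until it reaches `target_dim`.

    Rows are left untouched unless `target_dim` is an integer multiple of
    the current dimension, mirroring the condition stated in the text.
    """
    init_dim = len(rows[0])
    if target_dim % init_dim != 0:
        return rows
    return [row * (target_dim // init_dim) for row in rows]


class FakeModule:
    """Stand-in for a module whose share_memory() fails under a small /dev/shm."""

    def share_memory(self):
        raise RuntimeError("stand-in for the shared-memory failure")


def prepare(module):
    """Stand-in for experiment.prepare(): it calls share_memory() unconditionally."""
    module.share_memory()
    return "prepared"


# The no-op returns self, matching the return convention of share_memory().
with patched(FakeModule, "share_memory", lambda self: self):
    result = prepare(FakeModule())
```

After the `with` block exits, `FakeModule.share_memory` is the original method again, so code outside the patched call is unaffected.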
 ## Concurrency and the inference lock
 `ModelRegistry._inference_lock` is a single `threading.Lock`. Every endpoint that runs PyTorch inference acquires it non-blocking; if it can't, it raises `InferenceBusy` (HTTP 429). The lock is released in a `finally` after the response is fully streamed:
 Worst case (everything ever requested in one Space lifetime): well under 16 GB. The current code does not evict — caches grow monotonically until the container restarts.
+### Local-dev memory floor
+Loading all three Loaders + computing graph metrics for NELL peaks at ~5–6 GB transient RAM unless the bundled Loader caches (`results/<dataset>/*.npz` and `*.gz`, ~10 MB total in the image) are present, which let the boot read precomputed arrays instead of recomputing them. WSL2 should be configured for at least 12 GB to give Docker enough headroom; the recommended `.wslconfig` is:
+```ini
+[wsl2]
+memory=12GB
+processors=4
+swap=4GB
+```
+`docker-compose.yml` also sets `shm_size: "2gb"` to avoid `Bus error` from PyTorch's shared-memory paths under Docker's 64 MB `/dev/shm` default.
+## MultiProx symmetry safeguard
+`graphgen_inference._collapse_final` symmetrises the edge tensor before calling `model.sample_discrete_graph_given_z0`. The model has a strict `assert (pred_E == pred_E.T).all()`; the MultiProx Gibbs aggregation (mean / median over multiple chains) can introduce ULP-level asymmetry that survives into `pred_E` and trips the assert on some BLAS / vectorization stacks (notably the Linux `+cu118` torch wheel inside the deployment container, while the same code runs fine on the Windows wheel in dev). `E = (E + E.T) / 2` is a no-op on already-symmetric input and a one-line invariant fix when it isn't.
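The two properties claimed for `E = (E + E.T) / 2` can be checked on a small example. The sketch uses nested Python lists in place of a tensor, and `symmetrise` is a hypothetical helper, not the function in `graphgen_inference`.

```python
def symmetrise(E):
    """Return (E + E^T) / 2 for a square matrix given as nested lists."""
    n = len(E)
    return [[(E[i][j] + E[j][i]) / 2 for j in range(n)] for i in range(n)]


# Two off-diagonal entries that differ by one rounding step.
a = 0.1 + 0.2
b = 0.3
E = [[0.0, a], [b, 0.0]]

S = symmetrise(E)
```

Floating-point addition is commutative, so `S[0][1]` and `S[1][0]` are computed from the same sum and compare equal exactly. Applying `symmetrise` to `S` again returns the same values, which is the no-op behaviour on already-symmetric input.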
 ## Cross-references
 - [explanation/architecture.md](architecture.md) — where this lifecycle sits in the request flow.

docs/glossary.md CHANGED Viewed

@@ -64,10 +64,10 @@ Boot-time: pre-warm checkpoints from HF Hub, scan checkpoint dirs, load lightwei
 ## Deployment
 ### HF Space
-A Hugging Face Spaces application running this repo's `Dockerfile`. The deployed URL is `https://bani57-website.hf.space`. The Space repo is `bani57/website`.
 ### HF Hub model repo
-`bani57/checkpoints` — holds all PyTorch weights. Mirrors the on-disk layout under `CHECKPOINTS_ROOT` so `huggingface_hub.snapshot_download` populates files in their expected paths and the registry's scan logic finds them unchanged.
 ### Persistent storage (HF Spaces)
 A paid `/data` volume that survives Space restarts. Free Spaces have 50 GB ephemeral disk that resets on restart. Without persistent storage, every cold start re-downloads checkpoints from HF Hub.

 ## Deployment
 ### HF Space
+A Hugging Face Spaces application running this repo's `Dockerfile`. The deployed URL is `https://bani57-website.hf.space`. The Space repo is `Bani57/website`.
 ### HF Hub model repo
+`Bani57/checkpoints` — holds all PyTorch weights. Mirrors the on-disk layout under `CHECKPOINTS_ROOT` so `huggingface_hub.snapshot_download` populates files in their expected paths and the registry's scan logic finds them unchanged.
 ### Persistent storage (HF Spaces)
 A paid `/data` volume that survives Space restarts. Free Spaces have 50 GB ephemeral disk that resets on restart. Without persistent storage, every cold start re-downloads checkpoints from HF Hub.

docs/guides/deploy.md CHANGED Viewed

@@ -1,12 +1,12 @@
 # How to deploy to Hugging Face Spaces
-Push a new version of the site to the HF Space `bani57/website`. Everything is git-based — there is no build button. The Space rebuilds its Docker image from whatever is on the `main` branch of its repo. For the rationale and full design see [`plans/deploy_huggingface_spaces.md`](../../plans/deploy_huggingface_spaces.md).
 ## Prerequisites
-- The Space `bani57/website` exists with **SDK = Docker**, **Hardware = CPU basic (free)**.
 - Space secrets are set: `DJANGO_SECRET_KEY`, `DJANGO_DEBUG=False`, `DJANGO_ALLOWED_HOSTS=bani57-website.hf.space`.
-- The HF Hub model repo `bani57/checkpoints` exists and contains the current weights. See [Refreshing checkpoints](#refreshing-checkpoints) below.
 - Local working tree is on the deployment branch (typically `master`) and tests pass.
 ## 1. Local container smoke test
@@ -36,7 +36,7 @@ If anything fails, fix it locally — never push a broken image to the Space.
 Add the HF git remote (one-time):
 ```bash
-git remote add hf https://huggingface.co/spaces/bani57/website
 ```
 Then for each release:
@@ -75,7 +75,7 @@ Force-push triggers an image rebuild from the rolled-back commit. Checkpoints in
 ## Refreshing checkpoints
-Checkpoints live in `bani57/checkpoints` on HF Hub, separate from the code repo. To publish new weights:
 ```bash
 huggingface-cli login    # one-time, paste a write token
@@ -92,7 +92,7 @@ In the Space Settings → Variables and secrets:
 - **Secrets** (encrypted, not exposed in logs):
   - `DJANGO_SECRET_KEY` — required. Generate with `python -c "import secrets; print(secrets.token_urlsafe(50))"`.
-  - `HF_TOKEN` — only if `bani57/checkpoints` is private.
 - **Variables** (visible in logs and to viewers of the Space metadata):
   - `DJANGO_DEBUG=False`.
   - `DJANGO_ALLOWED_HOSTS=bani57-website.hf.space`.
@@ -103,7 +103,7 @@ Changing a variable or secret triggers an automatic Space restart.
 ## Optional: persistent storage
-Free Spaces have 50 GB ephemeral disk that resets on restart, so every cold start re-downloads ~5.4 GB of checkpoints. The first request after a restart waits for `snapshot_download` to finish. For ~$5/month, the Space's **Persistent Storage** tier puts `/data` on a permanent volume; mount the checkpoints there (set `CHECKPOINTS_ROOT=/data/checkpoints`) and cold starts become instant. Worth doing only when traffic justifies it.
 ## See also

 # How to deploy to Hugging Face Spaces
+Push a new version of the site to the HF Space `Bani57/website`. Everything is git-based — there is no build button. The Space rebuilds its Docker image from whatever is on the `main` branch of its repo. For the rationale and full design see [`plans/deploy_huggingface_spaces.md`](../../plans/deploy_huggingface_spaces.md).
 ## Prerequisites
+- The Space `Bani57/website` exists with **SDK = Docker**, **Hardware = CPU basic (free)**.
 - Space secrets are set: `DJANGO_SECRET_KEY`, `DJANGO_DEBUG=False`, `DJANGO_ALLOWED_HOSTS=bani57-website.hf.space`.
+- The HF Hub model repo `Bani57/checkpoints` exists and contains the current weights. See [Refreshing checkpoints](#refreshing-checkpoints) below.
 - Local working tree is on the deployment branch (typically `master`) and tests pass.
 ## 1. Local container smoke test
 Add the HF git remote (one-time):
 ```bash
+git remote add hf https://huggingface.co/spaces/Bani57/website
 ```
 Then for each release:
 ## Refreshing checkpoints
+Checkpoints live in `Bani57/checkpoints` on HF Hub, separate from the code repo. To publish new weights:
 ```bash
 huggingface-cli login    # one-time, paste a write token
 - **Secrets** (encrypted, not exposed in logs):
   - `DJANGO_SECRET_KEY` — required. Generate with `python -c "import secrets; print(secrets.token_urlsafe(50))"`.
+  - `HF_TOKEN` — strongly recommended. A read-scope token (huggingface.co/settings/tokens → New token → Read) lifts anonymous rate limits and roughly triples checkpoint download throughput on cold starts. Required only if the checkpoint repo is private.
 - **Variables** (visible in logs and to viewers of the Space metadata):
   - `DJANGO_DEBUG=False`.
   - `DJANGO_ALLOWED_HOSTS=bani57-website.hf.space`.
 ## Optional: persistent storage
+Free Spaces have 50 GB ephemeral disk that resets on restart, so every cold start re-downloads ~5.4 GB of checkpoints. The first request after a restart waits for `snapshot_download` to finish. For ~$5/month, the Space's **Persistent Storage** tier puts `/data` on a permanent volume. To use it, set the variable `CHECKPOINTS_ROOT=/data/checkpoints` in the Space settings. The container's `entrypoint.sh` will write the snapshot there; subsequent restarts find it already populated. The default unifies `CHECKPOINTS_ROOT` with `RESEARCH_ROOT=/app/research`, so checkpoints land alongside the bundled research code on free tier — clean for one-shot deploys, costly to re-pull on every restart.
 ## See also

docs/guides/local-development.md CHANGED Viewed

@@ -8,6 +8,14 @@ End-to-end development setup: backend Python env, frontend dev server, checkpoin
 - Node 20+ and npm.
 - ~6 GB of free disk for the checkpoints if you intend to run inference locally.
 - Optional: a Hugging Face account if you want to publish or update checkpoints.
 ## 1. Create the Python environment
@@ -23,12 +31,12 @@ This mirrors what the deployment image installs. The CUDA 11.8 wheels work on CP
 ## 2. Pull checkpoints
-If you have a Hugging Face account and the `bani57/checkpoints` repo is accessible to you:
 ```bash
 huggingface-cli login    # paste a read token
 python -c "from huggingface_hub import snapshot_download; \
-  snapshot_download(repo_id='bani57/checkpoints', \
                     local_dir='src/research', local_dir_use_symlinks=False)"
 ```

 - Node 20+ and npm.
 - ~6 GB of free disk for the checkpoints if you intend to run inference locally.
 - Optional: a Hugging Face account if you want to publish or update checkpoints.
+- For Docker (`docker compose up`): WSL2 with at least **12 GB** of memory. Bumping `/dev/shm` is handled by `docker-compose.yml` (`shm_size: "2gb"`). Edit `%UserProfile%\.wslconfig`:
+  ```ini
+  [wsl2]
+  memory=12GB
+  processors=4
+  swap=4GB
+  ```
+  then `wsl --shutdown` and restart Docker Desktop. The peak boot RAM is ~5–6 GB while the three COINs Loaders compute graph metrics.
 ## 1. Create the Python environment
 ## 2. Pull checkpoints
+If you have a Hugging Face account and the `Bani57/checkpoints` repo is accessible to you:
 ```bash
 huggingface-cli login    # paste a read token
 python -c "from huggingface_hub import snapshot_download; \
+  snapshot_download(repo_id='Bani57/checkpoints', \
                     local_dir='src/research', local_dir_use_symlinks=False)"
 ```

docs/reference/backend-services.md CHANGED Viewed

@@ -13,7 +13,10 @@ Module-by-module reference for `src/backend/api/`. The Django app is named `api`
 ## `api/` — Django app
 ### `apps.py`
-`ApiConfig.ready()` runs once at boot. Skips initialization in the outer `runserver` reloader process (avoids double-loading models in dev) and calls `ModelRegistry.initialize()` in the inner one.
 ### `urls.py`
 Maps every endpoint listed in [reference/api.md](api.md) to the matching view class.
@@ -103,6 +106,11 @@ Checkpoint loading helpers live in the same module:
 - `_adapt_shape_mismatches`, `_adapt_mlp_bn_keys`, `_adapt_kbgat_state_dict` — torch-geometric 2.0.x → 2.3.x weight-format compatibility shims.
 - `_free_heavy_arrays` — discards memory-intensive Loader fields after init.
 ### `coins_inference.py`
 `coins_predict_inner(experiment, dataset_id, algorithm, query_structure_id, anchors, variables, relations_map, top_k)` — runs a single COINs prediction. Validates the query, builds the embedding query, scores candidate tails, returns the top-k with cleaned names and the community-rank info.
@@ -113,6 +121,7 @@ The MultiProxAn / DiGress sampling loop.
 - `run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id)` — initial denoise to step `t_prime`. Returns the partial state for a `/continue` follow-up.
 - `run_multiprox_step(model, state, dataset_id)` — one Gibbs round.
 - `encode_state_blob` / `decode_state_blob` — base64 round-trip for the [continuation token](../glossary.md#continuation-token--state-blob).
 ### `kg_anomaly_inference.py`
 The KG-subgraph correction loop. Mirrors `graphgen_inference.py` but operates on knowledge-graph subgraphs and computes the KG log-likelihood metric per frame using the frozen COINs link ranker.

 ## `api/` — Django app
 ### `apps.py`
+`ApiConfig.ready()` runs once at boot. Two skip-checks before calling `ModelRegistry.initialize()`:
+- `sys.argv[1]` against `_SKIP_REGISTRY_INIT` (`collectstatic`, `migrate`, `makemigrations`, `check`, `shell`, `showmigrations`, `diffsettings`, `test`, `compilemessages`, `makemessages`). Stops `python manage.py collectstatic --noinput` from triggering a multi-GB checkpoint download into a throwaway image layer.
+- The outer `runserver` reloader process (`RUN_MAIN != "true"`). Stops dev mode from doing the heavy boot twice.
 ### `urls.py`
 Maps every endpoint listed in [reference/api.md](api.md) to the matching view class.
 - `_adapt_shape_mismatches`, `_adapt_mlp_bn_keys`, `_adapt_kbgat_state_dict` — torch-geometric 2.0.x → 2.3.x weight-format compatibility shims.
 - `_free_heavy_arrays` — discards memory-intensive Loader fields after init.
+`_load_coins_experiment` wraps each `experiment.prepare()` call in two monkey-patches (restored in a `finally`) — see [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md#monkey-patches-around-experimentprepare) for the rationale:
+- `Module.share_memory` → no-op (avoids `Bus error` from PyTorch shared-memory paths under tight `/dev/shm`).
+- `torch.load` → TransE-init dim expansion (repeats `transe_model.tar` weights along the embedding axis when YAML's `embedding_dim` is an integer multiple of the init's dim, so KBGAT's `weight.data = init` doesn't clobber the model's declared dim).
 ### `coins_inference.py`
 `coins_predict_inner(experiment, dataset_id, algorithm, query_structure_id, anchors, variables, relations_map, top_k)` — runs a single COINs prediction. Validates the query, builds the embedding query, scores candidate tails, returns the top-k with cleaned names and the community-rank info.
 - `run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id)` — initial denoise to step `t_prime`. Returns the partial state for a `/continue` follow-up.
 - `run_multiprox_step(model, state, dataset_id)` — one Gibbs round.
 - `encode_state_blob` / `decode_state_blob` — base64 round-trip for the [continuation token](../glossary.md#continuation-token--state-blob).
+- `_collapse_final` symmetrises `E` (`E = (E + E.T) / 2`) before calling `model.sample_discrete_graph_given_z0`. The model has a strict symmetry assert that's tripped by ULP-level drift from the MultiProx aggregation on some BLAS stacks. See the [MultiProx symmetry safeguard](../explanation/inference-lifecycle.md#multiprox-symmetry-safeguard) note.
 ### `kg_anomaly_inference.py`
 The KG-subgraph correction loop. Mirrors `graphgen_inference.py` but operates on knowledge-graph subgraphs and computes the KG log-likelihood metric per frame using the frozen COINs link ranker.

src/backend/README.md CHANGED Viewed

@@ -28,7 +28,7 @@ This README covers the practical surface: running the backend, where things live
    ```
 3. **Model checkpoints** — downloaded automatically from the Hugging Face Hub model repo
-   `bani57/checkpoints` on first boot. The remote layout mirrors the on-disk one, so
    `huggingface_hub.snapshot_download(local_dir=CHECKPOINTS_ROOT)` drops files directly
    into the expected paths:
    - `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
@@ -67,10 +67,11 @@ The API is served at `http://localhost:8000/api/v1/`.
 | `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
 | `CORS_ALLOWED_ORIGINS` | `https://bani57-website.hf.space` | Comma-separated allowed CORS origins. |
 | `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
-| `RESEARCH_ROOT` | `<repo>/src/research` | Where the research-code modules live. |
-| `CHECKPOINTS_ROOT` | Same as `RESEARCH_ROOT` | Where `huggingface_hub` deposits weights. In the container this is `/app/checkpoints` on a writable volume. |
-| `HF_CHECKPOINTS_REPO` | `bani57/checkpoints` | HF Hub model repo holding all weights. |
-| `HF_TOKEN` | unset | Only needed if the checkpoint repo is private. |
 | `SPA_DIST_DIR` | `<backend>/dist` | Folder containing `index.html` from `npm run build`. WhiteNoise serves assets from here. |
 ## Startup Sequence
@@ -79,7 +80,7 @@ In the deployment container the entrypoint script pre-warms the checkpoint downl
 from the Hugging Face Hub *before* gunicorn starts, so workers never block on the
 network. Then on Django boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:
-1. **Verify / download checkpoints** from `bani57/checkpoints` on HF Hub if any expected
    subdir is missing. Idempotent — a no-op when the entrypoint already populated the tree
    or when running locally with weights on disk.
 2. **Scan checkpoint directories** to detect available models per method
@@ -91,7 +92,7 @@ All model weights (COINs inference, graph generation, KG anomaly) are loaded laz
 ## Deployment
 The site is packaged as a single Docker image and deployed to a Hugging Face Space
-(`bani57/website` -> <https://bani57-website.hf.space>). The image:
 - builds the Vue SPA with `npm run build` in a Node 20 stage,
 - assembles a `mambaorg/micromamba` runtime mirroring the local `website_c` env from
@@ -99,7 +100,7 @@ The site is packaged as a single Docker image and deployed to a Hugging Face Spa
 - copies the SPA `dist/` next to Django so WhiteNoise serves it on the same origin as
   `/api/v1/`,
 - runs `entrypoint.sh`, which `snapshot_download`s checkpoints from
-  `bani57/checkpoints` on HF Hub into `/app/checkpoints` and execs `gunicorn` on `0.0.0.0:7860`.
 Local reproduction:
 ```bash
@@ -109,7 +110,7 @@ docker compose up --build
 Push to the Space (one-time remote setup):
 ```bash
-git remote add hf https://huggingface.co/spaces/bani57/website
 git push hf master:main
 ```

    ```
 3. **Model checkpoints** — downloaded automatically from the Hugging Face Hub model repo
+   `Bani57/checkpoints` on first boot. The remote layout mirrors the on-disk one, so
    `huggingface_hub.snapshot_download(local_dir=CHECKPOINTS_ROOT)` drops files directly
    into the expected paths:
    - `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
 | `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
 | `CORS_ALLOWED_ORIGINS` | `https://bani57-website.hf.space` | Comma-separated allowed CORS origins. |
 | `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
+| `RESEARCH_ROOT` | `<repo>/src/research` (dev), `/app/research` (image) | Where the research-code modules live. |
+| `CHECKPOINTS_ROOT` | Same as `RESEARCH_ROOT` | Where `huggingface_hub` deposits weights. Override to e.g. `/data/checkpoints` on a paid HF Space with persistent storage. |
+| `HF_CHECKPOINTS_REPO` | `Bani57/checkpoints` | HF Hub model repo holding all weights. |
+| `HF_TOKEN` | unset | Recommended. Read-scope token lifts anonymous rate limits and roughly triples cold-start download throughput. Required if the repo is private. Empty values are unset by `entrypoint.sh` to avoid a malformed `Bearer ` header. |
+| `HF_HUB_ENABLE_HF_TRANSFER` | `1` (image), unset (dev) | Enables the Rust-accelerated `hf_transfer` backend for `snapshot_download`. |
 | `SPA_DIST_DIR` | `<backend>/dist` | Folder containing `index.html` from `npm run build`. WhiteNoise serves assets from here. |
 ## Startup Sequence
 from the Hugging Face Hub *before* gunicorn starts, so workers never block on the
 network. Then on Django boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:
+1. **Verify / download checkpoints** from `Bani57/checkpoints` on HF Hub if any expected
    subdir is missing. Idempotent — a no-op when the entrypoint already populated the tree
    or when running locally with weights on disk.
 2. **Scan checkpoint directories** to detect available models per method
 ## Deployment
 The site is packaged as a single Docker image and deployed to a Hugging Face Space
+(`Bani57/website` -> <https://bani57-website.hf.space>). The image:
 - builds the Vue SPA with `npm run build` in a Node 20 stage,
 - assembles a `mambaorg/micromamba` runtime mirroring the local `website_c` env from
 - copies the SPA `dist/` next to Django so WhiteNoise serves it on the same origin as
   `/api/v1/`,
 - runs `entrypoint.sh`, which `snapshot_download`s checkpoints from
+  `Bani57/checkpoints` on HF Hub into `CHECKPOINTS_ROOT` and execs `gunicorn` on `0.0.0.0:7860`.
 Local reproduction:
 ```bash
 Push to the Space (one-time remote setup):
 ```bash
+git remote add hf https://huggingface.co/spaces/Bani57/website
 git push hf master:main
 ```