Andrej Janchevski committed · Commit cb5f524 · 1 Parent(s): a285266

feat(backend): rewire for hugging face spaces deployment

- registry.py: replace gdown / Google Drive with
  huggingface_hub.snapshot_download from bani57/checkpoints. The
  remote layout mirrors the on-disk one under CHECKPOINTS_ROOT, so the
  scan logic finds files unchanged and the old _distribute_checkpoints
  step is gone.
- settings.py: env-drive ALLOWED_HOSTS, CORS_ALLOWED_ORIGINS,
  RESEARCH_ROOT and CHECKPOINTS_ROOT so dev (local repo paths) and
  prod (writable /app/checkpoints) share one settings file. Add
  whitenoise + django.contrib.staticfiles for same-origin SPA
  serving via SPA_DIST_DIR / WHITENOISE_ROOT, plus SecurityMiddleware,
  CsrfViewMiddleware and XFrameOptionsMiddleware (deploy-check
  hygiene). HSTS / secure cookies / CSRF_TRUSTED_ORIGINS gated on
  DEBUG=False; SECURE_SSL_REDIRECT stays off because HF Spaces
  terminates TLS upstream.
- urls.py: add a non-API catch-all view that serves dist/index.html
  so Vue Router (and its Lara Croft 404) handles client-side routes.
- requirements.txt: drop gdown, add huggingface_hub,
  whitenoise[brotli], gunicorn. Trim the conda-only comment to the
  packages actually present in website_c (rdkit, boost).
- README.md: rewrite Prerequisites, the env-var table and the
  Startup Sequence; add a Deployment section pointing at the
  Dockerfile / docker compose flow.
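
Note on the env-driven settings: the comma-separated variables (DJANGO_ALLOWED_HOSTS, CORS_ALLOWED_ORIGINS) are split, stripped and filtered before use, and CSRF_TRUSTED_ORIGINS keeps only the https:// entries. A minimal sketch of that parsing follows; `env_list` is a hypothetical helper name used here for illustration and does not appear in the commit, and the example origins are made up.

```python
import os


def env_list(name, default, env=None):
    """Parse a comma-separated environment variable into a clean list.

    Hypothetical helper: whitespace around items is stripped and empty
    items (e.g. from a trailing comma) are dropped.
    """
    env = os.environ if env is None else env
    raw = env.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# Illustrative environment, not the real deployment values.
fake_env = {"CORS_ALLOWED_ORIGINS": "https://a.example, http://localhost:5173 ,"}

origins = env_list("CORS_ALLOWED_ORIGINS", "https://bani57-website.hf.space", fake_env)
# Same filter the settings diff applies for CSRF_TRUSTED_ORIGINS.
trusted = [o for o in origins if o.startswith("https://")]
# Falls back to the default when the variable is unset.
fallback = env_list("CORS_ALLOWED_ORIGINS", "https://bani57-website.hf.space", {})
```

Passing the environment as an argument keeps the helper easy to test; in a settings module it would simply read `os.environ`.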

src/backend/README.md CHANGED
@@ -4,24 +4,33 @@ Stateless REST API serving the PhD research models. No database — PyTorch chec

  ## Prerequisites

- 1. **Conda environment** with pre-installed system deps:
+ 1. **Mamba environment** mirroring the deployment image. The repo-root `environment.yml`
+ captures the conda half (Python 3.9, `rdkit=2023.03.2`, `boost=1.78`, cairo, etc.):
  ```bash
- conda create -n website python=3.9
- conda activate website
- conda install -c conda-forge rdkit=2023.03.2 graph-tool=2.45
+ mamba env create -n website_c -f ../../environment.yml
+ mamba activate website_c
  ```

- 2. **Pip dependencies**:
+ 2. **Pip dependencies** (GPU torch, Django, DRF, …):
  ```bash
- pip install -r requirements.txt
+ pip install --extra-index-url https://download.pytorch.org/whl/cu118 -r requirements.txt
  ```

- 3. **Model checkpoints** — downloaded automatically from Google Drive on first boot (via `gdown`). Alternatively, manually place files in:
+ 3. **Model checkpoints** — downloaded automatically from the Hugging Face Hub model repo
+ `bani57/checkpoints` on first boot. The remote layout mirrors the on-disk one, so
+ `huggingface_hub.snapshot_download(local_dir=CHECKPOINTS_ROOT)` drops files directly
+ into the expected paths:
  - `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
- - `src/research/COINs-KGGeneration/graph_completion/results/{dataset}/` (KBGAT TransE init: `transe_model.tar`; auto-distributed from `transe_model_{dataset}.tar` in the Drive's `checkpoints_coins` folder)
+ - `src/research/COINs-KGGeneration/graph_completion/results/{dataset}/` (KBGAT TransE init: `transe_model.tar`)
  - `src/research/COINs-KGGeneration/graph_generation/checkpoints/` (KG anomaly: `{dataset}.ckpt`, `{dataset}_correct.ckpt`)
  - `src/research/MultiProxAn/checkpoints/` (graph generation: `{dataset}.ckpt`, `{dataset}_c.ckpt`)

+ To (re-)publish the checkpoints to the Hub from a local copy:
+ ```bash
+ huggingface-cli login # one-time
+ python ../../scripts/upload_checkpoints.py --create
+ ```
+
  4. **Dataset files** — the raw KG data files must be present under `src/research/COINs-KGGeneration/data/` (FB15k-237, WN18RR, NELL-995).

  ## Running
@@ -45,19 +54,54 @@ The API is served at `http://localhost:8000/api/v1/`.
  | `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
  | `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
  | `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
+ | `CORS_ALLOWED_ORIGINS` | `https://bani57-website.hf.space` | Comma-separated allowed CORS origins. |
  | `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
+ | `RESEARCH_ROOT` | `<repo>/src/research` | Where the research-code modules live. |
+ | `CHECKPOINTS_ROOT` | Same as `RESEARCH_ROOT` | Where `huggingface_hub` deposits weights. In the container this is `/app/checkpoints` on a writable volume. |
+ | `HF_CHECKPOINTS_REPO` | `bani57/checkpoints` | HF Hub model repo holding all weights. |
+ | `HF_TOKEN` | unset | Only needed if the checkpoint repo is private. |
+ | `SPA_DIST_DIR` | `<backend>/dist` | Folder containing `index.html` from `npm run build`. WhiteNoise serves assets from here. |

  ## Startup Sequence

- On boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:
+ In the deployment container the entrypoint script pre-warms the checkpoint download
+ from the Hugging Face Hub *before* gunicorn starts, so workers never block on the
+ network. Then on Django boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:

- 1. **Download checkpoints** from Google Drive if not already present locally
+ 1. **Verify / download checkpoints** from `bani57/checkpoints` on HF Hub if any expected
+ subdir is missing. Idempotent — a no-op when the entrypoint already populated the tree
+ or when running locally with weights on disk.
  2. **Scan checkpoint directories** to detect available models per method
  3. **Load lightweight COINs Loaders** — one per dataset (freebase, wordnet, nell), loading graph data, name maps, and train/val/test splits. Heavy arrays (node neighbours ~275MB each, community neighbours, adjacency dicts) are freed after initialization to keep memory low.
  4. **Generate sample subgraphs** for KG anomaly using the COINs Loaders

  All model weights (COINs inference, graph generation, KG anomaly) are loaded lazily at first inference request.

+ ## Deployment
+
+ The site is packaged as a single Docker image and deployed to a Hugging Face Space
+ (`bani57/website` -> <https://bani57-website.hf.space>). The image:
+
+ - builds the Vue SPA with `npm run build` in a Node 20 stage,
+ - assembles a `mambaorg/micromamba` runtime mirroring the local `website_c` env from
+ `environment.yml` + `requirements.txt` (GPU torch wheels, `cu118`),
+ - copies the SPA `dist/` next to Django so WhiteNoise serves it on the same origin as
+ `/api/v1/`,
+ - runs `entrypoint.sh`, which `snapshot_download`s checkpoints from
+ `bani57/checkpoints` on HF Hub into `/app/checkpoints` and execs `gunicorn` on `0.0.0.0:7860`.
+
+ Local reproduction:
+ ```bash
+ docker compose up --build
+ # -> http://localhost:7860
+ ```
+
+ Push to the Space (one-time remote setup):
+ ```bash
+ git remote add hf https://huggingface.co/spaces/bani57/website
+ git push hf master:main
+ ```
+
  ## API Endpoints

  All endpoints are prefixed with `/api/v1/`.
src/backend/api/services/registry.py CHANGED
@@ -60,25 +60,20 @@ def _safe_load_lightning_checkpoint(cls, ckpt_path):
      return model


- GDRIVE_FOLDER_ID = "14Bf8fi4KJn0rDdh9y8EFyA5b8OpQyXWi"
-
- GDRIVE_SUBFOLDERS = {
-     "coins": {
-         "folder_name": "COINs-KGGeneration/checkpoints_coins",
-         "local_dir_setting": "COINS_COMPLETION_DIR",
-         "local_subdir": "checkpoints",
-     },
-     "kg_generation": {
-         "folder_name": "COINs-KGGeneration/checkpoints_kg_generation",
-         "local_dir_setting": "DIGRESS_KG_DIR",
-         "local_subdir": "checkpoints",
-     },
-     "multiproxan": {
-         "folder_name": "MultiProxAn/checkpoints",
-         "local_dir_setting": "MULTIPROXAN_DIR",
-         "local_subdir": "checkpoints",
-     },
- }
+ # Hugging Face Hub model repo holding all checkpoints. The repo mirrors the
+ # on-disk layout under settings.CHECKPOINTS_ROOT (RESEARCH_ROOT by default), so
+ # snapshot_download() drops every file into its final location and the scan
+ # routines below find them unchanged.
+ HF_CHECKPOINTS_REPO = os.environ.get("HF_CHECKPOINTS_REPO", "bani57/checkpoints")
+
+ # Per-area checkpoint subdirectories (relative to CHECKPOINTS_ROOT). Used to
+ # detect a fully-populated tree so we can skip the network round-trip on warm
+ # starts.
+ _CHECKPOINT_SUBDIRS = (
+     Path("COINs-KGGeneration") / "graph_completion" / "checkpoints",
+     Path("COINs-KGGeneration") / "graph_generation" / "checkpoints",
+     Path("MultiProxAn") / "checkpoints",
+ )

  # Shared sampler hyperparameters used across all COINs experiments
  _SAMPLER_HPARS = {
@@ -340,36 +335,45 @@ class ModelRegistry:
      # ---- Checkpoint download -------------------------------------------

      def _download_checkpoints(self):
-         """Download checkpoints from Google Drive if not already present locally."""
+         """Download checkpoints from Hugging Face Hub if not already present.
+
+         The HF repo mirrors the on-disk layout under ``CHECKPOINTS_ROOT``, so a
+         single ``snapshot_download`` drops every file into its final location.
+         Idempotent: when all expected subdirs are populated we skip the
+         network round-trip. In production the entrypoint script also pre-warms
+         this download before gunicorn starts, so workers never block on it.
+         """
          if self._all_checkpoint_dirs_populated():
-             logger.info("All checkpoint directories already populated, skipping Google Drive download")
+             logger.info("All checkpoint directories already populated, skipping HF Hub download")
              return

          try:
-             import gdown
+             from huggingface_hub import snapshot_download
          except ImportError:
-             logger.warning("gdown not installed, skipping checkpoint download")
+             logger.warning("huggingface_hub not installed, skipping checkpoint download")
              return

-         gdrive_url = f"https://drive.google.com/drive/folders/{GDRIVE_FOLDER_ID}"
-         logger.info("Downloading checkpoints from Google Drive: %s", gdrive_url)
+         target = Path(settings.CHECKPOINTS_ROOT)
+         target.mkdir(parents=True, exist_ok=True)
+         logger.info("Downloading checkpoints from HF Hub repo %s -> %s", HF_CHECKPOINTS_REPO, target)

          try:
-             staging_dir = Path(settings.BASE_DIR) / ".checkpoint_staging"
-             staging_dir.mkdir(exist_ok=True)
-
-             gdown.download_folder(
-                 gdrive_url, output=str(staging_dir), quiet=False, resume=True,
+             snapshot_download(
+                 repo_id=HF_CHECKPOINTS_REPO,
+                 repo_type="model",
+                 local_dir=str(target),
+                 local_dir_use_symlinks=False,
+                 max_workers=4,
+                 token=os.environ.get("HF_TOKEN"),
              )
-
-             self._distribute_checkpoints(staging_dir)
          except Exception:
-             logger.exception("Failed to download checkpoints from Google Drive, continuing with local files")
+             logger.exception("Failed to download checkpoints from HF Hub, continuing with local files")

      def _all_checkpoint_dirs_populated(self):
-         """Check if all checkpoint dirs already have at least one checkpoint file."""
-         for config in GDRIVE_SUBFOLDERS.values():
-             dest_dir = Path(getattr(settings, config["local_dir_setting"])) / config["local_subdir"]
+         """True if every expected checkpoint subdir contains at least one weight file."""
+         root = Path(settings.CHECKPOINTS_ROOT)
+         for sub in _CHECKPOINT_SUBDIRS:
+             dest_dir = root / sub
              if not dest_dir.exists():
                  return False
              ckpt_files = list(dest_dir.glob("*.tar")) + list(dest_dir.glob("*.ckpt"))
@@ -377,40 +381,6 @@ class ModelRegistry:
                  return False
          return True

-     def _distribute_checkpoints(self, staging_dir):
-         """Move downloaded files from staging into the correct checkpoint directories."""
-         for group, config in GDRIVE_SUBFOLDERS.items():
-             src_dir = staging_dir / config["folder_name"]
-             if not src_dir.exists():
-                 logger.warning("Expected staging subfolder not found: %s", src_dir)
-                 continue
-
-             dest_dir = Path(getattr(settings, config["local_dir_setting"])) / config["local_subdir"]
-             dest_dir.mkdir(parents=True, exist_ok=True)
-
-             for src_file in src_dir.iterdir():
-                 if not src_file.is_file():
-                     continue
-                 # transe_model.tar files (named transe_model_{dataset}.tar) are used for KBGAT
-                 # initialization and must land in results/{dataset}/ not checkpoints/.
-                 if group == "coins" and src_file.stem.startswith("transe_model_"):
-                     dataset_name = src_file.stem[len("transe_model_"):]
-                     transe_dest_dir = Path(getattr(settings, config["local_dir_setting"])) / "results" / dataset_name
-                     transe_dest_dir.mkdir(parents=True, exist_ok=True)
-                     transe_dest_file = transe_dest_dir / "transe_model.tar"
-                     if transe_dest_file.exists() and transe_dest_file.stat().st_size == src_file.stat().st_size:
-                         logger.debug("TransE model already present, skipping: %s", transe_dest_file)
-                         continue
-                     logger.info("Installing TransE model: %s -> %s", src_file.name, transe_dest_dir)
-                     src_file.replace(transe_dest_file)
-                     continue
-                 dest_file = dest_dir / src_file.name
-                 if dest_file.exists() and dest_file.stat().st_size == src_file.stat().st_size:
-                     logger.debug("Checkpoint already present, skipping: %s", dest_file.name)
-                     continue
-                 logger.info("Installing checkpoint: %s -> %s", src_file.name, dest_dir)
-                 src_file.replace(dest_file)
-
      # ---- Checkpoint scanning -------------------------------------------

      def _scan_checkpoints(self):
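
Note on the warm-start check in registry.py: the download is skipped only when every expected subdirectory already holds at least one `*.tar` or `*.ckpt` file. The sketch below reproduces that rule outside Django so it can be exercised against a temporary directory. The names `SUBDIRS` and `all_populated` are hypothetical stand-ins, the file names are invented, and treating an empty directory as "not populated" is an assumption, since the line following the `ckpt_files` assignment is not shown in the diff.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-ins for _CHECKPOINT_SUBDIRS in registry.py.
SUBDIRS = (
    Path("COINs-KGGeneration") / "graph_completion" / "checkpoints",
    Path("MultiProxAn") / "checkpoints",
)


def all_populated(root, subdirs=SUBDIRS):
    """True only if every subdir exists and holds at least one weight file."""
    for sub in subdirs:
        dest = Path(root) / sub
        if not dest.is_dir():
            return False
        # Assumption: a directory with no *.tar / *.ckpt counts as unpopulated.
        if not (list(dest.glob("*.tar")) + list(dest.glob("*.ckpt"))):
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    cold = all_populated(tmp)  # nothing on disk yet
    for sub in SUBDIRS:
        (Path(tmp) / sub).mkdir(parents=True)
    (Path(tmp) / SUBDIRS[0] / "freebase_transe.tar").touch()
    partial = all_populated(tmp)  # second subdir is still empty
    (Path(tmp) / SUBDIRS[1] / "qm9.ckpt").touch()
    warm = all_populated(tmp)  # every subdir has a weight file
```

One consequence of this rule is that a single file per directory is enough to skip the download, so an interrupted transfer could look complete. The `results/{dataset}/` folders are also not among the three subdirectories listed in the diff.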
src/backend/requirements.txt CHANGED
@@ -3,8 +3,14 @@ django==4.2.*
  djangorestframework==3.14.*
  django-cors-headers==4.*

- # Checkpoint download from Google Drive
- gdown>=4.7
+ # Static-file serving (SPA dist + Django admin) — single-origin deploy
+ whitenoise[brotli]>=6.7
+
+ # Production WSGI server
+ gunicorn>=21.2
+
+ # Checkpoint download from Hugging Face Hub (replaces gdown / Google Drive)
+ huggingface_hub>=0.25

  # PyTorch with CUDA 11.8 (falls back to CPU at runtime if no GPU present)
  --extra-index-url https://download.pytorch.org/whl/cu118
@@ -35,6 +41,7 @@ scikit-learn>=1.0
  Pillow>=9.5.0
  overrides==7.3.1

- # Conda-only deps (must be pre-installed, not in pip requirements):
- # rdkit==2023.03.2 (conda create -c conda-forge)
- # graph-tool==2.45 (conda install -c conda-forge)
+ # Conda-only deps (must be pre-installed via the bundled environment.yml):
+ # rdkit=2023.03.2 required, used by molecule rendering for the
+ # MultiProxAn QM9/MOSES/Guacamol demos
+ # boost=1.78 — rdkit transitive on conda-forge
src/backend/research_api/settings.py CHANGED
@@ -21,12 +21,17 @@ ALLOWED_HOSTS = os.environ.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1").sp
  INSTALLED_APPS = [
      "corsheaders",
      "rest_framework",
+     "django.contrib.staticfiles",
      "api",
  ]

  MIDDLEWARE = [
      "corsheaders.middleware.CorsMiddleware",
+     "django.middleware.security.SecurityMiddleware",
+     "whitenoise.middleware.WhiteNoiseMiddleware",
      "django.middleware.common.CommonMiddleware",
+     "django.middleware.csrf.CsrfViewMiddleware",
+     "django.middleware.clickjacking.XFrameOptionsMiddleware",
  ]

  ROOT_URLCONF = "research_api.urls"
@@ -45,16 +50,47 @@ REST_FRAMEWORK = {
  }

  CORS_ALLOWED_ORIGINS = [
-     "https://bani57.pythonanywhere.com",
+     o.strip() for o in os.environ.get(
+         "CORS_ALLOWED_ORIGINS", "https://bani57-website.hf.space"
+     ).split(",") if o.strip()
  ]
  if DEBUG:
      CORS_ALLOW_ALL_ORIGINS = True

+ # Security headers (active when DEBUG=False). Gated on DEBUG so local dev
+ # over plain HTTP doesn't get redirected/blocked.
+ SECURE_CONTENT_TYPE_NOSNIFF = True
+ SECURE_REFERRER_POLICY = "same-origin"
+ X_FRAME_OPTIONS = "DENY"
+ if not DEBUG:
+     SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
+     SECURE_SSL_REDIRECT = False # HF Spaces terminates TLS upstream; redirect would loop
+     SECURE_HSTS_SECONDS = 31536000
+     SECURE_HSTS_INCLUDE_SUBDOMAINS = True
+     SECURE_HSTS_PRELOAD = False
+     CSRF_COOKIE_SECURE = True
+     SESSION_COOKIE_SECURE = True
+     CSRF_TRUSTED_ORIGINS = [o for o in CORS_ALLOWED_ORIGINS if o.startswith("https://")]
+
+ # Research code root. Inside the container the checkpoints live alongside the
+ # research code under /app/research; in dev they live in the repo at
+ # src/research/. CHECKPOINTS_ROOT is what huggingface_hub will populate.
+ RESEARCH_ROOT = Path(os.environ.get("RESEARCH_ROOT", PROJECT_ROOT / "src" / "research"))
+ CHECKPOINTS_ROOT = Path(os.environ.get("CHECKPOINTS_ROOT", RESEARCH_ROOT))
+
  # Research code paths
- COINS_DATA_DIR = PROJECT_ROOT / "src" / "research" / "COINs-KGGeneration" / "data"
- COINS_COMPLETION_DIR = PROJECT_ROOT / "src" / "research" / "COINs-KGGeneration" / "graph_completion"
- DIGRESS_KG_DIR = PROJECT_ROOT / "src" / "research" / "COINs-KGGeneration" / "graph_generation"
- MULTIPROXAN_DIR = PROJECT_ROOT / "src" / "research" / "MultiProxAn"
+ COINS_DATA_DIR = RESEARCH_ROOT / "COINs-KGGeneration" / "data"
+ COINS_COMPLETION_DIR = CHECKPOINTS_ROOT / "COINs-KGGeneration" / "graph_completion"
+ DIGRESS_KG_DIR = CHECKPOINTS_ROOT / "COINs-KGGeneration" / "graph_generation"
+ MULTIPROXAN_DIR = CHECKPOINTS_ROOT / "MultiProxAn"
+
+ # Static files & SPA dist (Vue's npm run build output, copied into the image)
+ STATIC_URL = "/static/"
+ STATIC_ROOT = BASE_DIR / "staticfiles"
+ STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
+ SPA_DIST_DIR = Path(os.environ.get("SPA_DIST_DIR", BASE_DIR / "dist"))
+ WHITENOISE_ROOT = str(SPA_DIST_DIR)
+ WHITENOISE_INDEX_FILE = "index.html"

  DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

src/backend/research_api/urls.py CHANGED
@@ -1,5 +1,23 @@
- from django.urls import include, path
+ from django.conf import settings
+ from django.http import FileResponse, Http404
+ from django.urls import include, path, re_path
+
+
+ def spa_index(_request):
+     """Serve the SPA shell for any non-API path.
+
+     WhiteNoise serves real files from ``SPA_DIST_DIR`` (assets, favicon, …)
+     before URL routing, so this view only fires for client-side routes such
+     as ``/cv``, ``/demos/coins`` or unknown paths — Vue Router picks them up
+     on the client (the wildcard route renders the Lara Croft 404).
+     """
+     index = settings.SPA_DIST_DIR / "index.html"
+     if not index.exists():
+         raise Http404("SPA build is missing — run `npm run build` in src/frontend first")
+     return FileResponse(open(index, "rb"), content_type="text/html")
+

  urlpatterns = [
      path("api/v1/", include("api.urls")),
+     re_path(r"^(?!api/).*$", spa_index, name="spa-index"),
  ]
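
Note on the catch-all pattern: `^(?!api/).*$` uses a negative lookahead, so it matches every path that does not begin with `api/`. Django matches URL patterns against the request path without its leading slash, which the sketch below imitates with `lstrip("/")`. The helper name `routed_to_spa` is hypothetical, and the sample paths other than `/cv` and `/demos/coins` are illustrative.

```python
import re

# Same pattern as the re_path() entry in urls.py.
SPA_CATCH_ALL = re.compile(r"^(?!api/).*$")


def routed_to_spa(request_path):
    """Return True if the catch-all pattern would match this request path.

    Hypothetical helper: strips the leading slash to mimic how Django
    presents the path to URL patterns.
    """
    return SPA_CATCH_ALL.match(request_path.lstrip("/")) is not None


results = {
    p: routed_to_spa(p)
    for p in ["/", "/cv", "/demos/coins", "/api/v1/models", "/api/", "/api", "/apiary"]
}
```

Paths under `/api/` are excluded from the catch-all, so an unknown API path is not answered with the SPA shell. A bare `/api` (no trailing slash) and lookalikes such as `/apiary` do match the pattern.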