feat(backend): rewire for hugging face spaces deployment
- registry.py: replace gdown / Google Drive with
huggingface_hub.snapshot_download from bani57/checkpoints. The
remote layout mirrors the on-disk one under CHECKPOINTS_ROOT, so the
scan logic finds files unchanged and the old _distribute_checkpoints
step is gone.
- settings.py: env-drive ALLOWED_HOSTS, CORS_ALLOWED_ORIGINS,
RESEARCH_ROOT and CHECKPOINTS_ROOT so dev (local repo paths) and
prod (writable /app/checkpoints) share one settings file. Add
whitenoise + django.contrib.staticfiles for same-origin SPA
serving via SPA_DIST_DIR / WHITENOISE_ROOT, plus SecurityMiddleware,
CsrfViewMiddleware and XFrameOptionsMiddleware (deploy-check
hygiene). HSTS / secure cookies / CSRF_TRUSTED_ORIGINS gated on
DEBUG=False; SECURE_SSL_REDIRECT stays off because HF Spaces
terminates TLS upstream.
- urls.py: add a non-API catch-all view that serves dist/index.html
so Vue Router (and its Lara Croft 404) handles client-side routes.
- requirements.txt: drop gdown, add huggingface_hub,
whitenoise[brotli], gunicorn. Trim the conda-only comment to the
packages actually present in website_c (rdkit, boost).
- README.md: rewrite Prerequisites, the env-var table and the
Startup Sequence; add a Deployment section pointing at the
Dockerfile / docker compose flow.
- src/backend/README.md +54 -10
- src/backend/api/services/registry.py +40 -70
- src/backend/requirements.txt +12 -5
- src/backend/research_api/settings.py +41 -5
- src/backend/research_api/urls.py +19 -1
|
@@ -4,24 +4,33 @@ Stateless REST API serving the PhD research models. No database — PyTorch chec
|
|
| 4 |
|
| 5 |
## Prerequisites
|
| 6 |
|
| 7 |
-
1. **
|
|
|
|
| 8 |
```bash
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
conda install -c conda-forge rdkit=2023.03.2 graph-tool=2.45
|
| 12 |
```
|
| 13 |
|
| 14 |
-
2. **Pip dependencies**:
|
| 15 |
```bash
|
| 16 |
-
pip install -r requirements.txt
|
| 17 |
```
|
| 18 |
|
| 19 |
-
3. **Model checkpoints** — downloaded automatically from
|
| 20 |
- `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
|
| 21 |
-
- `src/research/COINs-KGGeneration/graph_completion/results/{dataset}/` (KBGAT TransE init: `transe_model.tar`
|
| 22 |
- `src/research/COINs-KGGeneration/graph_generation/checkpoints/` (KG anomaly: `{dataset}.ckpt`, `{dataset}_correct.ckpt`)
|
| 23 |
- `src/research/MultiProxAn/checkpoints/` (graph generation: `{dataset}.ckpt`, `{dataset}_c.ckpt`)
|
| 24 |
|
| 25 |
4. **Dataset files** — the raw KG data files must be present under `src/research/COINs-KGGeneration/data/` (FB15k-237, WN18RR, NELL-995).
|
| 26 |
|
| 27 |
## Running
|
|
@@ -45,19 +54,54 @@ The API is served at `http://localhost:8000/api/v1/`.
|
|
| 45 |
| `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
|
| 46 |
| `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
|
| 47 |
| `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
|
|
|
|
| 48 |
| `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
|
| 49 |
|
| 50 |
## Startup Sequence
|
| 51 |
|
| 52 |
-
|
|
|
|
|
|
|
| 53 |
|
| 54 |
-
1. **
|
|
|
|
|
|
|
| 55 |
2. **Scan checkpoint directories** to detect available models per method
|
| 56 |
3. **Load lightweight COINs Loaders** — one per dataset (freebase, wordnet, nell), loading graph data, name maps, and train/val/test splits. Heavy arrays (node neighbours ~275MB each, community neighbours, adjacency dicts) are freed after initialization to keep memory low.
|
| 57 |
4. **Generate sample subgraphs** for KG anomaly using the COINs Loaders
|
| 58 |
|
| 59 |
All model weights (COINs inference, graph generation, KG anomaly) are loaded lazily at first inference request.
|
| 60 |
|
| 61 |
## API Endpoints
|
| 62 |
|
| 63 |
All endpoints are prefixed with `/api/v1/`.
|
|
|
|
| 4 |
|
| 5 |
## Prerequisites
|
| 6 |
|
| 7 |
+
1. **Mamba environment** mirroring the deployment image. The repo-root `environment.yml`
|
| 8 |
+
captures the conda half (Python 3.9, `rdkit=2023.03.2`, `boost=1.78`, cairo, etc.):
|
| 9 |
```bash
|
| 10 |
+
mamba env create -n website_c -f ../../environment.yml
|
| 11 |
+
mamba activate website_c
|
|
|
|
| 12 |
```
|
| 13 |
|
| 14 |
+
2. **Pip dependencies** (GPU torch, Django, DRF, …):
|
| 15 |
```bash
|
| 16 |
+
pip install --extra-index-url https://download.pytorch.org/whl/cu118 -r requirements.txt
|
| 17 |
```
|
| 18 |
|
| 19 |
+
3. **Model checkpoints** — downloaded automatically from the Hugging Face Hub model repo
|
| 20 |
+
`bani57/checkpoints` on first boot. The remote layout mirrors the on-disk one, so
|
| 21 |
+
`huggingface_hub.snapshot_download(local_dir=CHECKPOINTS_ROOT)` drops files directly
|
| 22 |
+
into the expected paths:
|
| 23 |
- `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
|
| 24 |
+
- `src/research/COINs-KGGeneration/graph_completion/results/{dataset}/` (KBGAT TransE init: `transe_model.tar`)
|
| 25 |
- `src/research/COINs-KGGeneration/graph_generation/checkpoints/` (KG anomaly: `{dataset}.ckpt`, `{dataset}_correct.ckpt`)
|
| 26 |
- `src/research/MultiProxAn/checkpoints/` (graph generation: `{dataset}.ckpt`, `{dataset}_c.ckpt`)
|
| 27 |
|
| 28 |
+
To (re-)publish the checkpoints to the Hub from a local copy:
|
| 29 |
+
```bash
|
| 30 |
+
huggingface-cli login # one-time
|
| 31 |
+
python ../../scripts/upload_checkpoints.py --create
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
4. **Dataset files** — the raw KG data files must be present under `src/research/COINs-KGGeneration/data/` (FB15k-237, WN18RR, NELL-995).
|
| 35 |
|
| 36 |
## Running
|
|
|
|
| 54 |
| `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
|
| 55 |
| `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
|
| 56 |
| `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
|
| 57 |
+
| `CORS_ALLOWED_ORIGINS` | `https://bani57-website.hf.space` | Comma-separated allowed CORS origins. |
|
| 58 |
| `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
|
| 59 |
+
| `RESEARCH_ROOT` | `<repo>/src/research` | Where the research-code modules live. |
|
| 60 |
+
| `CHECKPOINTS_ROOT` | Same as `RESEARCH_ROOT` | Where `huggingface_hub` deposits weights. In the container this is `/app/checkpoints` on a writable volume. |
|
| 61 |
+
| `HF_CHECKPOINTS_REPO` | `bani57/checkpoints` | HF Hub model repo holding all weights. |
|
| 62 |
+
| `HF_TOKEN` | unset | Only needed if the checkpoint repo is private. |
|
| 63 |
+
| `SPA_DIST_DIR` | `<backend>/dist` | Folder containing `index.html` from `npm run build`. WhiteNoise serves assets from here. |
|
| 64 |
|
| 65 |
## Startup Sequence
|
| 66 |
|
| 67 |
+
In the deployment container the entrypoint script pre-warms the checkpoint download
|
| 68 |
+
from the Hugging Face Hub *before* gunicorn starts, so workers never block on the
|
| 69 |
+
network. Then on Django boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:
|
| 70 |
|
| 71 |
+
1. **Verify / download checkpoints** from `bani57/checkpoints` on HF Hub if any expected
|
| 72 |
+
subdir is missing. Idempotent — a no-op when the entrypoint already populated the tree
|
| 73 |
+
or when running locally with weights on disk.
|
| 74 |
2. **Scan checkpoint directories** to detect available models per method
|
| 75 |
3. **Load lightweight COINs Loaders** — one per dataset (freebase, wordnet, nell), loading graph data, name maps, and train/val/test splits. Heavy arrays (node neighbours ~275MB each, community neighbours, adjacency dicts) are freed after initialization to keep memory low.
|
| 76 |
4. **Generate sample subgraphs** for KG anomaly using the COINs Loaders
|
| 77 |
|
| 78 |
All model weights (COINs inference, graph generation, KG anomaly) are loaded lazily at first inference request.
|
| 79 |
|
| 80 |
+
## Deployment
|
| 81 |
+
|
| 82 |
+
The site is packaged as a single Docker image and deployed to a Hugging Face Space
|
| 83 |
+
(`bani57/website` -> <https://bani57-website.hf.space>). The image:
|
| 84 |
+
|
| 85 |
+
- builds the Vue SPA with `npm run build` in a Node 20 stage,
|
| 86 |
+
- assembles a `mambaorg/micromamba` runtime mirroring the local `website_c` env from
|
| 87 |
+
`environment.yml` + `requirements.txt` (GPU torch wheels, `cu118`),
|
| 88 |
+
- copies the SPA `dist/` next to Django so WhiteNoise serves it on the same origin as
|
| 89 |
+
`/api/v1/`,
|
| 90 |
+
- runs `entrypoint.sh`, which `snapshot_download`s checkpoints from
|
| 91 |
+
`bani57/checkpoints` on HF Hub into `/app/checkpoints` and execs `gunicorn` on `0.0.0.0:7860`.
|
| 92 |
+
|
| 93 |
+
Local reproduction:
|
| 94 |
+
```bash
|
| 95 |
+
docker compose up --build
|
| 96 |
+
# -> http://localhost:7860
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
Push to the Space (one-time remote setup):
|
| 100 |
+
```bash
|
| 101 |
+
git remote add hf https://huggingface.co/spaces/bani57/website
|
| 102 |
+
git push hf master:main
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
## API Endpoints
|
| 106 |
|
| 107 |
All endpoints are prefixed with `/api/v1/`.
|
|
@@ -60,25 +60,20 @@ def _safe_load_lightning_checkpoint(cls, ckpt_path):
|
|
| 60 |
return model
|
| 61 |
|
| 62 |
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
"folder_name": "MultiProxAn/checkpoints",
|
| 78 |
-
"local_dir_setting": "MULTIPROXAN_DIR",
|
| 79 |
-
"local_subdir": "checkpoints",
|
| 80 |
-
},
|
| 81 |
-
}
|
| 82 |
|
| 83 |
# Shared sampler hyperparameters used across all COINs experiments
|
| 84 |
_SAMPLER_HPARS = {
|
|
@@ -340,36 +335,45 @@ class ModelRegistry:
|
|
| 340 |
# ---- Checkpoint download -------------------------------------------
|
| 341 |
|
| 342 |
def _download_checkpoints(self):
|
| 343 |
-
"""Download checkpoints from
|
| 344 |
if self._all_checkpoint_dirs_populated():
|
| 345 |
-
logger.info("All checkpoint directories already populated, skipping
|
| 346 |
return
|
| 347 |
|
| 348 |
try:
|
| 349 |
-
import
|
| 350 |
except ImportError:
|
| 351 |
-
logger.warning("
|
| 352 |
return
|
| 353 |
|
| 354 |
-
|
| 355 |
-
|
|
|
|
| 356 |
|
| 357 |
try:
|
| 358 |
-
|
| 359 |
-
|
| 360 |
-
|
| 361 |
-
|
| 362 |
-
|
|
|
|
|
|
|
| 363 |
)
|
| 364 |
-
|
| 365 |
-
self._distribute_checkpoints(staging_dir)
|
| 366 |
except Exception:
|
| 367 |
-
logger.exception("Failed to download checkpoints from
|
| 368 |
|
| 369 |
def _all_checkpoint_dirs_populated(self):
|
| 370 |
-
"""
|
| 371 |
-
|
| 372 |
-
|
|
|
|
| 373 |
if not dest_dir.exists():
|
| 374 |
return False
|
| 375 |
ckpt_files = list(dest_dir.glob("*.tar")) + list(dest_dir.glob("*.ckpt"))
|
|
@@ -377,40 +381,6 @@ class ModelRegistry:
|
|
| 377 |
return False
|
| 378 |
return True
|
| 379 |
|
| 380 |
-
def _distribute_checkpoints(self, staging_dir):
|
| 381 |
-
"""Move downloaded files from staging into the correct checkpoint directories."""
|
| 382 |
-
for group, config in GDRIVE_SUBFOLDERS.items():
|
| 383 |
-
src_dir = staging_dir / config["folder_name"]
|
| 384 |
-
if not src_dir.exists():
|
| 385 |
-
logger.warning("Expected staging subfolder not found: %s", src_dir)
|
| 386 |
-
continue
|
| 387 |
-
|
| 388 |
-
dest_dir = Path(getattr(settings, config["local_dir_setting"])) / config["local_subdir"]
|
| 389 |
-
dest_dir.mkdir(parents=True, exist_ok=True)
|
| 390 |
-
|
| 391 |
-
for src_file in src_dir.iterdir():
|
| 392 |
-
if not src_file.is_file():
|
| 393 |
-
continue
|
| 394 |
-
# transe_model.tar files (named transe_model_{dataset}.tar) are used for KBGAT
|
| 395 |
-
# initialization and must land in results/{dataset}/ not checkpoints/.
|
| 396 |
-
if group == "coins" and src_file.stem.startswith("transe_model_"):
|
| 397 |
-
dataset_name = src_file.stem[len("transe_model_"):]
|
| 398 |
-
transe_dest_dir = Path(getattr(settings, config["local_dir_setting"])) / "results" / dataset_name
|
| 399 |
-
transe_dest_dir.mkdir(parents=True, exist_ok=True)
|
| 400 |
-
transe_dest_file = transe_dest_dir / "transe_model.tar"
|
| 401 |
-
if transe_dest_file.exists() and transe_dest_file.stat().st_size == src_file.stat().st_size:
|
| 402 |
-
logger.debug("TransE model already present, skipping: %s", transe_dest_file)
|
| 403 |
-
continue
|
| 404 |
-
logger.info("Installing TransE model: %s -> %s", src_file.name, transe_dest_dir)
|
| 405 |
-
src_file.replace(transe_dest_file)
|
| 406 |
-
continue
|
| 407 |
-
dest_file = dest_dir / src_file.name
|
| 408 |
-
if dest_file.exists() and dest_file.stat().st_size == src_file.stat().st_size:
|
| 409 |
-
logger.debug("Checkpoint already present, skipping: %s", dest_file.name)
|
| 410 |
-
continue
|
| 411 |
-
logger.info("Installing checkpoint: %s -> %s", src_file.name, dest_dir)
|
| 412 |
-
src_file.replace(dest_file)
|
| 413 |
-
|
| 414 |
# ---- Checkpoint scanning -------------------------------------------
|
| 415 |
|
| 416 |
def _scan_checkpoints(self):
|
|
|
|
| 60 |
return model
|
| 61 |
|
| 62 |
|
| 63 |
+
# Hugging Face Hub model repo holding all checkpoints. The repo mirrors the
|
| 64 |
+
# on-disk layout under settings.CHECKPOINTS_ROOT (RESEARCH_ROOT by default), so
|
| 65 |
+
# snapshot_download() drops every file into its final location and the scan
|
| 66 |
+
# routines below find them unchanged.
|
| 67 |
+
HF_CHECKPOINTS_REPO = os.environ.get("HF_CHECKPOINTS_REPO", "bani57/checkpoints")
|
| 68 |
+
|
| 69 |
+
# Per-area checkpoint subdirectories (relative to CHECKPOINTS_ROOT). Used to
|
| 70 |
+
# detect a fully-populated tree so we can skip the network round-trip on warm
|
| 71 |
+
# starts.
|
| 72 |
+
_CHECKPOINT_SUBDIRS = (
|
| 73 |
+
Path("COINs-KGGeneration") / "graph_completion" / "checkpoints",
|
| 74 |
+
Path("COINs-KGGeneration") / "graph_generation" / "checkpoints",
|
| 75 |
+
Path("MultiProxAn") / "checkpoints",
|
| 76 |
+
)
|
| 77 |
|
| 78 |
# Shared sampler hyperparameters used across all COINs experiments
|
| 79 |
_SAMPLER_HPARS = {
|
|
|
|
| 335 |
# ---- Checkpoint download -------------------------------------------
|
| 336 |
|
| 337 |
def _download_checkpoints(self):
|
| 338 |
+
"""Download checkpoints from Hugging Face Hub if not already present.
|
| 339 |
+
|
| 340 |
+
The HF repo mirrors the on-disk layout under ``CHECKPOINTS_ROOT``, so a
|
| 341 |
+
single ``snapshot_download`` drops every file into its final location.
|
| 342 |
+
Idempotent: when all expected subdirs are populated we skip the
|
| 343 |
+
network round-trip. In production the entrypoint script also pre-warms
|
| 344 |
+
this download before gunicorn starts, so workers never block on it.
|
| 345 |
+
"""
|
| 346 |
if self._all_checkpoint_dirs_populated():
|
| 347 |
+
logger.info("All checkpoint directories already populated, skipping HF Hub download")
|
| 348 |
return
|
| 349 |
|
| 350 |
try:
|
| 351 |
+
from huggingface_hub import snapshot_download
|
| 352 |
except ImportError:
|
| 353 |
+
logger.warning("huggingface_hub not installed, skipping checkpoint download")
|
| 354 |
return
|
| 355 |
|
| 356 |
+
target = Path(settings.CHECKPOINTS_ROOT)
|
| 357 |
+
target.mkdir(parents=True, exist_ok=True)
|
| 358 |
+
logger.info("Downloading checkpoints from HF Hub repo %s -> %s", HF_CHECKPOINTS_REPO, target)
|
| 359 |
|
| 360 |
try:
|
| 361 |
+
snapshot_download(
|
| 362 |
+
repo_id=HF_CHECKPOINTS_REPO,
|
| 363 |
+
repo_type="model",
|
| 364 |
+
local_dir=str(target),
|
| 365 |
+
local_dir_use_symlinks=False,
|
| 366 |
+
max_workers=4,
|
| 367 |
+
token=os.environ.get("HF_TOKEN"),
|
| 368 |
)
|
|
|
|
|
|
|
| 369 |
except Exception:
|
| 370 |
+
logger.exception("Failed to download checkpoints from HF Hub, continuing with local files")
|
| 371 |
|
| 372 |
def _all_checkpoint_dirs_populated(self):
|
| 373 |
+
"""True if every expected checkpoint subdir contains at least one weight file."""
|
| 374 |
+
root = Path(settings.CHECKPOINTS_ROOT)
|
| 375 |
+
for sub in _CHECKPOINT_SUBDIRS:
|
| 376 |
+
dest_dir = root / sub
|
| 377 |
if not dest_dir.exists():
|
| 378 |
return False
|
| 379 |
ckpt_files = list(dest_dir.glob("*.tar")) + list(dest_dir.glob("*.ckpt"))
|
|
|
|
| 381 |
return False
|
| 382 |
return True
|
| 383 |
|
| 384 |
# ---- Checkpoint scanning -------------------------------------------
|
| 385 |
|
| 386 |
def _scan_checkpoints(self):
|
|
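The idempotent guard in `_download_checkpoints` can be looked at in isolation. What follows is a minimal sketch, not the code from the commit: `all_populated` and `ensure_checkpoints` are hypothetical names, and the `download` callable is an assumed stand-in for `huggingface_hub.snapshot_download`, injected so the logic can be exercised without network access. The real method reads its root from `settings.CHECKPOINTS_ROOT`.

```python
from pathlib import Path
from typing import Callable, Iterable

# Same three subdirectories as _CHECKPOINT_SUBDIRS in registry.py.
CHECKPOINT_SUBDIRS = (
    Path("COINs-KGGeneration") / "graph_completion" / "checkpoints",
    Path("COINs-KGGeneration") / "graph_generation" / "checkpoints",
    Path("MultiProxAn") / "checkpoints",
)


def all_populated(root: Path, subdirs: Iterable[Path] = CHECKPOINT_SUBDIRS) -> bool:
    """True if every subdir exists and holds at least one .tar or .ckpt file."""
    for sub in subdirs:
        dest = root / sub
        if not dest.is_dir():
            return False
        if not (any(dest.glob("*.tar")) or any(dest.glob("*.ckpt"))):
            return False
    return True


def ensure_checkpoints(root: Path, download: Callable[[Path], None]) -> bool:
    """Call download(root) only when the tree is incomplete.

    Returns True if the download callable was invoked, False on a warm start.
    """
    if all_populated(root):
        return False
    root.mkdir(parents=True, exist_ok=True)
    download(root)
    return True
```

Passing the downloader in as a callable is a choice made for this sketch only; it lets the warm-start path be checked against a temporary directory. Note that the check only asks for one weight file per directory, so a partially downloaded tree can still count as populated.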
@@ -3,8 +3,14 @@ django==4.2.*
|
|
| 3 |
djangorestframework==3.14.*
|
| 4 |
django-cors-headers==4.*
|
| 5 |
|
| 6 |
-
#
|
| 7 |
-
|
| 8 |
|
| 9 |
# PyTorch with CUDA 11.8 (falls back to CPU at runtime if no GPU present)
|
| 10 |
--extra-index-url https://download.pytorch.org/whl/cu118
|
|
@@ -35,6 +41,7 @@ scikit-learn>=1.0
|
|
| 35 |
Pillow>=9.5.0
|
| 36 |
overrides==7.3.1
|
| 37 |
|
| 38 |
-
# Conda-only deps (must be pre-installed
|
| 39 |
-
#
|
| 40 |
-
#
|
|
|
|
|
|
| 3 |
djangorestframework==3.14.*
|
| 4 |
django-cors-headers==4.*
|
| 5 |
|
| 6 |
+
# Static-file serving (SPA dist + Django admin) — single-origin deploy
|
| 7 |
+
whitenoise[brotli]>=6.7
|
| 8 |
+
|
| 9 |
+
# Production WSGI server
|
| 10 |
+
gunicorn>=21.2
|
| 11 |
+
|
| 12 |
+
# Checkpoint download from Hugging Face Hub (replaces gdown / Google Drive)
|
| 13 |
+
huggingface_hub>=0.25
|
| 14 |
|
| 15 |
# PyTorch with CUDA 11.8 (falls back to CPU at runtime if no GPU present)
|
| 16 |
--extra-index-url https://download.pytorch.org/whl/cu118
|
|
|
|
| 41 |
Pillow>=9.5.0
|
| 42 |
overrides==7.3.1
|
| 43 |
|
| 44 |
+
# Conda-only deps (must be pre-installed via the bundled environment.yml):
|
| 45 |
+
# rdkit=2023.03.2 — required, used by molecule rendering for the
|
| 46 |
+
# MultiProxAn QM9/MOSES/Guacamol demos
|
| 47 |
+
# boost=1.78 — rdkit transitive on conda-forge
|
|
@@ -21,12 +21,17 @@ ALLOWED_HOSTS = os.environ.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1").sp
|
|
| 21 |
INSTALLED_APPS = [
|
| 22 |
"corsheaders",
|
| 23 |
"rest_framework",
|
|
|
|
| 24 |
"api",
|
| 25 |
]
|
| 26 |
|
| 27 |
MIDDLEWARE = [
|
| 28 |
"corsheaders.middleware.CorsMiddleware",
|
|
|
|
|
|
|
| 29 |
"django.middleware.common.CommonMiddleware",
|
|
|
|
|
|
|
| 30 |
]
|
| 31 |
|
| 32 |
ROOT_URLCONF = "research_api.urls"
|
|
@@ -45,16 +50,47 @@ REST_FRAMEWORK = {
|
|
| 45 |
}
|
| 46 |
|
| 47 |
CORS_ALLOWED_ORIGINS = [
|
| 48 |
-
|
|
|
|
|
|
|
| 49 |
]
|
| 50 |
if DEBUG:
|
| 51 |
CORS_ALLOW_ALL_ORIGINS = True
|
| 52 |
|
|
| 53 |
# Research code paths
|
| 54 |
-
COINS_DATA_DIR =
|
| 55 |
-
COINS_COMPLETION_DIR =
|
| 56 |
-
DIGRESS_KG_DIR =
|
| 57 |
-
MULTIPROXAN_DIR =
|
| 58 |
|
| 59 |
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"
|
| 60 |
|
|
|
|
| 21 |
INSTALLED_APPS = [
|
| 22 |
"corsheaders",
|
| 23 |
"rest_framework",
|
| 24 |
+
"django.contrib.staticfiles",
|
| 25 |
"api",
|
| 26 |
]
|
| 27 |
|
| 28 |
MIDDLEWARE = [
|
| 29 |
"corsheaders.middleware.CorsMiddleware",
|
| 30 |
+
"django.middleware.security.SecurityMiddleware",
|
| 31 |
+
"whitenoise.middleware.WhiteNoiseMiddleware",
|
| 32 |
"django.middleware.common.CommonMiddleware",
|
| 33 |
+
"django.middleware.csrf.CsrfViewMiddleware",
|
| 34 |
+
"django.middleware.clickjacking.XFrameOptionsMiddleware",
|
| 35 |
]
|
| 36 |
|
| 37 |
ROOT_URLCONF = "research_api.urls"
|
|
|
|
| 50 |
}
|
| 51 |
|
| 52 |
CORS_ALLOWED_ORIGINS = [
|
| 53 |
+
o.strip() for o in os.environ.get(
|
| 54 |
+
"CORS_ALLOWED_ORIGINS", "https://bani57-website.hf.space"
|
| 55 |
+
).split(",") if o.strip()
|
| 56 |
]
|
| 57 |
if DEBUG:
|
| 58 |
CORS_ALLOW_ALL_ORIGINS = True
|
| 59 |
|
| 60 |
+
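# Security headers. The three settings directly below apply in every mode; the
|
| 61 |
+
# HTTPS-only ones (proxy header, HSTS, secure cookies, CSRF_TRUSTED_ORIGINS) are gated on DEBUG=False so local dev over plain HTTP doesn't get blocked.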
|
| 62 |
+
SECURE_CONTENT_TYPE_NOSNIFF = True
|
| 63 |
+
SECURE_REFERRER_POLICY = "same-origin"
|
| 64 |
+
X_FRAME_OPTIONS = "DENY"
|
| 65 |
+
if not DEBUG:
|
| 66 |
+
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
|
| 67 |
+
SECURE_SSL_REDIRECT = False # HF Spaces terminates TLS upstream; redirect would loop
|
| 68 |
+
SECURE_HSTS_SECONDS = 31536000
|
| 69 |
+
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
|
| 70 |
+
SECURE_HSTS_PRELOAD = False
|
| 71 |
+
CSRF_COOKIE_SECURE = True
|
| 72 |
+
SESSION_COOKIE_SECURE = True
|
| 73 |
+
CSRF_TRUSTED_ORIGINS = [o for o in CORS_ALLOWED_ORIGINS if o.startswith("https://")]
|
| 74 |
+
|
| 75 |
+
# Research code root and checkpoint root. In dev both default to src/research/
|
| 76 |
+
# in the repo; in the container CHECKPOINTS_ROOT is overridden to a writable
|
| 77 |
+
# path (/app/checkpoints). CHECKPOINTS_ROOT is what huggingface_hub will populate.
|
| 78 |
+
RESEARCH_ROOT = Path(os.environ.get("RESEARCH_ROOT", PROJECT_ROOT / "src" / "research"))
|
| 79 |
+
CHECKPOINTS_ROOT = Path(os.environ.get("CHECKPOINTS_ROOT", RESEARCH_ROOT))
|
| 80 |
+
|
| 81 |
# Research code paths
|
| 82 |
+
COINS_DATA_DIR = RESEARCH_ROOT / "COINs-KGGeneration" / "data"
|
| 83 |
+
COINS_COMPLETION_DIR = CHECKPOINTS_ROOT / "COINs-KGGeneration" / "graph_completion"
|
| 84 |
+
DIGRESS_KG_DIR = CHECKPOINTS_ROOT / "COINs-KGGeneration" / "graph_generation"
|
| 85 |
+
MULTIPROXAN_DIR = CHECKPOINTS_ROOT / "MultiProxAn"
|
| 86 |
+
|
| 87 |
+
# Static files & SPA dist (Vue's npm run build output, copied into the image)
|
| 88 |
+
STATIC_URL = "/static/"
|
| 89 |
+
STATIC_ROOT = BASE_DIR / "staticfiles"
|
| 90 |
+
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
|
| 91 |
+
SPA_DIST_DIR = Path(os.environ.get("SPA_DIST_DIR", BASE_DIR / "dist"))
|
| 92 |
+
WHITENOISE_ROOT = str(SPA_DIST_DIR)
|
| 93 |
+
WHITENOISE_INDEX_FILE = "index.html"
|
| 94 |
|
| 95 |
DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"
|
| 96 |
|
|
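The environment-driven list settings in `settings.py` follow one pattern: split on commas, trim, drop empties, then filter for HTTPS where needed. A minimal sketch of that pattern follows; `csv_env` and `https_only` are hypothetical helper names that do not appear in the commit, and the `env` parameter is an assumption added so the parsing can be checked without touching the real process environment.

```python
import os
from typing import List, Mapping


def csv_env(name: str, default: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Split a comma-separated variable, trimming whitespace and dropping empty items."""
    return [item.strip() for item in env.get(name, default).split(",") if item.strip()]


def https_only(origins: List[str]) -> List[str]:
    """Keep only https:// origins, the filter applied for CSRF_TRUSTED_ORIGINS."""
    return [origin for origin in origins if origin.startswith("https://")]
```

Under these assumptions, a value such as `" https://a.example , http://localhost:5173,, "` parses to two origins, and only the first survives the HTTPS filter. Dropping empty items matters because a trailing comma would otherwise put an empty string into the origin list.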
@@ -1,5 +1,23 @@
|
|
| 1 |
-
from django.
|
| 2 |
|
| 3 |
urlpatterns = [
|
| 4 |
path("api/v1/", include("api.urls")),
|
|
|
|
| 5 |
]
|
|
|
|
| 1 |
+
from django.conf import settings
|
| 2 |
+
from django.http import FileResponse, Http404
|
| 3 |
+
from django.urls import include, path, re_path
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def spa_index(_request):
|
| 7 |
+
"""Serve the SPA shell for any non-API path.
|
| 8 |
+
|
| 9 |
+
WhiteNoise serves real files from ``SPA_DIST_DIR`` (assets, favicon, …)
|
| 10 |
+
before URL routing, so this view only fires for client-side routes such
|
| 11 |
+
as ``/cv``, ``/demos/coins`` or unknown paths — Vue Router picks them up
|
| 12 |
+
on the client (the wildcard route renders the Lara Croft 404).
|
| 13 |
+
"""
|
| 14 |
+
index = settings.SPA_DIST_DIR / "index.html"
|
| 15 |
+
if not index.exists():
|
| 16 |
+
raise Http404("SPA build is missing — run `npm run build` in src/frontend first")
|
| 17 |
+
return FileResponse(open(index, "rb"), content_type="text/html")
|
| 18 |
+
|
| 19 |
|
| 20 |
urlpatterns = [
|
| 21 |
path("api/v1/", include("api.urls")),
|
| 22 |
+
re_path(r"^(?!api/).*$", spa_index, name="spa-index"),
|
| 23 |
]
|
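The negative lookahead in the catch-all route decides which paths reach `spa_index`. A minimal sketch of its behaviour using only the standard `re` module is shown here; `falls_through_to_spa` is a hypothetical helper, not part of the commit. It relies on the fact that Django matches URL patterns against the request path with the leading slash removed.

```python
import re

# Same pattern as the catch-all route in urls.py.
SPA_CATCH_ALL = re.compile(r"^(?!api/).*$")


def falls_through_to_spa(request_path: str) -> bool:
    """True if the catch-all pattern would match this request path."""
    # Django strips the leading "/" before matching, so "/api/v1/x" is tested as "api/v1/x".
    return SPA_CATCH_ALL.match(request_path.removeprefix("/")) is not None
```

With this pattern, `/cv`, `/demos/coins` and `/` all fall through to the SPA shell, while anything under `/api/` does not. A bare `/api` without the trailing slash is not excluded by the lookahead, so it is also handed to the SPA.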