# Backend: Django REST API

Stateless REST API serving the PhD research models. There is no database: PyTorch checkpoints are fetched at startup, loaded into memory on first use, and then used to answer each request independently. Three groups of Django app endpoints expose [COINs](../../docs/glossary.md#coins) link prediction and query answering, [MultiProxAn](../../docs/glossary.md#multiproxan) graph generation, and KG anomaly correction; one global `threading.Lock` serializes inference because the deployed container has no GPU and shares 2 vCPUs.

For deeper material:

- [`docs/reference/api.md`](../../docs/reference/api.md): every endpoint, request and response shape.
- [`docs/reference/sse-protocol.md`](../../docs/reference/sse-protocol.md): wire format for streaming inference.
- [`docs/reference/backend-services.md`](../../docs/reference/backend-services.md): module-by-module reference.
- [`docs/explanation/architecture.md`](../../docs/explanation/architecture.md): how the backend, the SPA and HF Hub fit together.
- [`docs/explanation/inference-lifecycle.md`](../../docs/explanation/inference-lifecycle.md): boot sequence, lazy weight loading, the inference lock.
- [`docs/glossary.md`](../../docs/glossary.md): domain vocabulary.

This README covers the practical surface: running the backend, where things live, env vars, the endpoint table, and the streaming protocol summary.

## Prerequisites

1. **Mamba environment** mirroring the deployment image. The repo-root `environment.yml` captures the conda half (Python 3.9, `rdkit=2023.03.2`, `boost=1.78`, cairo, etc.):

   ```bash
   mamba env create -n website_c -f ../../environment.yml
   mamba activate website_c
   ```

2. **Pip dependencies** (GPU torch, Django, DRF, …):

   ```bash
   pip install --extra-index-url https://download.pytorch.org/whl/cu118 -r requirements.txt
   ```

3. **Model checkpoints**: downloaded automatically from the Hugging Face Hub model repo `Bani57/checkpoints` on first boot.
   The remote layout mirrors the on-disk one, so `huggingface_hub.snapshot_download(local_dir=CHECKPOINTS_ROOT)` drops files directly into the expected paths:

   - `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
   - `src/research/COINs-KGGeneration/graph_completion/results/{dataset}/` (KBGAT TransE init: `transe_model.tar`)
   - `src/research/COINs-KGGeneration/graph_generation/checkpoints/` (KG anomaly: `{dataset}.ckpt`, `{dataset}_correct.ckpt`)
   - `src/research/MultiProxAn/checkpoints/` (graph generation: `{dataset}.ckpt`, `{dataset}_c.ckpt`)

   To (re-)publish the checkpoints to the Hub from a local copy:

   ```bash
   huggingface-cli login  # one-time
   python ../../scripts/upload_checkpoints.py --create
   ```

4. **Dataset files**: the raw KG data files must be present under `src/research/COINs-KGGeneration/data/` (FB15k-237, WN18RR, NELL-995).

## Running

From `src/backend/`:

```bash
# Development server
python manage.py runserver 8000

# With custom settings
DJANGO_DEBUG=True DJANGO_SECRET_KEY=my-secret python manage.py runserver
```

The API is served at `http://localhost:8000/api/v1/`.

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
| `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
| `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
| `CORS_ALLOWED_ORIGINS` | `https://bani57-website.hf.space` | Comma-separated allowed CORS origins. |
| `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
| `RESEARCH_ROOT` | `/src/research` (dev), `/app/research` (image) | Where the research-code modules live. |
| `CHECKPOINTS_ROOT` | Same as `RESEARCH_ROOT` | Where `huggingface_hub` deposits weights. Override to e.g. `/data/checkpoints` on a paid HF Space with persistent storage. |
| `HF_CHECKPOINTS_REPO` | `Bani57/checkpoints` | HF Hub model repo holding all weights. |
| `HF_TOKEN` | unset | Recommended. Read-scope token lifts anonymous rate limits and roughly triples cold-start download throughput. Required if the repo is private. Empty values are unset by `entrypoint.sh` to avoid a malformed `Bearer ` header. |
| `HF_HUB_ENABLE_HF_TRANSFER` | `1` (image), unset (dev) | Enables the Rust-accelerated `hf_transfer` backend for `snapshot_download`. |
| `SPA_DIST_DIR` | `/dist` | Folder containing `index.html` from `npm run build`. WhiteNoise serves assets from here. |

## Startup Sequence

In the deployment container the entrypoint script pre-warms the checkpoint download from the Hugging Face Hub *before* gunicorn starts, so workers never block on the network. Then on Django boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:

1. **Verify / download checkpoints** from `Bani57/checkpoints` on HF Hub if any expected subdir is missing. This step is idempotent: it is a no-op when the entrypoint already populated the tree or when running locally with weights on disk.
2. **Scan checkpoint directories** to detect available models per method.
3. **Load lightweight COINs Loaders**, one per dataset (freebase, wordnet, nell), loading graph data, name maps, and train/val/test splits. Heavy arrays (node neighbours ~275 MB each, community neighbours, adjacency dicts) are freed after initialization to keep memory low.
4. **Generate sample subgraphs** for KG anomaly using the COINs Loaders.

All model weights (COINs inference, graph generation, KG anomaly) are loaded lazily at first inference request.

## Deployment

The site is packaged as a single Docker image and deployed to a Hugging Face Space (`Bani57/website` -> ).
The image:

- builds the Vue SPA with `npm run build` in a Node 20 stage,
- assembles a `mambaorg/micromamba` runtime mirroring the local `website_c` env from `environment.yml` + `requirements.txt` (GPU torch wheels, `cu118`),
- copies the SPA `dist/` next to Django so WhiteNoise serves it on the same origin as `/api/v1/`,
- runs `entrypoint.sh`, which `snapshot_download`s checkpoints from `Bani57/checkpoints` on HF Hub into `/app/checkpoints` and execs `gunicorn` on `0.0.0.0:7860`.

Local reproduction:

```bash
docker compose up --build
# -> http://localhost:7860
```

Push to the Space (one-time remote setup):

```bash
git remote add hf https://huggingface.co/spaces/Bani57/website
git push hf master:main
```

## API Endpoints

All endpoints are prefixed with `/api/v1/`.

### Health & Discovery

| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Service health + model availability + inference lock status |
| `GET` | `/methods` | List the 3 research methods |
| `POST` | `/debug/force-unlock` | Release stuck inference lock (debug mode only) |

### COINs: KG Reasoning

| Method | Path | Description |
|---|---|---|
| `GET` | `/coins/datasets` | List datasets with entity/relation counts |
| `GET` | `/coins/datasets/{id}/entities` | Paginated entity search (`?q=&page=&page_size=`) |
| `GET` | `/coins/datasets/{id}/relations` | Paginated relation search (`?q=&page=&page_size=`) |
| `GET` | `/coins/datasets/{id}/sample-triples` | Random training triples (`?count=10&seed=...`); optional `seed` makes sampling deterministic (same `seed+count` ⇒ same triples, e.g. seed by ISO date for a day-stable widget). Head/relation/tail each carry a dataset-cleaned `label` alongside `id` and `name` |
| `GET` | `/coins/datasets/{id}/sample-query` | Sample a structurally valid KG query (`?query_structure=2i&count=1&seed=...`). Walks the training graph to produce real paths/intersections. Returns `{anchors, relations, target}` keyed by node/edge IDs from `/coins/query-structures`. Preferred over `sample-triples` for multi-hop/intersection prefills |
| `GET` | `/coins/models` | Available algorithms + supported query structures |
| `GET` | `/coins/query-structures` | Query graph templates for frontend rendering |
| `POST` | `/coins/predict` | Run link prediction / query answering |

### Graph Generation: MultiProxAn

| Method | Path | Description |
|---|---|---|
| `GET` | `/graph-generation/datasets` | List graph types with node/edge types |
| `GET` | `/graph-generation/sampling-modes` | Sampling strategies with parameter specs |
| `POST` | `/graph-generation/generate` | **Streaming SSE.** Generate a graph (standard denoising or MultiProx Gibbs init) |
| `POST` | `/graph-generation/continue` | **Streaming SSE.** Advance a MultiProx Gibbs session by one step |

### KG Anomaly Correction

| Method | Path | Description |
|---|---|---|
| `GET` | `/kg-anomaly/datasets` | List datasets with correction models |
| `GET` | `/kg-anomaly/datasets/{id}/sample-subgraphs` | Pre-computed example subgraphs (`?count=5&noise_level=0.4&task=correct&seed=42`); noise is task-aware |
| `POST` | `/kg-anomaly/correct` | **Streaming SSE.** Correct/regenerate a KG subgraph (standard denoising or MultiProx Gibbs init) |
| `POST` | `/kg-anomaly/continue` | **Streaming SSE.** Advance a MultiProx correction session by one step |

## Streaming Inference Protocol (SSE)

The streaming endpoints marked above (graph generation `/generate` and `/continue`, KG anomaly `/correct` and `/continue`) return **Server-Sent Events** (`text/event-stream`). Three event types are emitted:

**`event: progress`** carries phase/step metadata (no images):

```
event: progress
data: {"type":"progress","phase":"denoise","step":42,"total_steps":500,"elapsed_ms":2100}
```

KG-anomaly progress events additionally carry an optional `kg_log_likelihood` (float) and `kg_log_likelihood_step` (int) on frame boundaries. The value is the mean log-sigmoid score from the frozen KG embedder + link ranker on the edges currently present in the argmax reconstruction; a higher score means a cleaner subgraph.
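
As a rough illustration of how a client might consume these frames, the sketch below splits a raw `text/event-stream` body into `(event, data)` pairs and converts a `progress` payload into a percentage. The helper names `parse_sse` and `progress_percent` are hypothetical and not part of this backend; the field names are taken from the example frame above, and `kg_log_likelihood` is treated as optional. It is a simplified parser that only understands `event:` and `data:` lines.

```python
import json
from typing import Iterator, Tuple


def parse_sse(raw: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from a raw text/event-stream body.

    Simplified sketch: only `event:` and `data:` fields are handled, and
    frames are assumed to be separated by a blank line.
    """
    normalized = raw.replace("\r\n", "\n")
    for frame in normalized.split("\n\n"):
        event = "message"  # default event name when no `event:` line is present
        data_lines = []
        for line in frame.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                value = line[len("data:"):]
                # Drop the single optional space that follows the colon.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
        if data_lines:
            yield event, "\n".join(data_lines)


def progress_percent(payload: dict) -> float:
    """Return completion in percent for a decoded `progress` payload."""
    total = payload.get("total_steps") or 0
    if total <= 0:
        return 0.0
    return 100.0 * payload["step"] / total


# Sample frame copied from the protocol description (hypothetical client input).
sample = (
    "event: progress\n"
    'data: {"type":"progress","phase":"denoise","step":42,'
    '"total_steps":500,"elapsed_ms":2100}\n'
    "\n"
)

for event, data in parse_sse(sample):
    if event == "progress":
        payload = json.loads(data)
        score = payload.get("kg_log_likelihood")  # only on some KG-anomaly frames
        print(payload["phase"], f"{progress_percent(payload):.1f}%", score)
```

The same splitting logic applies to the other event types; only `progress` and `result` carry JSON in `data`, so a real client would decode per event name rather than calling `json.loads` on every frame.
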

**`event: preview`** carries a base64 PNG of the graph's current state, emitted at key frames:

```
event: preview
data: data:image/png;base64,...
```

Preview frequency: `denoise` emits at `chain_frames` intervals (~30 over 500 steps), `gibbs` emits every inner step, `refine` emits every ~10% of steps.

**`event: result`** carries the final payload with image, chain GIF, and timing:

```
event: result
data: {"type":"result","dataset_id":"qm9","model_type":"discrete","sampling_mode":"standard","image":"data:image/png;base64,...","chain_gif":"data:image/gif;base64,...","inference_time_ms":25000}
```

Phases: `denoise` (standard generation loop), `noise_init` (multiprox init noise sampling), `gibbs` (multiprox inner Gibbs steps), `refine` (multiprox refinement denoising).

## Project Structure

```
src/backend/
  manage.py
  requirements.txt
  research_api/            # Django project settings
    settings.py
    urls.py
    wsgi.py
  api/                     # Django app
    apps.py                # Triggers ModelRegistry.initialize() on startup
    urls.py                # Route definitions
    pagination.py          # Shared pagination helper
    exceptions.py          # Custom error envelope
    services/
      constants.py         # Dataset metadata, model configs, query structures
      registry.py          # ModelRegistry: checkpoint download, scanning, Loader init
    views/
      health.py            # /health, /methods
      coins.py             # /coins/* endpoints
      graph_generation.py  # /graph-generation/* endpoints
      kg_anomaly.py        # /kg-anomaly/* endpoints
```

## Testing with Postman

Import the collection and environment from `docs/postman/` to test all discovery endpoints.
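
For a quick check without Postman, the read-only discovery endpoints can also be hit from a short script. The following is a minimal sketch using only the standard library. It assumes the development server address from the Running section and that each endpoint returns JSON; the helper names (`endpoint_url`, `fetch_json`, `smoke_check`) are hypothetical, and response bodies are printed as-is because their shapes are documented in `docs/reference/api.md` rather than here.

```python
import json
import urllib.request

# Assumption: the local development server from the Running section.
BASE_URL = "http://localhost:8000/api/v1"

# GET endpoints without path parameters, taken from the endpoint tables.
DISCOVERY_PATHS = [
    "/health",
    "/methods",
    "/coins/datasets",
    "/coins/models",
    "/coins/query-structures",
    "/graph-generation/datasets",
    "/graph-generation/sampling-modes",
    "/kg-anomaly/datasets",
]


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the API prefix and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET one endpoint and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(endpoint_url(path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_check(base_url: str = BASE_URL) -> None:
    """Print a truncated response body for every discovery endpoint."""
    for path in DISCOVERY_PATHS:
        body = fetch_json(path, base_url)
        print(path, "->", json.dumps(body)[:120])
```

Nothing runs on import; with the server up, calling `smoke_check()` prints one line per endpoint, and a connection or HTTP error surfaces as the usual `urllib.error` exception.
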