Andrej Janchevski
# Backend: Django REST API
Stateless REST API serving the PhD research models. There is no database: PyTorch checkpoints are fetched at startup, loaded into memory lazily on the first inference request, and used to answer each request independently. Three endpoint groups expose [COINs](../../docs/glossary.md#coins) link prediction and query answering, [MultiProxAn](../../docs/glossary.md#multiproxan) graph generation, and KG anomaly correction. One global `threading.Lock` serializes inference because the deployed container has no GPU and shares 2 vCPU.
For deeper material:
- [`docs/reference/api.md`](../../docs/reference/api.md): every endpoint, request and response shape.
- [`docs/reference/sse-protocol.md`](../../docs/reference/sse-protocol.md): wire format for streaming inference.
- [`docs/reference/backend-services.md`](../../docs/reference/backend-services.md): module-by-module reference.
- [`docs/explanation/architecture.md`](../../docs/explanation/architecture.md): how the backend, the SPA and HF Hub fit together.
- [`docs/explanation/inference-lifecycle.md`](../../docs/explanation/inference-lifecycle.md): boot sequence, lazy weight loading, the inference lock.
- [`docs/glossary.md`](../../docs/glossary.md): domain vocabulary.
This README covers the practical surface: running the backend, where things live, env vars, the endpoint table, and the streaming protocol summary.
## Prerequisites
1. **Mamba environment** mirroring the deployment image. The repo-root `environment.yml`
captures the conda half (Python 3.9, `rdkit=2023.03.2`, `boost=1.78`, cairo, etc.):
```bash
mamba env create -n website_c -f ../../environment.yml
mamba activate website_c
```
2. **Pip dependencies** (GPU torch, Django, DRF, …):
```bash
pip install --extra-index-url https://download.pytorch.org/whl/cu118 -r requirements.txt
```
3. **Model checkpoints**: downloaded automatically from the Hugging Face Hub model repo
`Bani57/checkpoints` on first boot. The remote layout mirrors the on-disk one, so
`huggingface_hub.snapshot_download(local_dir=CHECKPOINTS_ROOT)` drops files directly
into the expected paths:
- `src/research/COINs-KGGeneration/graph_completion/checkpoints/` (COINs: `{dataset}_{algorithm}.tar`)
- `src/research/COINs-KGGeneration/graph_completion/results/{dataset}/` (KBGAT TransE init: `transe_model.tar`)
- `src/research/COINs-KGGeneration/graph_generation/checkpoints/` (KG anomaly: `{dataset}.ckpt`, `{dataset}_correct.ckpt`)
- `src/research/MultiProxAn/checkpoints/` (graph generation: `{dataset}.ckpt`, `{dataset}_c.ckpt`)
To (re-)publish the checkpoints to the Hub from a local copy:
```bash
huggingface-cli login # one-time
python ../../scripts/upload_checkpoints.py --create
```
4. **Dataset files**: the raw KG data files must be present under `src/research/COINs-KGGeneration/data/` (FB15k-237, WN18RR, NELL-995).
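To check a local tree against the checkpoint layout listed in step 3, something like the following may help. It is a minimal sketch, not the backend's own code: `EXPECTED_SUBDIRS` and `missing_checkpoint_dirs` are hypothetical names, and the subdirectory list assumes `CHECKPOINTS_ROOT` points at `src/research` (its documented default).

```python
from pathlib import Path
from typing import List

# Assumption: paths are relative to CHECKPOINTS_ROOT (default: <repo>/src/research),
# copied from the layout listed in the prerequisites.
EXPECTED_SUBDIRS = (
    "COINs-KGGeneration/graph_completion/checkpoints",
    "COINs-KGGeneration/graph_completion/results",
    "COINs-KGGeneration/graph_generation/checkpoints",
    "MultiProxAn/checkpoints",
)


def missing_checkpoint_dirs(checkpoints_root: Path) -> List[str]:
    """Return the expected subdirectories that are absent or empty."""
    missing = []
    for rel in EXPECTED_SUBDIRS:
        path = checkpoints_root / rel
        # A directory that exists but holds no entries is treated as missing.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

An empty return value suggests the tree is populated; any entry in the list points at a directory that would still need to be downloaded.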
## Running
From `src/backend/`:
```bash
# Development server
python manage.py runserver 8000
# With custom settings
DJANGO_DEBUG=True DJANGO_SECRET_KEY=my-secret python manage.py runserver
```
The API is served at `http://localhost:8000/api/v1/`.
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `DJANGO_SECRET_KEY` | `dev-insecure-key-change-in-production` | Django secret key. **Set in production.** |
| `DJANGO_DEBUG` | `True` | Enable debug mode. Set to `False` in production. |
| `DJANGO_ALLOWED_HOSTS` | `localhost,127.0.0.1` | Comma-separated allowed hosts. |
| `CORS_ALLOWED_ORIGINS` | `https://bani57-website.hf.space` | Comma-separated allowed CORS origins. |
| `TORCH_DEVICE` | Auto (`cuda:0` if available, else `cpu`) | PyTorch device for model inference. |
| `RESEARCH_ROOT` | `<repo>/src/research` (dev), `/app/research` (image) | Where the research-code modules live. |
| `CHECKPOINTS_ROOT` | Same as `RESEARCH_ROOT` | Where `huggingface_hub` deposits weights. Override to e.g. `/data/checkpoints` on a paid HF Space with persistent storage. |
| `HF_CHECKPOINTS_REPO` | `Bani57/checkpoints` | HF Hub model repo holding all weights. |
| `HF_TOKEN` | unset | Recommended. Read-scope token lifts anonymous rate limits and roughly triples cold-start download throughput. Required if the repo is private. Empty values are unset by `entrypoint.sh` to avoid a malformed `Bearer ` header. |
| `HF_HUB_ENABLE_HF_TRANSFER` | `1` (image), unset (dev) | Enables the Rust-accelerated `hf_transfer` backend for `snapshot_download`. |
| `SPA_DIST_DIR` | `<backend>/dist` | Folder containing `index.html` from `npm run build`. WhiteNoise serves assets from here. |
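
As an illustration of how the boolean and comma-separated variables in the table could be parsed, here is a minimal sketch. The helper names (`env_bool`, `env_list`, `load_settings`) and the set of accepted truthy strings are assumptions for this example; the actual `settings.py` may read these variables differently.

```python
import os
from typing import Dict, List, Mapping


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag; an unset variable falls back to the default."""
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as true, anything else as false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(env: Mapping[str, str], name: str, default: str) -> List[str]:
    """Split a comma-separated variable, dropping blanks and stray spaces."""
    raw = env.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Collect a few settings using the defaults from the table above."""
    return {
        "DEBUG": env_bool(env, "DJANGO_DEBUG", True),
        "ALLOWED_HOSTS": env_list(env, "DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1"),
        "CORS_ALLOWED_ORIGINS": env_list(
            env, "CORS_ALLOWED_ORIGINS", "https://bani57-website.hf.space"
        ),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to exercise without touching the process environment.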
## Startup Sequence
In the deployment container the entrypoint script pre-warms the checkpoint download
from the Hugging Face Hub *before* gunicorn starts, so workers never block on the
network. Then on Django boot (`ApiConfig.ready()`), the `ModelRegistry` initializes:
1. **Verify / download checkpoints** from `Bani57/checkpoints` on HF Hub if any expected
subdir is missing. Idempotent: a no-op when the entrypoint already populated the tree
or when running locally with weights on disk.
2. **Scan checkpoint directories** to detect available models per method.
3. **Load lightweight COINs Loaders**, one per dataset (freebase, wordnet, nell), loading graph data, name maps, and train/val/test splits. Heavy arrays (node neighbours ~275 MB each, community neighbours, adjacency dicts) are freed after initialization to keep memory low.
4. **Generate sample subgraphs** for KG anomaly using the COINs Loaders.
All model weights (COINs inference, graph generation, KG anomaly) are loaded lazily at first inference request.
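
The lazy-loading behaviour described above can be pictured with a small cache that builds a model only on first use. This is a sketch under stated assumptions: `LazyModelCache` and its `loader` callback are hypothetical and do not mirror the real `ModelRegistry`; the lock here only guards loading and is separate from the global inference lock.

```python
import threading
from typing import Callable, Dict, Hashable


class LazyModelCache:
    """Load each model at most once, on the first request that needs it."""

    def __init__(self, loader: Callable[[Hashable], object]) -> None:
        self._loader = loader  # hypothetical callback that reads weights from disk
        self._models: Dict[Hashable, object] = {}
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> object:
        # Holding the lock for the whole check-and-load keeps two concurrent
        # first requests from loading the same weights twice.
        with self._lock:
            if key not in self._models:
                self._models[key] = self._loader(key)
            return self._models[key]
```

Nothing is loaded when the cache is constructed, so boot time stays independent of the number of checkpoints; the cost moves to the first request for each model.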
## Deployment
The site is packaged as a single Docker image and deployed to a Hugging Face Space
(`Bani57/website` -> <https://bani57-website.hf.space>). The image:
- builds the Vue SPA with `npm run build` in a Node 20 stage,
- assembles a `mambaorg/micromamba` runtime mirroring the local `website_c` env from
`environment.yml` + `requirements.txt` (GPU torch wheels, `cu118`),
- copies the SPA `dist/` next to Django so WhiteNoise serves it on the same origin as
`/api/v1/`,
- runs `entrypoint.sh`, which `snapshot_download`s checkpoints from
`Bani57/checkpoints` on HF Hub into `/app/checkpoints` and execs `gunicorn` on `0.0.0.0:7860`.
Local reproduction:
```bash
docker compose up --build
# -> http://localhost:7860
```
Push to the Space (one-time remote setup):
```bash
git remote add hf https://huggingface.co/spaces/Bani57/website
git push hf master:main
```
## API Endpoints
All endpoints are prefixed with `/api/v1/`.
### Health & Discovery
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Service health + model availability + inference lock status |
| `GET` | `/methods` | List the 3 research methods |
| `POST` | `/debug/force-unlock` | Release stuck inference lock (debug mode only) |
### COINs: KG Reasoning
| Method | Path | Description |
|---|---|---|
| `GET` | `/coins/datasets` | List datasets with entity/relation counts |
| `GET` | `/coins/datasets/{id}/entities` | Paginated entity search (`?q=&page=&page_size=`) |
| `GET` | `/coins/datasets/{id}/relations` | Paginated relation search (`?q=&page=&page_size=`) |
| `GET` | `/coins/datasets/{id}/sample-triples` | Random training triples (`?count=10&seed=...`). The optional `seed` makes sampling deterministic: the same `seed` and `count` return the same triples (e.g. seed by ISO date for a day-stable widget). Head, relation, and tail each carry a dataset-cleaned `label` alongside `id` and `name`. |
| `GET` | `/coins/datasets/{id}/sample-query` | Sample a structurally valid KG query (`?query_structure=2i&count=1&seed=...`). Walks the training graph to produce real paths/intersections. Returns `{anchors, relations, target}` keyed by node/edge IDs from `/coins/query-structures`. Preferred over `sample-triples` for multi-hop/intersection prefills |
| `GET` | `/coins/models` | Available algorithms + supported query structures |
| `GET` | `/coins/query-structures` | Query graph templates for frontend rendering |
| `POST` | `/coins/predict` | Run link prediction / query answering |
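
As an example of the day-stable seeding mentioned for `sample-triples`, a client could derive the seed from the current date. The sketch below only builds the request URL; `sample_triples_url` is a hypothetical helper, the dataset id `wordnet` is assumed for illustration, and whether the server accepts the ISO date string verbatim as `seed` should be checked against `docs/reference/api.md`.

```python
from datetime import date
from typing import Optional
from urllib.parse import quote, urlencode

# Local development base URL from the "Running" section.
API_BASE = "http://localhost:8000/api/v1"


def sample_triples_url(
    dataset_id: str,
    count: int = 10,
    day: Optional[date] = None,
    base: str = API_BASE,
) -> str:
    """Build a sample-triples URL, optionally seeded by an ISO date."""
    params = {"count": count}
    if day is not None:
        # Same day and same count should yield the same triples.
        params["seed"] = day.isoformat()
    path = f"/coins/datasets/{quote(dataset_id, safe='')}/sample-triples"
    return f"{base}{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(sample_triples_url("wordnet", count=10, day=date.today()))
```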
### Graph Generation: MultiProxAn
| Method | Path | Description |
|---|---|---|
| `GET` | `/graph-generation/datasets` | List graph types with node/edge types |
| `GET` | `/graph-generation/sampling-modes` | Sampling strategies with parameter specs |
| `POST` | `/graph-generation/generate` | **Streaming SSE.** Generate a graph (standard denoising or MultiProx Gibbs init) |
| `POST` | `/graph-generation/continue` | **Streaming SSE.** Advance a MultiProx Gibbs session by one step |
### KG Anomaly Correction
| Method | Path | Description |
|---|---|---|
| `GET` | `/kg-anomaly/datasets` | List datasets with correction models |
| `GET` | `/kg-anomaly/datasets/{id}/sample-subgraphs` | Pre-computed example subgraphs (`?count=5&noise_level=0.4&task=correct&seed=42`); noise is task-aware |
| `POST` | `/kg-anomaly/correct` | **Streaming SSE.** Correct/regenerate a KG subgraph (standard denoising or MultiProx Gibbs init) |
| `POST` | `/kg-anomaly/continue` | **Streaming SSE.** Advance a MultiProx correction session by one step |
## Streaming Inference Protocol (SSE)
The streaming endpoints (`/graph-generation/generate`, `/graph-generation/continue`, `/kg-anomaly/correct`, `/kg-anomaly/continue`) return **Server-Sent Events** (`text/event-stream`). Three event types are emitted:
**`event: progress`** carries phase/step metadata (no images):
```
event: progress
data: {"type":"progress","phase":"denoise","step":42,"total_steps":500,"elapsed_ms":2100}
```
KG-anomaly progress events additionally carry an optional `kg_log_likelihood`
(float) and `kg_log_likelihood_step` (int) on frame boundaries. The value is the mean
log-sigmoid score from the frozen KG embedder + link ranker over the edges
currently present in the argmax reconstruction. Higher means cleaner.
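
To make the "mean log-sigmoid score" concrete, the following sketch computes it from a list of per-edge scores. It assumes the link ranker produces one raw (pre-sigmoid) score per edge; the function names are hypothetical and the backend's own computation may differ in detail.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def mean_log_sigmoid(edge_scores: Sequence[float]) -> float:
    """Average log-sigmoid over the edges currently in the reconstruction."""
    if not edge_scores:
        raise ValueError("at least one edge score is required")
    return sum(log_sigmoid(s) for s in edge_scores) / len(edge_scores)
```

The result is never positive and approaches zero as every edge score grows, which matches the reading that higher values indicate a cleaner subgraph.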
**`event: preview`** carries a base64 PNG of the graph's current state, emitted at key frames:
```
event: preview
data: data:image/png;base64,...
```
Preview frequency: `denoise` emits at `chain_frames` intervals (~30 over 500 steps), `gibbs` emits every inner step, `refine` emits every ~10% of steps.
**`event: result`** carries the final payload with image, chain GIF, and timing:
```
event: result
data: {"type":"result","dataset_id":"qm9","model_type":"discrete","sampling_mode":"standard","image":"data:image/png;base64,...","chain_gif":"data:image/gif;base64,...","inference_time_ms":25000}
```
Phases: `denoise` (standard generation loop), `noise_init` (multiprox init noise sampling), `gibbs` (multiprox inner Gibbs steps), `refine` (multiprox refinement denoising).
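
A client can consume the stream with a small line-based parser. The sketch below follows the general Server-Sent Events framing (fields separated by `:`, events terminated by a blank line) and is not taken from the SPA's code; `parse_sse` and `decode_event` are hypothetical names, and decoding `progress` and `result` as JSON while leaving `preview` as a raw data URI is inferred from the examples above.

```python
import json
from typing import Iterable, Iterator, Tuple, Union


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # drop the single optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    # An event without a terminating blank line is dropped, as in the SSE spec.


def decode_event(event: str, data: str) -> Union[dict, str]:
    """Decode JSON payloads; preview frames stay as data-URI strings."""
    if event in ("progress", "result"):
        return json.loads(data)
    return data
```

Because only the first colon splits field from value, a preview line such as `data: data:image/png;base64,...` keeps its data URI intact.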
## Project Structure
```
src/backend/
  manage.py
  requirements.txt
  research_api/             # Django project settings
    settings.py
    urls.py
    wsgi.py
  api/                      # Django app
    apps.py                 # Triggers ModelRegistry.initialize() on startup
    urls.py                 # Route definitions
    pagination.py           # Shared pagination helper
    exceptions.py           # Custom error envelope
    services/
      constants.py          # Dataset metadata, model configs, query structures
      registry.py           # ModelRegistry: checkpoint download, scanning, Loader init
    views/
      health.py             # /health, /methods
      coins.py              # /coins/* endpoints
      graph_generation.py   # /graph-generation/* endpoints
      kg_anomaly.py         # /kg-anomaly/* endpoints
```
## Testing with Postman
Import the collection and environment from `docs/postman/` to test all discovery endpoints.