| # Backend services |
|
|
| Module-by-module reference for `src/backend/api/`. The Django app is named `api`; the project package (`research_api`) provides the WSGI entry point. See [explanation/architecture.md](../explanation/architecture.md) for how these modules fit together. |
|
|
| ## `research_api/`: Django project |
| |
| | File | Role | |
| |---|---| |
| | `settings.py` | All configuration, env-var driven. Adds `src/research/*` to `sys.path` so the research code imports cleanly. Configures WhiteNoise, CORS, DRF, security middleware, paths under `RESEARCH_ROOT` / `CHECKPOINTS_ROOT`. | |
| | `urls.py` | Root URL config. Mounts `/api/v1/` and a non-API SPA catch-all that returns `dist/index.html`. | |
| | `wsgi.py` | Standard `get_wsgi_application()` entry point. Used by gunicorn. | |
| |
| ## `api/`: Django app |
| |
| ### `apps.py` |
| `ApiConfig.ready()` runs once at boot. It applies two skip-checks before calling `ModelRegistry.initialize()`: |
| |
| - `sys.argv[1]` against `_SKIP_REGISTRY_INIT` (`collectstatic`, `migrate`, `makemigrations`, `check`, `shell`, `showmigrations`, `diffsettings`, `test`, `compilemessages`, `makemessages`). Stops `python manage.py collectstatic --noinput` from triggering a multi-GB checkpoint download into a throwaway image layer. |
| - The outer `runserver` reloader process (`RUN_MAIN != "true"`). Stops dev mode from doing the heavy boot twice. |
|
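The two skip-checks above can be sketched as a single predicate. This is a minimal illustration, not the module's actual code: `should_skip_registry_init` is a hypothetical helper name, and restricting the `RUN_MAIN` test to the `runserver` command is an assumption.

```python
from typing import Mapping, Sequence

# Command names copied from the list above.
_SKIP_REGISTRY_INIT = frozenset({
    "collectstatic", "migrate", "makemigrations", "check", "shell",
    "showmigrations", "diffsettings", "test", "compilemessages", "makemessages",
})


def should_skip_registry_init(argv: Sequence[str], environ: Mapping[str, str]) -> bool:
    """Return True when the heavy registry boot should be skipped (hypothetical helper)."""
    command = argv[1] if len(argv) > 1 else ""
    if command in _SKIP_REGISTRY_INIT:
        return True
    # Django's autoreloader sets RUN_MAIN="true" only in the child process that
    # serves requests; the outer watcher process leaves it unset.
    if command == "runserver" and environ.get("RUN_MAIN") != "true":
        return True
    return False
```

Inside `ready()` this would be called as `should_skip_registry_init(sys.argv, os.environ)`. Note that under `runserver --noreload` the variable is never set, so a real implementation may need an extra case that this sketch leaves out.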
|
| ### `urls.py` |
| Maps every endpoint listed in [reference/api.md](api.md) to the matching view class. |
|
|
| ### `exceptions.py` |
| The error envelope. All exceptions raised inside views inherit from `ApiError`, which has a `code` and a `details` dict. `api_exception_handler` wraps every error in `{"error": {"code": ..., "message": ..., "details": ...}}`. Subclasses: |
|
|
| | Class | HTTP | `code` | |
| |---|---|---| |
| | `NotFoundError` | 404 | `NOT_FOUND` | |
| | `InvalidRequestError` | 400 | `INVALID_REQUEST` | |
| | `InferenceError` | 422 | `INFERENCE_ERROR` | |
| | `InferenceBusy` | 429 | `INFERENCE_BUSY` | |
| | `ModelUnavailable` | 503 | `MODEL_UNAVAILABLE` | |
|
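To make the envelope shape concrete, here is a framework-free sketch of the class hierarchy and the wrapping step. It assumes each subclass stores its HTTP status and `code` as class attributes; the real `api_exception_handler` is a DRF exception handler, and `to_envelope`, the constructor signature, and the base-class defaults are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


class ApiError(Exception):
    """Base class: each subclass carries an HTTP status and a machine-readable code."""

    status_code = 500        # placeholder default, not taken from the real module
    code = "INTERNAL_ERROR"  # placeholder default, not taken from the real module

    def __init__(self, message: str, details: Optional[Dict[str, Any]] = None):
        super().__init__(message)
        self.message = message
        self.details = details or {}


class NotFoundError(ApiError):
    status_code = 404
    code = "NOT_FOUND"


class InferenceBusy(ApiError):
    status_code = 429
    code = "INFERENCE_BUSY"


def to_envelope(exc: ApiError) -> Tuple[int, Dict[str, Any]]:
    """Build the (status, body) pair in the envelope shape described above."""
    body = {"error": {"code": exc.code, "message": exc.message, "details": exc.details}}
    return exc.status_code, body
```

Keeping the status and code on the class means a view only has to raise the right exception type; the handler never needs a lookup table.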
|
| ### `pagination.py` |
| Tiny helper for the entity / relation list endpoints (1-indexed `page`, default `page_size=50`). |
|
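A 1-indexed paginator with the default `page_size=50` might look like the sketch below. The function name `paginate` and the keys of the returned dict are assumptions for illustration; the real helper's response shape is defined in the API reference.

```python
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], page: int = 1, page_size: int = 50) -> Dict[str, Any]:
    """Slice `items` for a 1-indexed page (hypothetical helper and response keys)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size  # page 1 starts at offset 0
    return {
        "page": page,
        "page_size": page_size,
        "total": len(items),
        "items": list(items[start:start + page_size]),
    }
```

A page past the end simply returns an empty `items` list, because Python slicing does not raise on out-of-range bounds.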
|
| ### `renderers.py` |
| `EventStreamRenderer` declares `text/event-stream` so DRF content negotiation accepts SSE clients. Streaming views return a `StreamingHttpResponse` directly, so the renderer's `render()` is never invoked; this class exists only to satisfy DRF's accept-header machinery. |
|
|
| ### `utils.py` |
| String cleanup helpers. `clean_entity_name` and `clean_relation_name` strip dataset-specific prefixes (`/m/...` for Freebase, namespace prefixes for NELL, etc.) so the UI shows readable labels. |
|
|
| ## `api/views/`: endpoint handlers |
|
|
| | File | Endpoints | Notes | |
| |---|---|---| |
| | `health.py` | `/`, `/health`, `/methods`, `/debug/force-unlock` | Trivial views; the only one that touches the registry is `HealthView`. | |
| | `coins.py` | `/coins/*` | Discovery views read directly from the registry's pre-built dictionaries. `CoinsPredictView` calls `ModelRegistry.coins_predict`, which acquires the inference lock. | |
| | `graph_generation.py` | `/graph-generation/*` | `GraphGenGenerateView` and `GraphGenContinueView` return `StreamingHttpResponse(generator)` where the generator yields SSE-formatted bytes. The lock is acquired before the generator starts and released in its `finally`. | |
| | `kg_anomaly.py` | `/kg-anomaly/*` | Same shape as graph generation. The `correct` task computes a [KG log-likelihood](../glossary.md#sampling-mode) per chain frame. | |
|
|
| Every view either: |
| - Returns a `Response` (DRF JSON), or |
| - Returns a `StreamingHttpResponse` whose generator yields `event: ...\ndata: ...\n\n` strings encoded as bytes. |
|
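For reference, one way to produce a single frame in that `event: ...\ndata: ...\n\n` shape is shown below. The helper name `sse_event` and the choice of JSON for the `data` payload are assumptions; the authoritative wire format is in the SSE protocol reference.

```python
import json
from typing import Any


def sse_event(event: str, data: Any) -> bytes:
    """Format one server-sent event frame as UTF-8 bytes (hypothetical helper)."""
    # json.dumps emits no newlines by default, so the payload fits on one data line.
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")
```

The trailing blank line is what tells an SSE client that the frame is complete.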
|
| ## `api/services/`: business logic |
|
|
| The heart of the backend. These modules import the research code under `src/research/` and host all PyTorch inference. |
|
|
| ### `constants.py` |
| Domain metadata used by the discovery endpoints: |
|
|
| - `METHODS`: the three research methods with thesis sections. |
| - `COINS_DATASET_META`: display names, descriptions, raw-data directory mapping. |
| - `COINS_MODELS`: algorithm definitions and supported `query_structure` lists. |
| - `QUERY_STRUCTURES`: frontend rendering templates (anchor / variable / relation slot positions, edge connectivity). |
| - `COINS_CONFIG_SUFFIX`: yaml-config naming convention for each algorithm. |
| - `QUERY_TREE_MAPPINGS`: research-code structure strings (e.g. `1p2i`) and slot mappings consumed by `Query.instantiate`. |
|
|
| ### `registry.py` |
| The single most important module. Owns `ModelRegistry`, the in-memory cache of everything the API needs at request time. |
|
|
| Public surface (used by views): |
|
|
| | Method | Returns | |
| |---|---| |
| | `ModelRegistry.get()` | The singleton (raises if not initialized). | |
| | `get_loader(dataset_id)` | The lightweight COINs Loader for discovery endpoints. | |
| | `get_entity_count`, `get_relation_count` | Cardinalities for `/coins/datasets`. | |
| | `get_inverted_name_maps(dataset_id)` | `(inv_node_names, inv_node_types, inv_relation_names)` Series. | |
| | `search_entities`, `search_relations` | Substring search over labels, with pagination. | |
| | `sample_triples` | Random training triples. Optional `seed` for determinism. | |
| | `sample_query` | Calls `Query.instantiate` to walk the graph and produce a structurally valid query. | |
| | `coins_predict(...)` | Acquires the lock, runs prediction, releases. | |
| | `graphgen_generate_stream(...)` | Returns a generator (lock is held by the generator). | |
| | `graphgen_continue_stream(...)` | Decodes a state blob, advances one Gibbs round. | |
| | `kg_anomaly_correct_stream(...)` | Same shape as graphgen. | |
| | `kg_anomaly_continue_stream(...)` | Same shape. | |
| | `force_release_inference_lock()` | Called by the debug endpoint. | |
| | `is_coins_loaded`, `is_graphgen_loaded`, `is_kg_anomaly_loaded` | Health-endpoint signals. | |
|
|
| Internal state: |
|
|
| - `coins_checkpoints_available`, `graphgen_checkpoints_available`, `kg_anomaly_checkpoints_available`: populated by `_scan_checkpoints`. |
| - `loaders`: `{dataset_id: lightweight Loader}` for discovery endpoints. |
| - `_coins_experiments`, `_graphgen_models`, `_kg_anomaly_models`: lazy caches keyed by request parameters. |
| - `_coins_loaders`: full Loaders shared across algorithms with the same `(dataset, seed, leiden_resolution)`. |
| - `_inference_lock`: the global single-flight gate. |
|
|
| Initialization is a four-step sequence described in [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md). |
|
|
| Checkpoint loading helpers live in the same module: |
|
|
| - `_safe_load_lightning_checkpoint`: loads a Lightning checkpoint without triggering DDP / `deepcopy` crashes. |
| - `_adapt_shape_mismatches`, `_adapt_mlp_bn_keys`, `_adapt_kbgat_state_dict`: torch-geometric 2.0.x → 2.3.x weight-format compatibility shims. |
| - `_free_heavy_arrays`: discards memory-intensive Loader fields after init. |
|
|
| `_load_coins_experiment` wraps each `experiment.prepare()` call in two monkey-patches, both restored in a `finally` (see [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md#monkey-patches-around-experimentprepare) for the rationale): |
|
|
| - `Module.share_memory`: replaced with a no-op (avoids `Bus error` from PyTorch shared-memory paths under tight `/dev/shm`). |
| - `torch.load`: wrapped to apply TransE-init dim expansion (repeats `transe_model.tar` weights along the embedding axis when YAML's `embedding_dim` is an integer multiple of the init's dim, so KBGAT's `weight.data = init` doesn't clobber the model's declared dim). |
|
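The patch-and-restore pattern and the dimension expansion can both be sketched without PyTorch. Everything named here is illustrative: `patched` and `expand_embedding` are hypothetical helpers, `Module` is a stand-in class, plain lists replace tensors, and tiling (rather than interleaving) is an assumption about how the weights are repeated.

```python
from contextlib import contextmanager
from typing import Any, Iterator, List


@contextmanager
def patched(owner: Any, name: str, replacement: Any) -> Iterator[None]:
    """Temporarily replace owner.<name>, restoring the original even on error."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield
    finally:
        setattr(owner, name, original)


def expand_embedding(rows: List[List[float]], target_dim: int) -> List[List[float]]:
    """Tile each row along the embedding axis until it reaches target_dim."""
    dim = len(rows[0])
    if target_dim % dim != 0:
        raise ValueError("target_dim must be an integer multiple of the init dim")
    return [row * (target_dim // dim) for row in rows]


class Module:
    """Stand-in for torch.nn.Module so the sketch runs without PyTorch."""

    def share_memory(self) -> str:
        return "shared"
```

With these pieces, `with patched(Module, "share_memory", lambda self: self): ...` gives the no-op behaviour for the duration of the block, and the original method is back in place afterwards regardless of how the block exits.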
|
| ### `coins_inference.py` |
| `coins_predict_inner(experiment, dataset_id, algorithm, query_structure_id, anchors, variables, relations_map, top_k)` runs a single COINs prediction. It validates the query, builds the embedding query, scores candidate tails, and returns the top-k with cleaned names and the community-rank info. |
|
|
| ### `graphgen_inference.py` |
| The MultiProxAn / DiGress sampling loop. |
| |
| - `run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id)`: single denoising chain. Yields `progress`, `preview`, `result` events. |
| - `run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id)`: initial denoise to step `t_prime`. Returns the partial state for a `/continue` follow-up. |
| - `run_multiprox_step(model, state, dataset_id)`: one Gibbs round. |
| - `encode_state_blob` / `decode_state_blob`: base64 round-trip for the [continuation token](../glossary.md#continuation-token--state-blob). |
| - `_collapse_final` symmetrises `E` (`E = (E + E.T) / 2`) before calling `model.sample_discrete_graph_given_z0`. The model has a strict symmetry assert that's tripped by ULP-level drift from the MultiProx aggregation on some BLAS stacks. See the [MultiProx symmetry safeguard](../explanation/inference-lifecycle.md#multiprox-symmetry-safeguard) note. |
|
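A base64 round-trip for a continuation token can be sketched as follows. This is only an illustration of the idea: the real blob presumably carries model tensors, whereas this version assumes a JSON-serialisable dict, and `encode_blob` / `decode_blob` are hypothetical names rather than the module's functions.

```python
import base64
import json
from typing import Any, Dict


def encode_blob(state: Dict[str, Any]) -> str:
    """Serialise a JSON-compatible state dict into a URL-safe base64 string."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_blob(blob: str) -> Dict[str, Any]:
    """Invert encode_blob: base64-decode, then parse the JSON payload."""
    raw = base64.urlsafe_b64decode(blob.encode("ascii"))
    return json.loads(raw.decode("utf-8"))
```

Because the token is opaque to the client, the server can change what it stores inside without changing the endpoint contract.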
|
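The symmetrisation step `E = (E + E.T) / 2` is easy to check on a small example. The sketch below uses nested lists for a single square matrix, which is an assumption made to keep it dependency-free; the real code operates on the model's edge tensors.

```python
from typing import List

Matrix = List[List[float]]


def symmetrise(E: Matrix) -> Matrix:
    """Return (E + E^T) / 2 for a square matrix given as nested lists."""
    n = len(E)
    return [[(E[i][j] + E[j][i]) / 2 for j in range(n)] for i in range(n)]
```

Floating-point addition is commutative, so entries `(i, j)` and `(j, i)` of the result are bit-for-bit equal even when the inputs differed in the last place, which is exactly what a strict equality assert needs.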
| ### `kg_anomaly_inference.py` |
| The KG-subgraph correction loop. Mirrors `graphgen_inference.py` but operates on knowledge-graph subgraphs and computes the KG log-likelihood metric per frame using the frozen COINs link ranker. |
|
|
| - `build_kg_tensors(subgraph, loader, model)`: converts the request payload into the model's input tensors. |
| - `run_standard_correction(...)` and `run_multiprox_correction_init(...)` / `run_multiprox_correction_step(...)`: analogous to graphgen. |
|
|
| ### `kg_likelihood.py` |
| Helper that scores edges with the COINs link ranker and computes the mean log-sigmoid metric the SSE protocol surfaces. |
| |
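A mean log-sigmoid over edge scores can be computed stably as in the sketch below. It assumes the ranker's outputs are raw real-valued scores and that the metric is a plain arithmetic mean; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import Iterable


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def mean_log_sigmoid(scores: Iterable[float]) -> float:
    """Average log-sigmoid over edge scores; higher (closer to 0) is more plausible."""
    values = [log_sigmoid(s) for s in scores]
    if not values:
        raise ValueError("at least one score is required")
    return sum(values) / len(values)
```

The `min` / `log1p` form avoids overflow for large negative scores, where a naive `math.log(1 / (1 + math.exp(-x)))` would fail.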
| ## See also |
| |
| - [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md): boot, lazy load, lock. |
| - [reference/api.md](api.md): endpoint contracts. |
| - [reference/sse-protocol.md](sse-protocol.md): wire format the streaming services produce. |
| |