website / docs /reference /backend-services.md

Andrej Janchevski

docs(deploy): refresh for the post-launch deployment iteration

5ed6f37 7 days ago


Backend services

Module-by-module reference for src/backend/api/. The Django app is named api; the project (research_api) is the WSGI entry point. Consult explanation/architecture.md for how these modules fit together.

`research_api/` — Django project

| File | Role |
| --- | --- |
| `settings.py` | All configuration, env-var driven. Adds `src/research/*` to `sys.path` so the research code imports cleanly. Configures WhiteNoise, CORS, DRF, security middleware, paths under `RESEARCH_ROOT` / `CHECKPOINTS_ROOT`. |
| `urls.py` | Root URL config. Mounts `/api/v1/` and a non-API SPA catch-all that returns `dist/index.html`. |
| `wsgi.py` | Standard `get_wsgi_application()` entry point. Used by gunicorn. |

`api/` — Django app

`apps.py`

ApiConfig.ready() runs once at boot. Before calling ModelRegistry.initialize(), it applies two skip-checks and skips initialization if either one matches:

- It compares sys.argv[1] against _SKIP_REGISTRY_INIT (collectstatic, migrate, makemigrations, check, shell, showmigrations, diffsettings, test, compilemessages, makemessages). This stops python manage.py collectstatic --noinput from triggering a multi-GB checkpoint download into a throwaway image layer.
- It detects the outer runserver reloader process (RUN_MAIN != "true"). This stops dev mode from doing the heavy boot twice.
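
The two skip-checks can be sketched as a pure function. This is a minimal illustration, not the code in apps.py: the helper name `should_initialize_registry` is hypothetical, and restricting the RUN_MAIN check to the `runserver` command (and ignoring `--noreload`) is an assumption made so the sketch also behaves sensibly under gunicorn.

```python
# Command names copied from the list above; the real constant lives in apps.py.
_SKIP_REGISTRY_INIT = frozenset({
    "collectstatic", "migrate", "makemigrations", "check", "shell",
    "showmigrations", "diffsettings", "test", "compilemessages", "makemessages",
})


def should_initialize_registry(argv, environ):
    """Return True when the heavy registry boot should run (hypothetical helper)."""
    command = argv[1] if len(argv) > 1 else None
    # Skip-check 1: management commands that never serve requests.
    if command in _SKIP_REGISTRY_INIT:
        return False
    # Skip-check 2: the outer runserver reloader process. Django sets
    # RUN_MAIN="true" only in the inner (serving) process.
    # Assumption: the check is only relevant when the command is runserver.
    if command == "runserver" and environ.get("RUN_MAIN") != "true":
        return False
    return True
```

In `ApiConfig.ready()`, such a helper would be called as `should_initialize_registry(sys.argv, os.environ)`.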

`urls.py`

Maps every endpoint listed in reference/api.md to the matching view class.

`exceptions.py`

The error envelope. All exceptions raised inside views inherit from ApiError, which has a code and a details dict. api_exception_handler wraps every error in {"error": {"code": ..., "message": ..., "details": ...}}. Subclasses:

| Class | HTTP | `code` |
| --- | --- | --- |
| `NotFoundError` | 404 | `NOT_FOUND` |
| `InvalidRequestError` | 400 | `INVALID_REQUEST` |
| `InferenceError` | 422 | `INFERENCE_ERROR` |
| `InferenceBusy` | 429 | `INFERENCE_BUSY` |
| `ModelUnavailable` | 503 | `MODEL_UNAVAILABLE` |
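
To make the envelope shape concrete, here is a framework-free sketch of the class hierarchy and the wrapping step. It is illustrative only: the `status_code` and `message` attribute names, the base-class defaults, and the `to_envelope` helper are assumptions, and the real `api_exception_handler` is a DRF exception handler rather than a plain function like this.

```python
class ApiError(Exception):
    """Base error: a machine-readable code plus a details dict."""
    status_code = 500          # assumed default, not taken from the source
    code = "INTERNAL_ERROR"    # assumed default, not taken from the source

    def __init__(self, message, details=None):
        super().__init__(message)
        self.message = message
        self.details = details or {}


class NotFoundError(ApiError):
    status_code = 404
    code = "NOT_FOUND"


class InferenceBusy(ApiError):
    status_code = 429
    code = "INFERENCE_BUSY"


def to_envelope(exc):
    """Return (HTTP status, JSON body) for an ApiError (hypothetical helper)."""
    body = {
        "error": {
            "code": exc.code,
            "message": exc.message,
            "details": exc.details,
        }
    }
    return exc.status_code, body
```

Keeping the status and code on the class means a view only has to raise the right subclass; the handler needs no per-error branching.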

`pagination.py`

Tiny helper for the entity / relation list endpoints (1-indexed page, default page_size=50).
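
A 1-indexed paginator with these defaults might look like the sketch below. The function name `paginate` and the response keys (`page`, `page_size`, `total`, `results`) are assumptions for illustration; consult reference/api.md for the actual response shape.

```python
def paginate(items, page=1, page_size=50):
    """Slice a list into one 1-indexed page (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    start = (page - 1) * page_size  # page 1 starts at offset 0
    return {
        "page": page,
        "page_size": page_size,
        "total": len(items),
        "results": items[start:start + page_size],
    }
```

A page past the end simply yields an empty `results` list, because Python slicing does not raise on out-of-range offsets.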

`renderers.py`

EventStreamRenderer declares text/event-stream so DRF content negotiation accepts SSE clients. Streaming views return a StreamingHttpResponse directly, so the renderer's render() is never invoked — this class exists only to satisfy DRF's accept-header machinery.

`utils.py`

String cleanup helpers. clean_entity_name and clean_relation_name strip dataset-specific prefixes (/m/... for Freebase, namespace prefixes for NELL, etc.) so the UI shows readable labels.

`api/views/` — endpoint handlers

| File | Endpoints | Notes |
| --- | --- | --- |
| `health.py` | `/`, `/health`, `/methods`, `/debug/force-unlock` | Trivial views; the only one that touches the registry is `HealthView`. |
| `coins.py` | `/coins/*` | Discovery views read directly from the registry's pre-built dictionaries. `CoinsPredictView` calls `ModelRegistry.coins_predict`, which acquires the inference lock. |
| `graph_generation.py` | `/graph-generation/*` | `GraphGenGenerateView` and `GraphGenContinueView` return `StreamingHttpResponse(generator)` where the generator yields SSE-formatted bytes. The lock is acquired before the generator starts and released in its `finally`. |
| `kg_anomaly.py` | `/kg-anomaly/*` | Same shape as graph generation. The `correct` task computes a KG log-likelihood per chain frame. |

Every view either:

- Returns a Response (DRF JSON), or
- Returns a StreamingHttpResponse whose generator yields `event: ...\ndata: ...\n\n` strings encoded as bytes.
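
The SSE framing described above is small enough to show in full. In this sketch the helper name `format_sse` is hypothetical, and serialising the payload as compact JSON is an assumption; the source only specifies the `event:` / `data:` line layout.

```python
import json


def format_sse(event, payload):
    """Encode one Server-Sent Event as UTF-8 bytes (hypothetical helper)."""
    # json.dumps escapes newlines inside strings, so the payload always
    # fits on a single data: line.
    data = json.dumps(payload, separators=(",", ":"))
    # A blank line (the second \n) terminates the event.
    return f"event: {event}\ndata: {data}\n\n".encode("utf-8")
```

A streaming generator would then `yield format_sse("progress", {...})` for each step it wants the client to see.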

`api/services/` — business logic

The heart of the backend. These modules import the research code under src/research/ and host all PyTorch inference.

`constants.py`

Domain metadata used by the discovery endpoints:

- METHODS: the three research methods with thesis sections.
- COINS_DATASET_META: display names, descriptions, raw-data directory mapping.
- COINS_MODELS: algorithm definitions and supported query_structure lists.
- QUERY_STRUCTURES: frontend rendering templates (anchor / variable / relation slot positions, edge connectivity).
- COINS_CONFIG_SUFFIX: yaml-config naming convention for each algorithm.
- QUERY_TREE_MAPPINGS: research-code structure strings (e.g. 1p2i) and slot mappings consumed by Query.instantiate.

`registry.py`

The single most important module. Owns ModelRegistry, the in-memory cache of everything the API needs at request time.

Public surface (used by views):

| Method | Returns |
| --- | --- |
| `ModelRegistry.get()` | The singleton (raises if not initialized). |
| `get_loader(dataset_id)` | The lightweight COINs Loader for discovery endpoints. |
| `get_entity_count`, `get_relation_count` | Cardinalities for `/coins/datasets`. |
| `get_inverted_name_maps(dataset_id)` | `(inv_node_names, inv_node_types, inv_relation_names)` Series. |
| `search_entities`, `search_relations` | Substring search over labels, with pagination. |
| `sample_triples` | Random training triples. Optional `seed` for determinism. |
| `sample_query` | Calls `Query.instantiate` to walk the graph and produce a structurally valid query. |
| `coins_predict(...)` | Acquires the lock, runs prediction, releases. |
| `graphgen_generate_stream(...)` | Returns a generator (lock is held by the generator). |
| `graphgen_continue_stream(...)` | Decodes a state blob, advances one Gibbs round. |
| `kg_anomaly_correct_stream(...)` | Same shape as graphgen. |
| `kg_anomaly_continue_stream(...)` | Same shape. |
| `force_release_inference_lock()` | Called by the debug endpoint. |
| `is_coins_loaded`, `is_graphgen_loaded`, `is_kg_anomaly_loaded` | Health-endpoint signals. |

Internal state:

- coins_checkpoints_available, graphgen_checkpoints_available, kg_anomaly_checkpoints_available: populated by _scan_checkpoints.
- loaders: {dataset_id: lightweight Loader} for discovery endpoints.
- _coins_experiments, _graphgen_models, _kg_anomaly_models: lazy caches keyed by request parameters.
- _coins_loaders: full Loaders shared across algorithms with the same (dataset, seed, leiden_resolution).
- _inference_lock: the global single-flight gate.

Initialization is a four-step sequence described in explanation/inference-lifecycle.md.

Checkpoint loading helpers live in the same module:

- _safe_load_lightning_checkpoint: loads a Lightning checkpoint without triggering DDP / deepcopy crashes.
- _adapt_shape_mismatches, _adapt_mlp_bn_keys, _adapt_kbgat_state_dict: torch-geometric 2.0.x → 2.3.x weight-format compatibility shims.
- _free_heavy_arrays: discards memory-intensive Loader fields after init.

_load_coins_experiment wraps each experiment.prepare() call in two monkey-patches (restored in a finally) — see explanation/inference-lifecycle.md for the rationale:

- Module.share_memory → no-op (avoids Bus error from PyTorch shared-memory paths under tight /dev/shm).
- torch.load → TransE-init dim expansion (repeats transe_model.tar weights along the embedding axis when YAML's embedding_dim is an integer multiple of the init's dim, so KBGAT's weight.data = init doesn't clobber the model's declared dim).

`coins_inference.py`

coins_predict_inner(experiment, dataset_id, algorithm, query_structure_id, anchors, variables, relations_map, top_k) — runs a single COINs prediction. Validates the query, builds the embedding query, scores candidate tails, returns the top-k with cleaned names and the community-rank info.

`graphgen_inference.py`

The MultiProxAn / DiGress sampling loop.

- run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id): single denoising chain. Yields progress, preview, result events.
- run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id): initial denoise to step t_prime. Returns the partial state for a /continue follow-up.
- run_multiprox_step(model, state, dataset_id): one Gibbs round.
- encode_state_blob / decode_state_blob: base64 round-trip for the continuation token.
- _collapse_final symmetrises E (E = (E + E.T) / 2) before calling model.sample_discrete_graph_given_z0. The model has a strict symmetry assert that's tripped by ULP-level drift from the MultiProx aggregation on some BLAS stacks. See the MultiProx symmetry safeguard note.
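
Why averaging with the transpose is enough to satisfy a strict symmetry check can be shown with plain floats. This sketch uses nested lists as a stand-in for the tensor E, and the helper names are hypothetical; the real code operates on PyTorch tensors.

```python
def symmetrise(matrix):
    """Return (M + M^T) / 2 for a square nested-list matrix."""
    n = len(matrix)
    return [
        [(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
        for i in range(n)
    ]


def is_symmetric(matrix):
    """Strict check: every entry must equal its mirror bit-for-bit."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# 0.1 + 0.2 differs from 0.3 in the last bit, mimicking ULP-level drift
# between two entries that should be identical.
drifted = [[0.0, 0.1 + 0.2],
           [0.3, 0.0]]
repaired = symmetrise(drifted)
```

The repair is exact rather than approximate because floating-point addition is commutative: entry (i, j) and entry (j, i) are computed from the same two operands, so they round identically.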

`kg_anomaly_inference.py`

The KG-subgraph correction loop. Mirrors graphgen_inference.py but operates on knowledge-graph subgraphs and computes the KG log-likelihood metric per frame using the frozen COINs link ranker.

- build_kg_tensors(subgraph, loader, model): converts the request payload into the model's input tensors.
- run_standard_correction(...) and run_multiprox_correction_init(...) / run_multiprox_correction_step(...): analogous to graphgen.

`kg_likelihood.py`

Helper that scores edges with the COINs link ranker and computes the mean log-sigmoid metric the SSE protocol surfaces.
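
As a rough illustration of the metric, the sketch below computes a numerically stable mean log-sigmoid over a list of raw edge scores. The function names are hypothetical, and treating the ranker output as one real-valued score per edge is an assumption; the actual helper works on model outputs rather than Python floats.

```python
import math


def log_sigmoid(x):
    """Stable log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))."""
    # Using -|x| keeps exp() from overflowing for large negative scores.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def mean_log_sigmoid(scores):
    """Average log-sigmoid over per-edge scores (hypothetical helper)."""
    if not scores:
        raise ValueError("need at least one edge score")
    return sum(log_sigmoid(s) for s in scores) / len(scores)
```

The value is always at most 0: it approaches 0 when every edge scores highly and becomes more negative as edges look less plausible to the ranker.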
