# Backend services

Module-by-module reference for `src/backend/api/`. The Django app is named `api`; the project (`research_api`) is the WSGI entry point. Consult [explanation/architecture.md](../explanation/architecture.md) for how these modules fit together.

## `research_api/`: Django project

| File | Role |
|---|---|
| `settings.py` | All configuration, env-var driven. Adds `src/research/*` to `sys.path` so the research code imports cleanly. Configures WhiteNoise, CORS, DRF, security middleware, paths under `RESEARCH_ROOT` / `CHECKPOINTS_ROOT`. |
| `urls.py` | Root URL config. Mounts `/api/v1/` and a non-API SPA catch-all that returns `dist/index.html`. |
| `wsgi.py` | Standard `get_wsgi_application()` entry point. Used by gunicorn. |
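
The `sys.path` step in `settings.py` can be pictured with a minimal sketch. The helper name `add_research_paths` and the choice to prepend (rather than append) are assumptions made for illustration; the actual settings module may do this differently.

```python
import sys
from pathlib import Path


def add_research_paths(research_root):
    """Put each sub-directory of research_root on sys.path (hypothetical helper).

    Returns the entries that were newly added so callers can log them.
    """
    added = []
    for child in sorted(Path(research_root).iterdir()):
        entry = str(child)
        # Only directories are importable roots; skip entries already present.
        if child.is_dir() and entry not in sys.path:
            sys.path.insert(0, entry)  # assumption: prepend so research code wins
            added.append(entry)
    return added
```

Calling the helper twice with the same root is harmless, because directories already on `sys.path` are skipped.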

## `api/`: Django app

### `apps.py`
`ApiConfig.ready()` runs once at boot. It applies two skip checks before calling `ModelRegistry.initialize()`:

- `sys.argv[1]` against `_SKIP_REGISTRY_INIT` (`collectstatic`, `migrate`, `makemigrations`, `check`, `shell`, `showmigrations`, `diffsettings`, `test`, `compilemessages`, `makemessages`). Stops `python manage.py collectstatic --noinput` from triggering a multi-GB checkpoint download into a throwaway image layer.
- The outer `runserver` reloader process (`RUN_MAIN != "true"`). Stops dev mode from doing the heavy boot twice.
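
The two skip checks can be sketched as a pure function. The name `should_initialize_registry` is hypothetical, and the `--noreload` handling is an assumption added so the sketch behaves sensibly when the autoreloader is disabled; the real `ready()` may be structured differently.

```python
# Commands listed in this document as skipping registry initialization.
_SKIP_REGISTRY_INIT = frozenset({
    "collectstatic", "migrate", "makemigrations", "check", "shell",
    "showmigrations", "diffsettings", "test", "compilemessages", "makemessages",
})


def should_initialize_registry(argv, environ):
    """Return True only when the heavy registry boot should run (hypothetical)."""
    command = argv[1] if len(argv) > 1 else ""
    # Check 1: management commands that never serve requests.
    if command in _SKIP_REGISTRY_INIT:
        return False
    # Check 2: with autoreload, Django sets RUN_MAIN="true" only in the child
    # process that serves requests; the outer reloader process skips the boot.
    if (
        command == "runserver"
        and "--noreload" not in argv  # assumption: no reloader means one process
        and environ.get("RUN_MAIN") != "true"
    ):
        return False
    return True
```

In an `AppConfig.ready()` this would be called as `should_initialize_registry(sys.argv, os.environ)`; keeping the decision in a pure function makes it easy to unit-test without booting Django.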

### `urls.py`
Maps every endpoint listed in [reference/api.md](api.md) to the matching view class.

### `exceptions.py`
The error envelope. All exceptions raised inside views inherit from `ApiError`, which has a `code` and a `details` dict. `api_exception_handler` wraps every error in `{"error": {"code": ..., "message": ..., "details": ...}}`. Subclasses:

| Class | HTTP | `code` |
|---|---|---|
| `NotFoundError` | 404 | `NOT_FOUND` |
| `InvalidRequestError` | 400 | `INVALID_REQUEST` |
| `InferenceError` | 422 | `INFERENCE_ERROR` |
| `InferenceBusy` | 429 | `INFERENCE_BUSY` |
| `ModelUnavailable` | 503 | `MODEL_UNAVAILABLE` |
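
A framework-free sketch of the envelope follows. The class names, status codes, and envelope keys come from the table and text above; the constructor signature, the `status_code` attribute name, the base-class defaults, and the `to_envelope` helper are assumptions for illustration and stand in for the DRF-based `api_exception_handler`.

```python
class ApiError(Exception):
    """Base error carrying a machine-readable code and a details dict."""

    status_code = 500          # assumption: default for the base class
    code = "INTERNAL_ERROR"    # assumption: default for the base class

    def __init__(self, message, details=None):
        super().__init__(message)
        self.message = message
        self.details = details or {}


class NotFoundError(ApiError):
    status_code = 404
    code = "NOT_FOUND"


class InferenceBusy(ApiError):
    status_code = 429
    code = "INFERENCE_BUSY"


def to_envelope(exc):
    """Return (http_status, body) in the documented error-envelope shape."""
    body = {
        "error": {
            "code": exc.code,
            "message": exc.message,
            "details": exc.details,
        }
    }
    return exc.status_code, body
```

For example, `to_envelope(NotFoundError("unknown dataset", {"dataset_id": "x"}))` yields status `404` and a body whose `error.code` is `NOT_FOUND`.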

### `pagination.py`
Tiny helper for the entity / relation list endpoints (1-indexed `page`, default `page_size=50`).
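
A minimal sketch of 1-indexed pagination with the documented default of 50 items per page. The function name `paginate` and the keys of the returned dict are hypothetical; the real helper's response shape is defined in the API reference.

```python
def paginate(items, page=1, page_size=50):
    """Slice a list into a 1-indexed page (hypothetical response shape)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    start = (page - 1) * page_size  # page 1 starts at offset 0
    return {
        "page": page,
        "page_size": page_size,
        "total": len(items),
        "items": items[start:start + page_size],
    }
```

Requesting a page past the end returns an empty `items` list rather than raising, which is one reasonable convention among several.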

### `renderers.py`
`EventStreamRenderer` declares `text/event-stream` so DRF content negotiation accepts SSE clients. Streaming views return a `StreamingHttpResponse` directly, so the renderer's `render()` is never invoked; this class exists only to satisfy DRF's accept-header machinery.

### `utils.py`
String cleanup helpers. `clean_entity_name` and `clean_relation_name` strip dataset-specific prefixes (`/m/...` for Freebase, namespace prefixes for NELL, etc.) so the UI shows readable labels.

## `api/views/`: endpoint handlers

| File | Endpoints | Notes |
|---|---|---|
| `health.py` | `/`, `/health`, `/methods`, `/debug/force-unlock` | Trivial views; the only one that touches the registry is `HealthView`. |
| `coins.py` | `/coins/*` | Discovery views read directly from the registry's pre-built dictionaries. `CoinsPredictView` calls `ModelRegistry.coins_predict`, which acquires the inference lock. |
| `graph_generation.py` | `/graph-generation/*` | `GraphGenGenerateView` and `GraphGenContinueView` return `StreamingHttpResponse(generator)` where the generator yields SSE-formatted bytes. The lock is acquired before the generator starts and released in its `finally`. |
| `kg_anomaly.py` | `/kg-anomaly/*` | Same shape as graph generation. The `correct` task computes a [KG log-likelihood](../glossary.md#sampling-mode) per chain frame. |

Every view either:
- Returns a `Response` (DRF JSON), or
- Returns a `StreamingHttpResponse` whose generator yields `event: ...\ndata: ...\n\n` strings encoded as bytes.
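
The `event: ...\ndata: ...\n\n` framing can be sketched as below. The helper names `format_sse` and `sse_stream` are hypothetical, and serialising the payload as compact JSON is an assumption; the authoritative wire format is in the SSE protocol reference.

```python
import json


def format_sse(event, payload):
    """Encode one server-sent event as bytes (hypothetical helper)."""
    # json.dumps emits no raw newlines by default, so one `data:` line suffices.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n".encode("utf-8")


def sse_stream(events):
    """Yield framed bytes for an iterable of (event_name, payload) pairs."""
    for event, payload in events:
        yield format_sse(event, payload)
```

A generator like `sse_stream(...)` is the kind of object a view would hand to `StreamingHttpResponse` with `content_type="text/event-stream"`.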

## `api/services/`: business logic

The heart of the backend. These modules import the research code under `src/research/` and host all PyTorch inference.

### `constants.py`
Domain metadata used by the discovery endpoints:

- `METHODS`: the three research methods with thesis sections.
- `COINS_DATASET_META`: display names, descriptions, raw-data directory mapping.
- `COINS_MODELS`: algorithm definitions and supported `query_structure` lists.
- `QUERY_STRUCTURES`: frontend rendering templates (anchor / variable / relation slot positions, edge connectivity).
- `COINS_CONFIG_SUFFIX`: YAML-config naming convention for each algorithm.
- `QUERY_TREE_MAPPINGS`: research-code structure strings (e.g. `1p2i`) and slot mappings consumed by `Query.instantiate`.

### `registry.py`
The single most important module. Owns `ModelRegistry`, the in-memory cache of everything the API needs at request time.

Public surface (used by views):

| Method | Returns |
|---|---|
| `ModelRegistry.get()` | The singleton (raises if not initialized). |
| `get_loader(dataset_id)` | The lightweight COINs Loader for discovery endpoints. |
| `get_entity_count`, `get_relation_count` | Cardinalities for `/coins/datasets`. |
| `get_inverted_name_maps(dataset_id)` | `(inv_node_names, inv_node_types, inv_relation_names)` Series. |
| `search_entities`, `search_relations` | Substring search over labels, with pagination. |
| `sample_triples` | Random training triples. Optional `seed` for determinism. |
| `sample_query` | Calls `Query.instantiate` to walk the graph and produce a structurally valid query. |
| `coins_predict(...)` | Acquires the lock, runs prediction, releases. |
| `graphgen_generate_stream(...)` | Returns a generator (lock is held by the generator). |
| `graphgen_continue_stream(...)` | Decodes a state blob, advances one Gibbs round. |
| `kg_anomaly_correct_stream(...)` | Same shape as graphgen. |
| `kg_anomaly_continue_stream(...)` | Same shape. |
| `force_release_inference_lock()` | Called by the debug endpoint. |
| `is_coins_loaded`, `is_graphgen_loaded`, `is_kg_anomaly_loaded` | Health-endpoint signals. |

Internal state:

- `coins_checkpoints_available`, `graphgen_checkpoints_available`, `kg_anomaly_checkpoints_available`: populated by `_scan_checkpoints`.
- `loaders`: `{dataset_id: lightweight Loader}` for discovery endpoints.
- `_coins_experiments`, `_graphgen_models`, `_kg_anomaly_models`: lazy caches keyed by request parameters.
- `_coins_loaders`: full Loaders shared across algorithms with the same `(dataset, seed, leiden_resolution)`.
- `_inference_lock`: the global single-flight gate.

Initialization is a four-step sequence described in [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md).

Checkpoint loading helpers live in the same module:

- `_safe_load_lightning_checkpoint`: loads a Lightning checkpoint without triggering DDP / `deepcopy` crashes.
- `_adapt_shape_mismatches`, `_adapt_mlp_bn_keys`, `_adapt_kbgat_state_dict`: torch-geometric 2.0.x → 2.3.x weight-format compatibility shims.
- `_free_heavy_arrays`: discards memory-intensive Loader fields after init.

`_load_coins_experiment` wraps each `experiment.prepare()` call in two monkey-patches, both restored in a `finally` (see [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md#monkey-patches-around-experimentprepare) for the rationale):

- `Module.share_memory` → no-op (avoids `Bus error` from PyTorch shared-memory paths under tight `/dev/shm`).
- `torch.load` → TransE-init dim expansion (repeats `transe_model.tar` weights along the embedding axis when YAML's `embedding_dim` is an integer multiple of the init's dim, so KBGAT's `weight.data = init` doesn't clobber the model's declared dim).
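
The patch-then-restore pattern described above can be sketched without PyTorch. Everything here is a stand-in: `Module` imitates `torch.nn.Module`, `prepare` imitates `experiment.prepare()`, and the `patched` context manager is a hypothetical helper, not the code in `registry.py`.

```python
from contextlib import contextmanager


@contextmanager
def patched(owner, name, replacement):
    """Temporarily replace owner.<name>, restoring the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original
    finally:
        # Restored even if the wrapped call raises.
        setattr(owner, name, original)


class Module:
    """Stand-in for torch.nn.Module."""

    def share_memory(self):
        return "shared"


def prepare(module):
    """Stand-in for experiment.prepare(), which calls share_memory()."""
    return module.share_memory()


module = Module()
with patched(Module, "share_memory", lambda self: self):
    result = prepare(module)  # the no-op patch returns the module itself
```

After the `with` block exits, `Module.share_memory` is the original method again, so code outside the guarded call is unaffected.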

### `coins_inference.py`
`coins_predict_inner(experiment, dataset_id, algorithm, query_structure_id, anchors, variables, relations_map, top_k)` runs a single COINs prediction. It validates the query, builds the embedding query, scores candidate tails, and returns the top-k with cleaned names and the community-rank info.

### `graphgen_inference.py`
The MultiProxAn / DiGress sampling loop.

- `run_standard_generation(model, num_nodes, diffusion_steps, chain_frames, dataset_id)`: single denoising chain. Yields `progress`, `preview`, `result` events.
- `run_multiprox_init(model, num_nodes, n, m, t, t_prime, gibbs_chain_freq, dataset_id)`: initial denoise to step `t_prime`. Returns the partial state for a `/continue` follow-up.
- `run_multiprox_step(model, state, dataset_id)`: one Gibbs round.
- `encode_state_blob` / `decode_state_blob`: base64 round-trip for the [continuation token](../glossary.md#continuation-token--state-blob).
- `_collapse_final` symmetrises `E` (`E = (E + E.T) / 2`) before calling `model.sample_discrete_graph_given_z0`. The model has a strict symmetry assert that's tripped by ULP-level drift from the MultiProx aggregation on some BLAS stacks. See the [MultiProx symmetry safeguard](../explanation/inference-lifecycle.md#multiprox-symmetry-safeguard) note.
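
Why averaging with the transpose removes ULP-level drift can be seen in a pure-Python sketch. It uses nested lists for a single square matrix; the real `E` is a model tensor, so the function `symmetrise` here is only an illustration of the arithmetic, not the code in `_collapse_final`.

```python
def symmetrise(E):
    """Return (E + E^T) / 2 for a square matrix given as nested lists."""
    n = len(E)
    return [[(E[i][j] + E[j][i]) / 2 for j in range(n)] for i in range(n)]


# Two entries that should be equal but differ in the last bit, similar to
# what differently ordered floating-point reductions can produce.
drifted = [
    [0.0, 0.1 + 0.2],  # 0.30000000000000004
    [0.3, 0.0],
]
fixed = symmetrise(drifted)
```

Floating-point addition is commutative, so `E[i][j] + E[j][i]` and `E[j][i] + E[i][j]` are bit-identical and the result passes an exact symmetry check, which a tolerance-free assert requires.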

### `kg_anomaly_inference.py`
The KG-subgraph correction loop. Mirrors `graphgen_inference.py` but operates on knowledge-graph subgraphs and computes the KG log-likelihood metric per frame using the frozen COINs link ranker.

- `build_kg_tensors(subgraph, loader, model)`: converts the request payload into the model's input tensors.
- `run_standard_correction(...)` and `run_multiprox_correction_init(...)` / `run_multiprox_correction_step(...)`: analogous to graphgen.

### `kg_likelihood.py`
Helper that scores edges with the COINs link ranker and computes the mean log-sigmoid metric the SSE protocol surfaces.
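
Assuming each edge receives a real-valued score $s_e$ from the ranker, a mean log-sigmoid metric has the form $\frac{1}{|E|}\sum_{e \in E} \log \sigma(s_e)$. The sketch below computes it in a numerically stable way; the function names are hypothetical, and how the real helper obtains the scores from the COINs link ranker is not shown.

```python
import math


def log_sigmoid(x):
    """Stable log(sigmoid(x)) that avoids overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def mean_log_sigmoid(scores):
    """Average log-sigmoid over per-edge scores (hypothetical helper)."""
    if not scores:
        raise ValueError("at least one edge score is required")
    return sum(log_sigmoid(s) for s in scores) / len(scores)
```

The value is always at most 0; it approaches 0 when every edge scores highly and becomes more negative as edges look less plausible to the ranker.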

## See also

- [explanation/inference-lifecycle.md](../explanation/inference-lifecycle.md): boot, lazy load, lock.
- [reference/api.md](api.md): endpoint contracts.
- [reference/sse-protocol.md](sse-protocol.md): wire format the streaming services produce.