Andrej Janchevski
# REST API reference
All endpoints are mounted at `/api/v1/`. Request and response bodies are JSON unless noted otherwise. Streaming endpoints emit `text/event-stream` per [reference/sse-protocol.md](sse-protocol.md). The OpenAPI 3.0.3 source of truth is [`docs/api.yaml`](../api.yaml); this document is the human-readable summary.
For runnable end-to-end examples, import the Postman collection at [`docs/postman/`](../postman/).
## Conventions
- Pagination: list endpoints accept `?page=` (1-indexed) and `?page_size=` (default 50). Responses include `total`.
- Errors: every error has the shape `{ "error": { "code": "...", "message": "...", "details": {} } }`. Codes are listed below.
- Inference serialization: only one inference runs at a time across the whole process. A request that arrives while another inference is running is rejected with HTTP 429 / `INFERENCE_BUSY` (see [Inference lock](../glossary.md#inference-lock)).
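A minimal sketch of addressing a paginated list endpoint, using only what the conventions above state: `page` is 1-indexed, `page_size` defaults to 50, and responses carry `total`. The helper names `page_url` and `page_count` are hypothetical and exist only for illustration.

```python
import math
from urllib.parse import urlencode

API_PREFIX = "/api/v1"  # mount point stated at the top of this document


def page_url(base_url, path, page=1, page_size=50, **filters):
    """Build the URL for one page of a list endpoint (page is 1-indexed)."""
    if page < 1:
        raise ValueError("page is 1-indexed; the first page is 1")
    # Extra keyword arguments (for example q="paris") become query filters.
    query = urlencode({**filters, "page": page, "page_size": page_size})
    return f"{base_url.rstrip('/')}{API_PREFIX}{path}?{query}"


def page_count(total, page_size=50):
    """Number of pages needed to cover `total` items from a list response."""
    return math.ceil(total / page_size)
```

A client would request page 1, read `total` from the response, and then request pages 2 through `page_count(total, page_size)`.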
### Error codes
| HTTP | `code` | When |
|---|---|---|
| 400 | `INVALID_REQUEST` | Missing or malformed parameters, unsupported `query_structure` for the algorithm, etc. |
| 404 | `NOT_FOUND` | Unknown dataset id, missing entity, etc. |
| 422 | `INFERENCE_ERROR` | Inference ran but produced an unrecoverable error. |
| 429 | `INFERENCE_BUSY` | Another inference is in progress. |
| 503 | `MODEL_UNAVAILABLE` | Checkpoint file missing for the requested `(dataset, algorithm/task/model_type)` combination. |
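The error envelope and the `INFERENCE_BUSY` code lend themselves to a small retry wrapper. The following is a sketch, not the project's client: `send` is a hypothetical caller-supplied function returning `(http_status, body_text)`, and the retry count and delay are arbitrary illustrative values.

```python
import json
import time

# Only the busy-lock response is treated as worth retrying in this sketch.
RETRYABLE = {(429, "INFERENCE_BUSY")}


def parse_error(body_text):
    """Return (code, message, details) from the documented error envelope."""
    err = json.loads(body_text)["error"]
    return err["code"], err["message"], err.get("details", {})


def call_with_retry(send, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call `send()` until it stops reporting INFERENCE_BUSY.

    `send` is a hypothetical function supplied by the caller; it performs
    the HTTP request and returns (http_status, body_text).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status < 400:
            return status, body
        code, _message, _details = parse_error(body)
        last_try = attempt == attempts - 1
        if (status, code) not in RETRYABLE or last_try:
            return status, body  # hand non-retryable errors back unchanged
        sleep(delay_s)
```

Other codes such as `NOT_FOUND` or `MODEL_UNAVAILABLE` are returned immediately, since repeating the same request is unlikely to change the outcome.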
## Health and discovery
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/` | API root with absolute URLs to every section. |
| `GET` | `/health` | Service status, which model groups are loaded, current inference-lock holder. |
| `GET` | `/methods` | The three research methods with their thesis sections. |
| `POST` | `/debug/force-unlock` | Release a stuck inference lock. **Returns 403 unless `DEBUG=True`.** |
## COINs: KG reasoning
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/coins/datasets` | List datasets with entity / relation counts. |
| `GET` | `/coins/datasets/{id}/entities` | Paginated entity search (`?q=`, `?page=`, `?page_size=`). |
| `GET` | `/coins/datasets/{id}/relations` | Paginated relation search. |
| `GET` | `/coins/datasets/{id}/sample-triples` | Random training triples. `?count=10`, optional `?seed=` for determinism. |
| `GET` | `/coins/datasets/{id}/sample-query` | Sample a structurally-valid [query](../glossary.md#query-structure). `?query_structure=` is required (`1p`, `2p`, `3p`, `2i`, `3i`, `ip`, `pi`); `?count`, `?seed` optional. Returns `{anchors, relations, target}` keyed by frontend slot ids (`a`/`a1`/`a2`, `r1`/`r2`/`r3`, `v1`/`v2`). |
| `GET` | `/coins/models` | Available algorithms per dataset, plus the query structures each supports. |
| `GET` | `/coins/query-structures` | Frontend rendering templates for query graphs (anchor/variable/relation slots, edge connectivity). |
| `POST` | `/coins/predict` | Run link prediction or query answering (synchronous JSON response). |
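As an illustration of the `sample-query` parameters in the table above, the sketch below builds the request URL and rejects a `query_structure` outside the seven listed values before any request is made. The function name and the host used in the usage note are assumptions.

```python
from urllib.parse import quote, urlencode

# The seven query structures listed for /sample-query.
QUERY_STRUCTURES = ("1p", "2p", "3p", "2i", "3i", "ip", "pi")


def sample_query_url(base_url, dataset_id, query_structure, count=None, seed=None):
    """Build a /coins/datasets/{id}/sample-query URL."""
    if query_structure not in QUERY_STRUCTURES:
        raise ValueError(f"unsupported query_structure: {query_structure!r}")
    params = {"query_structure": query_structure}  # required parameter
    if count is not None:
        params["count"] = count
    if seed is not None:
        params["seed"] = seed
    dataset = quote(str(dataset_id), safe="")
    return (
        f"{base_url.rstrip('/')}/api/v1/coins/datasets/"
        f"{dataset}/sample-query?{urlencode(params)}"
    )
```

For example, `sample_query_url("http://localhost:8000", "freebase", "2i", seed=7)` yields a URL ending in `sample-query?query_structure=2i&seed=7`.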
`POST /coins/predict` body:
```json
{
"dataset_id": "freebase",
"algorithm": "transe",
"query_structure": "1p",
"anchors": { "a": 42 },
"variables": {},
"relations": { "r1": 7 },
"top_k": 10
}
```
Response (truncated):
```json
{
"predictions": [
{ "entity_id": 99, "name": "...", "label": "...", "score": 12.4 }
],
"community_rank": { "ranked_community_id": 5, "rank": 2, "total": 1092 },
"timing_ms": { "total": 320, "embedder": 80, "ranker": 240 }
}
```
The `community_rank` block reports where the chosen target community sits in the global ranking, which is useful for showing the COINs locality benefit.
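The request and response shapes above can be handled with two small helpers. This is a sketch under the assumption that the response fields are exactly those shown in the truncated example; `build_predict_body` and `summarize` are hypothetical names, not part of the API.

```python
def build_predict_body(dataset_id, algorithm, query_structure,
                       anchors, relations, variables=None, top_k=10):
    """Assemble a POST /coins/predict body keyed by frontend slot ids."""
    return {
        "dataset_id": dataset_id,
        "algorithm": algorithm,
        "query_structure": query_structure,
        "anchors": dict(anchors),
        "variables": dict(variables or {}),
        "relations": dict(relations),
        "top_k": top_k,
    }


def summarize(response):
    """Pull the headline numbers out of a /coins/predict response."""
    rank = response["community_rank"]
    timing = response["timing_ms"]
    return {
        # Predictions are kept in the order the server returned them.
        "top_entity_ids": [p["entity_id"] for p in response["predictions"]],
        "community_rank": f'{rank["rank"]}/{rank["total"]}',
        # Fraction of total wall time spent in the ranker.
        "ranker_share": timing["ranker"] / timing["total"],
    }
```

Applied to the example response, `summarize` reports the community at rank 2 of 1092 and a ranker share of 0.75 (240 ms of 320 ms).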
## Graph generation: MultiProxAn
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/graph-generation/datasets` | List graph types with node and edge type counts. |
| `GET` | `/graph-generation/sampling-modes` | The two [sampling modes](../glossary.md#sampling-mode) with their parameter specs. |
| `POST` | `/graph-generation/generate` | **Streaming SSE.** Generate a graph (`standard` or `multiprox` Gibbs init). |
| `POST` | `/graph-generation/continue` | **Streaming SSE.** Advance a `multiprox` Gibbs session by one step using the returned [state blob](../glossary.md#continuation-token--state-blob). |
`POST /graph-generation/generate` body (standard mode):
```json
{
"dataset_id": "qm9",
"model_type": "discrete",
"sampling_mode": "standard",
"num_nodes": 19,
"diffusion_steps": 500,
"chain_frames": 30
}
```
`multiprox` adds:
```json
{
"multiprox_params": {
"n": 4, "m": 8, "t": 500, "t_prime": 250, "gibbs_chain_freq": 1
}
}
```
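One way to assemble both variants of the `/graph-generation/generate` body is sketched below. It assumes that a `multiprox` request keeps the same top-level fields as the standard example and only adds `multiprox_params` while switching `sampling_mode`; the helper name is hypothetical, and the parameter values are passed through without interpretation.

```python
def build_generate_body(dataset_id, model_type, num_nodes,
                        diffusion_steps, chain_frames, multiprox_params=None):
    """Assemble a POST /graph-generation/generate body.

    Passing `multiprox_params` switches the request to multiprox mode
    (assumption: the remaining fields are unchanged in that mode).
    """
    body = {
        "dataset_id": dataset_id,
        "model_type": model_type,
        "sampling_mode": "standard",
        "num_nodes": num_nodes,
        "diffusion_steps": diffusion_steps,
        "chain_frames": chain_frames,
    }
    if multiprox_params is not None:
        body["sampling_mode"] = "multiprox"
        body["multiprox_params"] = dict(multiprox_params)
    return body
```

Calling it with `multiprox_params={"n": 4, "m": 8, "t": 500, "t_prime": 250, "gibbs_chain_freq": 1}` reproduces the fragment shown above merged into the standard body.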
`POST /graph-generation/continue` body:
```json
{ "state": "<base64 state blob from a previous /generate or /continue result>" }
```
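Because each `/continue` call advances the session by one step and takes the state blob from the previous result, a multi-step run is a loop that threads the blob through. The sketch below deliberately leaves the transport abstract: `post_continue` is a hypothetical caller-supplied function that sends the body, consumes the SSE stream, and returns the next state blob, since where the blob sits in the result is defined in the SSE protocol reference rather than here.

```python
def build_continue_body(state):
    """Wrap a state blob in the documented /continue request body."""
    if not isinstance(state, str) or not state:
        raise ValueError("state must be a non-empty base64 string")
    return {"state": state}


def run_steps(first_state, post_continue, steps):
    """Advance a session `steps` times, one /continue call per step.

    `post_continue(body)` is a hypothetical function supplied by the
    caller; it must return the state blob from the call's final result.
    """
    state = first_state
    for _ in range(steps):
        state = post_continue(build_continue_body(state))
    return state
```

The same loop would apply to `/kg-anomaly/continue`, assuming it accepts the same body shape.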
## KG anomaly correction
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/kg-anomaly/datasets` | List datasets with their `correct` and `generate` checkpoints. |
| `GET` | `/kg-anomaly/datasets/{id}/sample-subgraphs` | Pre-computed [subgraphs](../glossary.md#subgraph). `?count=5`, `?noise_level=0.4`, `?task=correct\|generate`, `?seed=42`. |
| `POST` | `/kg-anomaly/correct` | **Streaming SSE.** Correct or regenerate a KG subgraph. |
| `POST` | `/kg-anomaly/continue` | **Streaming SSE.** Advance a multiprox correction session by one step. |
`POST /kg-anomaly/correct` body:
```json
{
"dataset_id": "freebase",
"task": "correct",
"sampling_mode": "standard",
"subgraph": {
"nodes": [{ "entity_id": 42, "type_id": 0 }, ...],
"edges": [{ "source_idx": 0, "target_idx": 1, "relation_id": 5 }, ...],
"is_bip": false,
"row_size": 6
},
"diffusion_steps": 500,
"chain_frames": 30
}
```
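The field names `source_idx` and `target_idx` suggest that edges refer to nodes by position in the `nodes` array. Under that assumption (0-based positions, which this document does not state explicitly), a client-side sanity check before sending the subgraph could look like the sketch below; `check_subgraph` is a hypothetical helper.

```python
def check_subgraph(subgraph):
    """Return a list of problems; an empty list means indices are in range.

    Assumes `source_idx` / `target_idx` are 0-based positions into
    `subgraph["nodes"]`.
    """
    node_count = len(subgraph["nodes"])
    problems = []
    for i, edge in enumerate(subgraph["edges"]):
        for key in ("source_idx", "target_idx"):
            idx = edge[key]
            if not 0 <= idx < node_count:
                problems.append(
                    f"edge {i}: {key}={idx} is outside 0..{node_count - 1}"
                )
    return problems
```

Catching an out-of-range index locally avoids spending a request (and possibly the inference lock) on a body that would be rejected as `INVALID_REQUEST`.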
The `correct`-task SSE stream additionally carries `kg_log_likelihood` on each `progress` event; see [reference/sse-protocol.md](sse-protocol.md).
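A minimal sketch of reading that value from the stream follows. The parser handles the generic `text/event-stream` framing (`event:` and `data:` fields, blank-line terminators, `:` comments); it assumes each event's `data` is a JSON object with `kg_log_likelihood` as a top-level key, which should be confirmed against the SSE protocol reference. Both function names are hypothetical.

```python
import json


def iter_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def log_likelihoods(lines):
    """Collect kg_log_likelihood from every `progress` event."""
    values = []
    for event, data in iter_sse(lines):
        if event != "progress":
            continue
        payload = json.loads(data)  # assumption: data is a JSON object
        if "kg_log_likelihood" in payload:
            values.append(payload["kg_log_likelihood"])
    return values
```

`lines` can be any iterable of text lines, for example a decoded HTTP response body read line by line.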
## See also
- [reference/sse-protocol.md](sse-protocol.md): exact wire format of the streaming events.
- [reference/backend-services.md](backend-services.md): what each Python module behind these endpoints does.
- [explanation/research-methods.md](../explanation/research-methods.md): what these endpoints actually compute, scientifically.