# REST API reference

All endpoints are mounted at `/api/v1/`. Request and response bodies are JSON unless noted otherwise. Streaming endpoints emit `text/event-stream` per [reference/sse-protocol.md](sse-protocol.md). The OpenAPI 3.0.3 source of truth is [`docs/api.yaml`](../api.yaml); this document is the human-readable summary.

For runnable end-to-end examples, import the Postman collection at [`docs/postman/`](../postman/).

## Conventions

- Pagination: list endpoints accept `?page=` (1-indexed) and `?page_size=` (default 50). Responses include `total`.
- Errors: every error has the shape `{ "error": { "code": "...", "message": "...", "details": {} } }`. Codes are listed below.
- Inference serialization: only one inference runs at a time across the whole process. Concurrent requests return HTTP 429 / `INFERENCE_BUSY` (see [Inference lock](../glossary.md#inference-lock)).
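
Here is a minimal Python sketch of the pagination convention that walks every page of a list endpoint. It assumes a caller-supplied `fetch_page(page, page_size)` function (hypothetical, not part of this API) that returns the decoded JSON body. It also assumes the items sit under an `items` key. This reference only guarantees `total`, so check `docs/api.yaml` for the real field name.

```python
import math
from typing import Callable, Iterator

# Hypothetical transport: fetch_page(page, page_size) -> decoded JSON body.
FetchPage = Callable[[int, int], dict]


def iter_all(fetch_page: FetchPage, page_size: int = 50, items_key: str = "items") -> Iterator[dict]:
    """Yield every item of a paginated list endpoint, starting at page 1."""
    first = fetch_page(1, page_size)
    yield from first[items_key]  # "items" is an assumed key name
    # `total` counts items across all pages, so the last page is ceil(total / page_size).
    last_page = math.ceil(first["total"] / page_size)
    for page in range(2, last_page + 1):
        yield from fetch_page(page, page_size)[items_key]
```

Pages are 1-indexed, so the loop starts at 2 after the first request. An empty collection (`total == 0`) stops after that first call.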

### Error codes

| HTTP | `code` | When |
|---|---|---|
| 400 | `INVALID_REQUEST` | Missing or malformed parameters, unsupported `query_structure` for the algorithm, etc. |
| 404 | `NOT_FOUND` | Unknown dataset id, missing entity, etc. |
| 422 | `INFERENCE_ERROR` | Inference ran but produced an unrecoverable error. |
| 429 | `INFERENCE_BUSY` | Another inference is in progress. |
| 503 | `MODEL_UNAVAILABLE` | No checkpoint file exists for the requested combination of dataset and algorithm (COINs), `model_type` (graph generation), or `task` (KG anomaly correction). |
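
A client can turn the error envelope into an exception and retry only the transient case. The Python sketch below is a hedged example, not a shipped client. `send()` is a hypothetical zero-argument callable that returns `(http_status, body_text)`. Retrying only `INFERENCE_BUSY` with exponential backoff is an editorial assumption: the other codes describe bad input or missing checkpoints, and retrying is unlikely to fix those.

```python
import json
import time
from typing import Callable, Optional, Tuple


class ApiError(Exception):
    """An error decoded from the {"error": {"code", "message", "details"}} envelope."""

    def __init__(self, status: int, code: str, message: str, details: Optional[dict] = None):
        super().__init__(f"HTTP {status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details or {}


def parse_error(status: int, body: str) -> ApiError:
    err = json.loads(body)["error"]
    return ApiError(status, err["code"], err["message"], err.get("details", {}))


def call_with_retry(
    send: Callable[[], Tuple[int, str]],  # hypothetical transport: returns (status, body)
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status < 400:
            return json.loads(body)
        error = parse_error(status, body)
        # Only INFERENCE_BUSY is treated as transient; every other code surfaces immediately.
        if error.code != "INFERENCE_BUSY" or attempt == attempts - 1:
            raise error
        sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

If the lock appears stuck during development, `POST /debug/force-unlock` releases it (only with `DEBUG=True`). Retrying alone will not release it.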

## Health and discovery

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/` | API root with absolute URLs to every section. |
| `GET` | `/health` | Service status, the model groups currently loaded, and the current inference-lock holder. |
| `GET` | `/methods` | The three research methods with their thesis sections. |
| `POST` | `/debug/force-unlock` | Release a stuck inference lock. **Returns 403 unless `DEBUG=True`.** |

## COINs — KG reasoning

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/coins/datasets` | List datasets with entity / relation counts. |
| `GET` | `/coins/datasets/{id}/entities` | Paginated entity search (`?q=`, `?page=`, `?page_size=`). |
| `GET` | `/coins/datasets/{id}/relations` | Paginated relation search. |
| `GET` | `/coins/datasets/{id}/sample-triples` | Random training triples. `?count=10`, optional `?seed=` for determinism. |
| `GET` | `/coins/datasets/{id}/sample-query` | Sample a structurally valid [query](../glossary.md#query-structure). `?query_structure=` is required (`1p`, `2p`, `3p`, `2i`, `3i`, `ip`, `pi`); `?count` and `?seed` are optional. Returns `{anchors, relations, target}` keyed by frontend slot ids (`a`/`a1`/`a2`, `r1`/`r2`/`r3`, `v1`/`v2`). |
| `GET` | `/coins/models` | Available algorithms per dataset, plus the query structures each supports. |
| `GET` | `/coins/query-structures` | Frontend rendering templates for query graphs (anchor/variable/relation slots, edge connectivity). |
| `POST` | `/coins/predict` | Run link prediction or query answering (synchronous JSON response). |

`POST /coins/predict` body:

```json
{
  "dataset_id": "freebase",
  "algorithm": "transe",
  "query_structure": "1p",
  "anchors": { "a": 42 },
  "variables": {},
  "relations": { "r1": 7 },
  "top_k": 10
}
```
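
The body above can also be assembled in code. In this hedged Python sketch, `predict_body` is a hypothetical helper that rejects a `query_structure` outside the seven listed for `sample-query` before any request is sent. `GET /coins/models` reports which structures a given algorithm supports, and an unsupported combination still yields `400 INVALID_REQUEST` from the server.

```python
from typing import Dict, Optional

# Structures accepted by /coins/datasets/{id}/sample-query, per the table above.
QUERY_STRUCTURES = {"1p", "2p", "3p", "2i", "3i", "ip", "pi"}


def predict_body(
    dataset_id: str,
    algorithm: str,
    query_structure: str,
    anchors: Dict[str, int],
    relations: Dict[str, int],
    variables: Optional[dict] = None,
    top_k: int = 10,
) -> dict:
    """Build a POST /coins/predict body keyed by frontend slot ids (hypothetical helper)."""
    if query_structure not in QUERY_STRUCTURES:
        raise ValueError(f"unknown query_structure: {query_structure!r}")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "dataset_id": dataset_id,
        "algorithm": algorithm,
        "query_structure": query_structure,
        "anchors": anchors,
        "variables": variables or {},
        "relations": relations,
        "top_k": top_k,
    }
```

For example, `predict_body("freebase", "transe", "1p", {"a": 42}, {"r1": 7})` reproduces the body above. Because both use the same slot ids, the `anchors` and `relations` from a `sample-query` result could plausibly be passed straight through.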

Response (truncated):

```json
{
  "predictions": [
    { "entity_id": 99, "name": "...", "label": "...", "score": 12.4 }
  ],
  "community_rank": { "ranked_community_id": 5, "rank": 2, "total": 1092 },
  "timing_ms": { "total": 320, "embedder": 80, "ranker": 240 }
}
```

The `community_rank` block reports where the target's community (`ranked_community_id`) sits in the global community ranking, as `rank` out of `total`. This is useful for showing the COINs locality benefit.
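
A client might reduce the response to a few display values. In this hedged Python sketch, `summarize` is a hypothetical helper. It assumes `rank` is 1-indexed (rank 1 is best) and that `predictions` arrive already ordered by the server.

```python
from typing import Any, Dict


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    cr = response["community_rank"]
    timing = response["timing_ms"]
    return {
        # Entity ids in the order the server returned them (assumed best first).
        "entity_ids": [p["entity_id"] for p in response["predictions"]],
        # Share of communities ranked at or above the target's community,
        # assuming a 1-indexed rank: 2 of 1092 is roughly the top 0.18%.
        "community_top_fraction": cr["rank"] / cr["total"],
        # Share of total latency spent in the ranker stage.
        "ranker_share": timing["ranker"] / timing["total"],
    }
```

With the example response above, `ranker_share` is 0.75 (240 of 320 ms).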

## Graph generation — MultiProxAn

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/graph-generation/datasets` | List graph types with node and edge type counts. |
| `GET` | `/graph-generation/sampling-modes` | The two [sampling modes](../glossary.md#sampling-mode) with their parameter specs. |
| `POST` | `/graph-generation/generate` | **Streaming SSE.** Generate a graph with `standard` sampling, or initialize a `multiprox` Gibbs session. |
| `POST` | `/graph-generation/continue` | **Streaming SSE.** Advance a `multiprox` Gibbs session by one step using the returned [state blob](../glossary.md#continuation-token--state-blob). |
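
Both streaming endpoints here are `POST` requests. A browser `EventSource` only issues `GET`, so it cannot consume them directly, and a client must read the response body and split it into events itself. The minimal Python parser below follows the generic `text/event-stream` framing rules: `event:` and `data:` fields, a blank line ends an event, and a leading `:` marks a comment. It is a sketch that does not interpret event names or payloads; those are defined in [reference/sse-protocol.md](sse-protocol.md).

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from text/event-stream lines."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the pending event, if it carried any data.
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single space after the colon is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch.
```

With `urllib.request.urlopen`, the response object iterates line by line, so `parse_sse(line.decode("utf-8") for line in resp)` should yield events as they arrive. An event still pending when the stream closes without a final blank line is discarded, as the framing rules require.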

`POST /graph-generation/generate` body (standard mode):

```json
{
  "dataset_id": "qm9",
  "model_type": "discrete",
  "sampling_mode": "standard",
  "num_nodes": 19,
  "diffusion_steps": 500,
  "chain_frames": 30
}
```

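With `"sampling_mode": "multiprox"`, the body additionally carries the parameters whose specs are returned by `GET /graph-generation/sampling-modes`: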

```json
{
  "multiprox_params": {
    "n": 4, "m": 8, "t": 500, "t_prime": 250, "gibbs_chain_freq": 1
  }
}
```
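
Read together, the two fragments suggest that a `multiprox` request keeps the standard fields and switches `sampling_mode` to `multiprox`. The Python sketch below merges them under that assumption. `with_multiprox` is a hypothetical helper, and the parameter values in the usage are simply the ones shown above.

```python
def with_multiprox(
    standard_body: dict, n: int, m: int, t: int, t_prime: int, gibbs_chain_freq: int = 1
) -> dict:
    """Return a copy of a standard-mode body switched to multiprox sampling (assumed layout)."""
    return {
        **standard_body,  # shallow copy: the caller's dict is left untouched
        "sampling_mode": "multiprox",
        "multiprox_params": {
            "n": n,
            "m": m,
            "t": t,
            "t_prime": t_prime,
            "gibbs_chain_freq": gibbs_chain_freq,
        },
    }
```

For example, `with_multiprox(standard, n=4, m=8, t=500, t_prime=250)` produces the `multiprox_params` block above on top of the standard-mode fields.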

`POST /graph-generation/continue` body:

```json
{ "state": "<base64 state blob from a previous /generate or /continue result>" }
```
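
A `multiprox` session advances by feeding each result's state blob back into `/continue`. In this hedged Python sketch, `send(path, body)` is a hypothetical transport that posts the body, consumes the SSE stream, and returns the final result as a dict. The `state` key name is assumed from the request body above, so confirm the real field in [reference/sse-protocol.md](sse-protocol.md).

```python
from typing import Callable, List

# Hypothetical transport: POST `body` to `path`, read the SSE stream to the end,
# and return the final result as a dict.
Send = Callable[[str, dict], dict]


def run_multiprox(send: Send, generate_body: dict, extra_steps: int) -> List[dict]:
    """Start a multiprox session, then advance it `extra_steps` times."""
    results = [send("/api/v1/graph-generation/generate", generate_body)]
    for _ in range(extra_steps):
        # Chain the blob from the most recent result so each call advances one more step.
        state = results[-1]["state"]  # assumed key name
        results.append(send("/api/v1/graph-generation/continue", {"state": state}))
    return results
```

The calls are issued strictly one after another, which matches the process-wide inference lock. Sending them concurrently would only produce `429 INFERENCE_BUSY`.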

## KG anomaly correction

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/kg-anomaly/datasets` | List datasets with their `correct` and `generate` checkpoints. |
| `GET` | `/kg-anomaly/datasets/{id}/sample-subgraphs` | Pre-computed [subgraphs](../glossary.md#subgraph). `?count=5`, `?noise_level=0.4`, `?task=correct\|generate`, `?seed=42`. |
| `POST` | `/kg-anomaly/correct` | **Streaming SSE.** Correct or regenerate a KG subgraph. |
| `POST` | `/kg-anomaly/continue` | **Streaming SSE.** Advance a multiprox correction session by one step. |
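
The `sample-subgraphs` query string can be built with the standard library. In this Python sketch, the base URL is a hypothetical local server and the helper name is illustrative. The parameters are required because this reference does not state whether the table's values (`count=5`, `noise_level=0.4`, `seed=42`) are server defaults. `task` is either `correct` or `generate`.

```python
from typing import Optional
from urllib.parse import quote, urlencode

TASKS = ("correct", "generate")


def sample_subgraphs_url(
    base_url: str,
    dataset_id: str,
    *,
    count: int,
    noise_level: float,
    task: str,
    seed: Optional[int] = None,
) -> str:
    """Build a GET URL for /kg-anomaly/datasets/{id}/sample-subgraphs (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    params = {"count": count, "noise_level": noise_level, "task": task}
    if seed is not None:  # pass a seed for reproducible samples (assumed behavior)
        params["seed"] = seed
    path = f"/api/v1/kg-anomaly/datasets/{quote(dataset_id, safe='')}/sample-subgraphs"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
```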

`POST /kg-anomaly/correct` body:

```json
{
  "dataset_id": "freebase",
  "task": "correct",
  "sampling_mode": "standard",
  "subgraph": {
    "nodes": [{ "entity_id": 42, "type_id": 0 }, ...],
    "edges": [{ "source_idx": 0, "target_idx": 1, "relation_id": 5 }, ...],
    "is_bip": false,
    "row_size": 6
  },
  "diffusion_steps": 500,
  "chain_frames": 30
}
```
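
Edges appear to reference nodes by position (`source_idx`, `target_idx`), so a hand-built subgraph can easily point at a node that does not exist. This hedged Python sketch uses a hypothetical helper, `build_subgraph`, to assemble the `subgraph` object from plain tuples and reject out-of-range edges. Reading the indices as positions in `nodes` is an assumption based on their names. `is_bip` and `row_size` are passed through unchanged because their semantics are not described here.

```python
from typing import List, Tuple


def build_subgraph(
    nodes: List[Tuple[int, int]],       # (entity_id, type_id)
    edges: List[Tuple[int, int, int]],  # (source_idx, target_idx, relation_id)
    is_bip: bool,
    row_size: int,
) -> dict:
    n = len(nodes)
    for src, tgt, _ in edges:
        # Assumption: source_idx / target_idx are positions in the nodes list.
        if not (0 <= src < n and 0 <= tgt < n):
            raise ValueError(f"edge ({src}, {tgt}) references a node outside 0..{n - 1}")
    return {
        "nodes": [{"entity_id": e, "type_id": t} for e, t in nodes],
        "edges": [{"source_idx": s, "target_idx": t, "relation_id": r} for s, t, r in edges],
        "is_bip": is_bip,
        "row_size": row_size,  # passed through unchanged
    }
```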

The `correct`-task SSE stream additionally carries `kg_log_likelihood` on each `progress` event — see [reference/sse-protocol.md](sse-protocol.md).
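
To plot how the likelihood evolves during correction, a client can collect `kg_log_likelihood` from the `progress` events in stream order. This Python sketch takes already-parsed `(event_type, data)` pairs and assumes each `progress` payload is a JSON object. It skips events that lack the field, so a `generate`-task stream yields an empty list.

```python
import json
from typing import Iterable, List, Tuple


def kg_log_likelihoods(events: Iterable[Tuple[str, str]]) -> List[float]:
    """Collect kg_log_likelihood values from progress events, in stream order."""
    values = []
    for event_type, data in events:
        if event_type != "progress":
            continue
        payload = json.loads(data)  # assumed: progress data is a JSON object
        if "kg_log_likelihood" in payload:
            values.append(payload["kg_log_likelihood"])
    return values
```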

## See also

- [reference/sse-protocol.md](sse-protocol.md) — exact wire format of the streaming events.
- [reference/backend-services.md](backend-services.md) — what each Python module behind these endpoints does.
- [explanation/research-methods.md](../explanation/research-methods.md) — what these endpoints actually compute, scientifically.