| # Glossary |
|
|
| Domain terms used throughout the codebase and documentation. Other documents reference this file rather than redefining terms in place. |
|
|
| ## Knowledge graph reasoning |
|
|
| ### Knowledge graph (KG) |
| A directed multigraph of `(head, relation, tail)` triples where vertices are entities and labelled edges are relations. The three KGs exposed by the site are FB15k-237 (Freebase subset), WN18RR (WordNet subset) and NELL-995. |
|
|
| ### Link prediction |
| Given two of `head`, `relation`, `tail`, score and rank candidates for the missing slot. The 1-projection (`1p`) query structure is link prediction. |
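
As a rough illustration of tail prediction (the `1p` case), the sketch below scores every candidate tail with a TransE-style distance and sorts the candidates. The toy embeddings, the entity and relation names, and the `transe_score` / `rank_tails` helpers are invented for this example and are not taken from the codebase.

```python
import math

# Toy 2-d embeddings; all values are invented for illustration only.
ENTITY_EMB = {
    "paris": (1.0, 0.0),
    "france": (1.0, 1.0),
    "berlin": (0.0, 0.0),
    "germany": (0.0, 1.0),
}
RELATION_EMB = {"capital_of": (0.0, 1.0)}


def transe_score(head, relation, tail):
    """Negative L2 distance ||h + r - t||; a higher score means more plausible."""
    h, r, t = ENTITY_EMB[head], RELATION_EMB[relation], ENTITY_EMB[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Return every candidate tail, sorted from most to least plausible."""
    candidates = [e for e in ENTITY_EMB if e != head]
    return sorted(
        candidates,
        key=lambda tail: transe_score(head, relation, tail),
        reverse=True,
    )
```

Calling `rank_tails("paris", "capital_of")` puts `"france"` first, because `paris + capital_of` lands exactly on the `france` vector in this toy setup.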
|
|
| ### Query structure |
| A multi-hop / intersection / projection query template over a KG. Supported structures: `1p`, `2p`, `3p` (single chain projections), `2i`, `3i` (intersection of two/three relations), `ip` (intersection then projection), `pi` (projection then intersection). Each template determines which slots the user fills in: anchor entities (`a`, `a1`, `a2`, …), variable entities (`v1`, `v2`) and relations (`r1`, `r2`, `r3`). |
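
One way to picture the templates is as a table from structure name to the slots it exposes. The slot assignments below follow common multi-hop query conventions and are an assumption, as are the `QUERY_SLOTS` table and the `missing_slots` helper; the codebase may name, order, or expose slots differently (variable-entity slots are left out here).

```python
# Hypothetical slot table: anchor and relation slots per query structure.
# The assignments are an assumption, not read from the codebase.
QUERY_SLOTS = {
    "1p": ("a", "r1"),
    "2p": ("a", "r1", "r2"),
    "3p": ("a", "r1", "r2", "r3"),
    "2i": ("a1", "a2", "r1", "r2"),
    "3i": ("a1", "a2", "a3", "r1", "r2", "r3"),
    "ip": ("a1", "a2", "r1", "r2", "r3"),
    "pi": ("a1", "a2", "r1", "r2", "r3"),
}


def missing_slots(structure, filled):
    """Return the slots of `structure` absent from `filled`, in template order."""
    if structure not in QUERY_SLOTS:
        raise ValueError(f"unsupported query structure: {structure!r}")
    return [slot for slot in QUERY_SLOTS[structure] if slot not in filled]
```

A form handler could use such a helper to report which fields are still empty before a query is submitted.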
|
|
| ### COINs |
| *Community-Informed Graph Embeddings*, the link-prediction / query-answering approach from PhD thesis section 3.1. Partitions the KG into communities via Leiden clustering, learns separate community-local and global embeddings, and combines them at scoring time. Reduces compute relative to full-graph methods on large KGs. |
|
|
| ### Leiden clustering |
| A community-detection algorithm that refines Louvain. The `leiden_resolution` parameter trades community count against community size: a higher resolution yields more, smaller communities. Each dataset has one configured resolution, shared by all COINs algorithms for that dataset. |
|
|
| ### Algorithm (COINs context) |
| The embedding scoring family. Supported: TransE (translational), DistMult (bilinear), ComplEx and RotatE (complex-valued), Q2B (Query2Box, which embeds queries as hyper-rectangles, i.e. boxes), KBGAT (graph-attention message passing). Each algorithm declares which `query_structure` values it can answer. |
|
|
| ## Graph generation |
|
|
| ### MultiProxAn |
| The graph-generation method from PhD thesis section 4.3. A DiGress-style discrete denoising diffusion model augmented with *MultiProx*, an outer loop over multiple noisy initializations sampled jointly. The inner Gibbs step refines the current graph against several samples, raising sample quality on small graphs (e.g. QM9 molecules). |
|
|
| ### DiGress |
| The base discrete denoising diffusion architecture for graphs. The forward process noises a graph by randomly changing node and edge categories; the model learns to reverse the process step by step. |
|
|
| ### Sampling mode |
| Either `standard` (one denoising chain to a single output) or `multiprox` (the outer Gibbs loop wraps several chains). MultiProx adds the parameters `n` (chains), `m` (Gibbs rounds), `t` and `t_prime` (intermediate timesteps), and `gibbs_chain_freq` (preview cadence). |
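
A minimal sketch of how these parameters might be grouped and checked on the client or server side. The `SamplingConfig` class and its validation rules are hypothetical; in particular, rejecting MultiProx parameters in `standard` mode is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = ("standard", "multiprox")


@dataclass
class SamplingConfig:
    """Hypothetical container for the sampling parameters named above."""

    mode: str = "standard"
    n: Optional[int] = None                 # number of chains
    m: Optional[int] = None                 # number of Gibbs rounds
    t: Optional[int] = None                 # intermediate timestep
    t_prime: Optional[int] = None           # intermediate timestep
    gibbs_chain_freq: Optional[int] = None  # preview cadence

    def __post_init__(self):
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown sampling mode: {self.mode!r}")
        extras = (self.n, self.m, self.t, self.t_prime, self.gibbs_chain_freq)
        # Assumption: the MultiProx parameters only make sense in multiprox mode.
        if self.mode == "standard" and any(v is not None for v in extras):
            raise ValueError("MultiProx parameters require mode='multiprox'")
```

Validating at construction time keeps malformed requests from reaching the sampler at all.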
|
|
| ### Discrete vs. continuous |
| Two model variants per dataset. Discrete predicts categorical distributions over node/edge types directly; continuous predicts in a relaxed continuous space and rounds at the end. Checkpoints are named `{dataset}.ckpt` (discrete) and `{dataset}_c.ckpt` (continuous). |
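
The naming rule can be written down as a pair of small helpers. This is only a sketch: `checkpoint_name` and `parse_checkpoint_name` are hypothetical names, and the lowercase dataset key `qm9` in the usage note is assumed.

```python
CKPT_EXT = ".ckpt"
CONTINUOUS_SUFFIX = "_c"


def checkpoint_name(dataset, continuous=False):
    """Build `{dataset}.ckpt` (discrete) or `{dataset}_c.ckpt` (continuous)."""
    suffix = CONTINUOUS_SUFFIX if continuous else ""
    return f"{dataset}{suffix}{CKPT_EXT}"


def parse_checkpoint_name(filename):
    """Split a checkpoint file name into (dataset, is_continuous)."""
    if not filename.endswith(CKPT_EXT):
        raise ValueError(f"not a checkpoint file: {filename!r}")
    stem = filename[: -len(CKPT_EXT)]
    if stem.endswith(CONTINUOUS_SUFFIX):
        return stem[: -len(CONTINUOUS_SUFFIX)], True
    return stem, False
```

For example, `checkpoint_name("qm9", continuous=True)` gives `qm9_c.ckpt`. Note that a dataset whose own name ends in `_c` would be ambiguous under this scheme.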
|
|
| ## KG anomaly correction |
|
|
| ### Subgraph |
| A small (≤ 20-node) connected sample drawn from a COINs Loader's DFS context-subgraph partitioning. Used as input/output for the KG anomaly demo. |
|
|
| ### Task (kg-anomaly) |
| Either `generate` (sample a fresh subgraph from noise) or `correct` (denoise a user-supplied subgraph back toward something the model considers plausible). Each `(dataset, task)` pair has its own checkpoint. |
|
|
| ### Bipartite vs. unipartite subgraph |
| The DFS partitioner emits both: bipartite subgraphs split nodes into two halves with edges across, unipartite subgraphs are a single connected blob. The frontend renders them differently. |
|
|
| ## Inference protocol |
|
|
| ### Inference lock |
| A single `threading.Lock` in `ModelRegistry`. Only one inference runs at a time across the whole process (the free HF Spaces tier provides 2 vCPUs and no GPU); a busy server returns HTTP 429 (`INFERENCE_BUSY`). `/api/v1/debug/force-unlock` releases a stuck lock when `DEBUG=True`. |
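
The fail-fast behaviour can be sketched with a non-blocking acquire. The class below is a stripped-down stand-in for the real `ModelRegistry`; the `_inference_lock` attribute, the `run_inference` method, and the `InferenceBusy` exception are hypothetical names chosen for the example.

```python
import threading


class InferenceBusy(Exception):
    """Raised when another inference already holds the lock."""

    status_code = 429
    error_code = "INFERENCE_BUSY"


class ModelRegistry:
    """Stand-in for the real registry; only the lock is modelled here."""

    def __init__(self):
        self._inference_lock = threading.Lock()

    def run_inference(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject immediately instead of queueing
        # behind a job that is already running.
        if not self._inference_lock.acquire(blocking=False):
            raise InferenceBusy("another inference is in progress")
        try:
            return fn(*args, **kwargs)
        finally:
            # Always release, even if the inference raised.
            self._inference_lock.release()
```

A request handler would catch `InferenceBusy` and translate it into the HTTP 429 response. The `finally` clause is what keeps a failed inference from leaving the lock stuck.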
|
|
| ### SSE (Server-Sent Events) |
| The streaming-inference protocol the graph-generation and kg-anomaly endpoints use. Each event has a `type` (`progress` | `preview` | `result`) and a JSON payload. See [reference/sse-protocol.md](reference/sse-protocol.md). |
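
For orientation, here is a sketch of how one such event could be framed on the wire using the standard `text/event-stream` layout. It assumes the event `type` travels in the SSE `event:` field and the JSON payload in `data:`; the linked protocol reference is authoritative, and the `format_sse` / `parse_sse` helpers and the `step` payload key are hypothetical.

```python
import json

EVENT_TYPES = ("progress", "preview", "result")


def format_sse(event_type, payload):
    """Serialize one event as a text/event-stream frame."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    # Compact JSON has no raw newlines, so one `data:` line is enough.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event_type}\ndata: {data}\n\n"


def parse_sse(frame):
    """Parse a single frame produced by format_sse back into (type, payload)."""
    event_type, data_lines = None, []
    for line in frame.strip().split("\n"):
        field, _, value = line.partition(": ")
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return event_type, json.loads("\n".join(data_lines))
```

The blank line at the end of each frame is what tells an SSE client that the event is complete.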
|
|
| ### Continuation token / state blob |
| MultiProx sampling can pause between Gibbs rounds. The `result` event of a `/generate` or `/correct` call returns a base64-encoded state blob; the client posts that blob to the matching `/continue` endpoint to advance one more round. |
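
The round trip can be illustrated with a toy state. The sketch below uses JSON as a stand-in serialization; the real blob most likely carries model tensors in a different format, and the `encode_state` / `decode_state` helpers and the state keys are hypothetical.

```python
import base64
import json


def encode_state(state):
    """Serialize a state dict and wrap it as a base64 ASCII string."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_state(blob):
    """Reverse encode_state; rejects input that is not valid base64."""
    raw = base64.b64decode(blob.encode("ascii"), validate=True)
    return json.loads(raw.decode("utf-8"))


# Toy state after the first Gibbs round (keys are invented for the example).
state = {"round": 1, "rounds_total": 3}
blob = encode_state(state)          # what the `result` event would carry
resumed = decode_state(blob)        # what a `/continue` handler would recover
resumed["round"] += 1
```

Because the client holds the blob between calls, the server does not need to keep per-session state in memory.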
|
|
| ### Inference lifecycle |
| Boot-time: pre-warm checkpoints from HF Hub, scan checkpoint dirs, load lightweight COINs Loaders, generate sample subgraphs. First-request: lazy-load the relevant model weights into memory. See [explanation/inference-lifecycle.md](explanation/inference-lifecycle.md). |
|
|
| ## Deployment |
|
|
| ### HF Space |
| A Hugging Face Spaces application running this repo's `Dockerfile`. The deployed URL is `https://bani57-website.hf.space`. The Space repo is `Bani57/website`. |
|
|
| ### HF Hub model repo |
| `Bani57/checkpoints` holds all PyTorch weights. It mirrors the on-disk layout under `CHECKPOINTS_ROOT` so that `huggingface_hub.snapshot_download` places files at their expected paths and the registry's scan logic finds them unchanged. |
|
|
| ### Persistent storage (HF Spaces) |
| A paid `/data` volume that survives Space restarts. Free Spaces have 50 GB of ephemeral disk that resets on restart. Without persistent storage, every cold start re-downloads checkpoints from HF Hub. |
|
|