---
license: mit
language:
- en
library_name: pytorch
tags:
- knowledge-graph
- link-prediction
- query-answering
- graph-generation
- graph-diffusion
- knowledge-graph-completion
- phd-thesis
- epfl
datasets:
- FB15k-237
- WN18RR
- NELL-995
- QM9
---

# PhD research checkpoints: Andrej Janchevski (EPFL, 2025)

PyTorch checkpoints for the three research methods presented in the thesis _Scalable Methods for Knowledge Graph Reasoning and Generation_ ([infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c)). The repository mirrors the on-disk layout the demo backend expects, so a single `huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", local_dir=...)` call places every file in its final location with no extra wiring.

The interactive demos that consume these weights are deployed as a website; its source is in the [`Bani57/website`](https://huggingface.co/spaces/Bani57/website/tree/main) Space.

## Methods and weights

### COINs: knowledge graph reasoning (thesis §3.1)

*Community-Informed Graph Embeddings.* Six embedding scoring families (TransE, DistMult, ComplEx, RotatE, Q2B, KBGAT), each trained on three KGs. The method partitions each KG into Leiden communities and learns separate community-local and global embeddings, which are combined at scoring time.

- `COINs-KGGeneration/graph_completion/checkpoints/{dataset}_{algorithm}.tar`: 18 files, ~2.6 GB.
  - Datasets: `freebase` (FB15k-237), `wordnet` (WN18RR), `nell` (NELL-995).
  - Algorithms: `transe`, `distmult`, `complex`, `rotate`, `q2b`, `kbgat`.
- `COINs-KGGeneration/graph_completion/results/{dataset}/transe_model.tar`: 3 files, ~185 MB. TransE pre-init checkpoints used to bootstrap the KBGAT embedder.

### MultiProxAn: graph generation (thesis §4.3)

Discrete denoising diffusion model with the *MultiProx* outer Gibbs loop for multi-chain refinement. It generates molecular graphs (QM9) and synthetic community graphs (comm20).

- `MultiProxAn/checkpoints/{dataset}{,_c}.ckpt`: 4 files, ~380 MB. Discrete (`{dataset}.ckpt`) and continuous (`{dataset}_c.ckpt`) variants.
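
As a sanity check after downloading, the COINs path templates listed earlier can be expanded into the 21 concrete file paths they describe (18 main checkpoints plus 3 TransE pre-init files). The sketch below uses only the dataset and algorithm identifiers given in the COINs section. The helper names `coins_checkpoint_paths` and `missing_checkpoints` are hypothetical and not part of the research code, and `root` is assumed to be the `local_dir` passed to `snapshot_download`.

```python
from itertools import product
from pathlib import Path

# Identifiers exactly as listed in the COINs section of this card.
DATASETS = ["freebase", "wordnet", "nell"]
ALGORITHMS = ["transe", "distmult", "complex", "rotate", "q2b", "kbgat"]

COMPLETION_DIR = Path("COINs-KGGeneration") / "graph_completion"


def coins_checkpoint_paths(root):
    """Expand the two COINs path templates under `root` (hypothetical helper)."""
    base = Path(root) / COMPLETION_DIR
    # checkpoints/{dataset}_{algorithm}.tar -> 3 datasets x 6 algorithms = 18 files
    main = [
        base / "checkpoints" / f"{dataset}_{algorithm}.tar"
        for dataset, algorithm in product(DATASETS, ALGORITHMS)
    ]
    # results/{dataset}/transe_model.tar -> 3 TransE pre-init files
    pre_init = [base / "results" / dataset / "transe_model.tar" for dataset in DATASETS]
    return main + pre_init


def missing_checkpoints(root):
    """Return the expected COINs files that do not exist under `root`."""
    return [path for path in coins_checkpoint_paths(root) if not path.is_file()]
```

Calling `missing_checkpoints("src/research")` after a download returns an empty list when every COINs file is in place; any returned paths point at files that still need to be fetched.
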

### KG anomaly correction (thesis §4.4)

DiGress-style diffusion conditioned on the COINs embedder for the same dataset. Either samples a fresh subgraph (`generate`) or denoises a user-supplied subgraph (`correct`).

- `COINs-KGGeneration/graph_generation/checkpoints/{dataset}{,_correct}.ckpt`: 6 files, ~2.7 GB.

## Usage

The deployed website downloads the entire repository into its `CHECKPOINTS_ROOT` at container startup:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Bani57/checkpoints",
    repo_type="model",
    local_dir="src/research",  # mirrors the on-disk layout
    local_dir_use_symlinks=False,
)
```

For accelerated downloads, install `hf_transfer` and set `HF_HUB_ENABLE_HF_TRANSFER=1`. Total payload ≈ 5.8 GB.

The weights are loaded by [`ModelRegistry`](https://huggingface.co/spaces/Bani57/website/blob/main/src/backend/api/services/registry.py) in the website backend; lazy per-request loading keeps the working set small.

## Training

The COINs and MultiProxAn checkpoints were trained on EPFL's GPU cluster during 2021–2025 as part of the doctoral research programme. Training hyperparameters live in the [research code's YAML configs](https://huggingface.co/spaces/Bani57/website/tree/main/src/research/COINs-KGGeneration/graph_completion/configs).

## Intended use

These checkpoints are released to power the interactive thesis demos linked above. They are research artefacts; downstream production use is neither tested nor supported.

## Citation

```bibtex
@phdthesis{janchevski_scalable_2025,
  author = {Andrej Janchevski},
  title  = {Scalable Methods for Knowledge Graph Reasoning and Generation},
  school = {{EPFL}},
  year   = {2025},
  url    = {https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c},
}
```

## License

MIT for the released weights and source. The research methods retain their original publication terms; see the thesis.