---
license: mit
language:
- en
library_name: pytorch
tags:
- knowledge-graph
- link-prediction
- query-answering
- graph-generation
- graph-diffusion
- knowledge-graph-completion
- phd-thesis
- epfl
datasets:
- FB15k-237
- WN18RR
- NELL-995
- QM9
---
# PhD research checkpoints — Andrej Janchevski (EPFL, 2025)
PyTorch checkpoint dump for the three research methods presented in the thesis
_Scalable Methods for Knowledge Graph Reasoning and Generation_
([infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c)).
The repository mirrors the on-disk layout the demo backend expects, so a single
`huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", local_dir=...)`
drops every file into its final location with no extra wiring.
The interactive demos that consume these weights are deployed at
<https://bani57-website.hf.space>; source at
<https://huggingface.co/spaces/Bani57/website>.
## Methods and weights
### COINs — knowledge graph reasoning (thesis §3.1)
*Community-Informed Graph Embeddings.* Six embedding scoring families
(TransE, DistMult, ComplEx, RotatE, Q2B, KBGAT), each trained on three KGs.
COINs partitions each KG into communities with the Leiden algorithm and
learns separate community-local and global embeddings, which are combined
at scoring time.
`COINs-KGGeneration/graph_completion/checkpoints/{dataset}_{algorithm}.tar`
— 18 files, ~2.6 GB.
Datasets: `freebase` (FB15k-237), `wordnet` (WN18RR), `nell` (NELL-995).
Algorithms: `transe`, `distmult`, `complex`, `rotate`, `q2b`, `kbgat`.
`COINs-KGGeneration/graph_completion/results/{dataset}/transe_model.tar`
— 3 files, ~185 MB.
TransE pre-init checkpoints used to bootstrap the KBGAT embedder.
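The two path templates above expand to 21 files. A minimal sketch that enumerates them, assuming the dataset and algorithm identifiers listed above appear verbatim in the filenames; `coins_checkpoint_paths` is a hypothetical helper, not part of the research code:

```python
from itertools import product
from pathlib import PurePosixPath

# Identifiers as listed in this card; assumed to appear verbatim in filenames.
DATASETS = ["freebase", "wordnet", "nell"]
ALGORITHMS = ["transe", "distmult", "complex", "rotate", "q2b", "kbgat"]

COMPLETION_ROOT = PurePosixPath("COINs-KGGeneration/graph_completion")


def coins_checkpoint_paths():
    """Return (main checkpoints, TransE pre-init checkpoints) as relative paths."""
    main = [
        COMPLETION_ROOT / "checkpoints" / f"{dataset}_{algorithm}.tar"
        for dataset, algorithm in product(DATASETS, ALGORITHMS)
    ]
    pre_init = [
        COMPLETION_ROOT / "results" / dataset / "transe_model.tar"
        for dataset in DATASETS
    ]
    return main, pre_init


main, pre_init = coins_checkpoint_paths()
print(len(main), len(pre_init))  # 18 3
print(main[0])  # COINs-KGGeneration/graph_completion/checkpoints/freebase_transe.tar
```

Comparing this list against the files present under the download directory is a quick way to spot an incomplete snapshot.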
### MultiProxAn — graph generation (thesis §4.3)
Discrete denoising diffusion model with the *MultiProx* outer Gibbs loop for
multi-chain refinement. Generates molecular graphs (QM9) and synthetic
community graphs (comm20).
`MultiProxAn/checkpoints/{dataset}{,_c}.ckpt`
— 4 files, ~380 MB.
Discrete (`{dataset}.ckpt`) and continuous (`{dataset}_c.ckpt`) variants.
### KG anomaly correction (thesis §4.4)
A DiGress-style diffusion model conditioned on the COINs embedder for the
same dataset. It either samples a fresh subgraph (`generate`) or denoises a
user-supplied subgraph (`correct`).
`COINs-KGGeneration/graph_generation/checkpoints/{dataset}{,_correct}.ckpt`
— 6 files, ~2.7 GB.
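The `{dataset}{,_correct}` template is shell brace expansion: each dataset has a plain `.ckpt` file and a `_correct.ckpt` file. The sketch below resolves a path from a dataset and a mode. It assumes that the plain file serves `generate` and the `_correct` file serves `correct`, and that the dataset identifiers match the COINs ones (`freebase`, `wordnet`, `nell`). Both assumptions are inferred from the naming and the file count, and `anomaly_checkpoint` is a hypothetical helper:

```python
from pathlib import PurePosixPath

GENERATION_ROOT = PurePosixPath("COINs-KGGeneration/graph_generation/checkpoints")

# Assumed mapping from demo mode to filename suffix (inferred from the names).
MODE_SUFFIX = {"generate": "", "correct": "_correct"}
# Assumed to match the COINs dataset identifiers.
ANOMALY_DATASETS = ("freebase", "wordnet", "nell")


def anomaly_checkpoint(dataset: str, mode: str) -> PurePosixPath:
    """Resolve the relative checkpoint path for a dataset and mode."""
    if dataset not in ANOMALY_DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODE_SUFFIX:
        raise ValueError(f"unknown mode: {mode!r}")
    return GENERATION_ROOT / f"{dataset}{MODE_SUFFIX[mode]}.ckpt"


print(anomaly_checkpoint("wordnet", "correct"))
```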
## Usage
The deployed website downloads the entire repository into its
`CHECKPOINTS_ROOT` at container startup:
```python
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Bani57/checkpoints",
    repo_type="model",
    local_dir="src/research",  # mirrors the on-disk layout
    # Deprecated: recent huggingface_hub releases ignore this argument.
    local_dir_use_symlinks=False,
)
```
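To fetch a single method rather than the full repository, `snapshot_download` accepts an `allow_patterns` filter. A minimal sketch, with glob patterns derived from the path templates in this card; the `download_subset` wrapper and the `METHOD_PATTERNS` table are illustrative names, not part of the demo backend:

```python
# Glob patterns per method, derived from the path templates in this card.
METHOD_PATTERNS = {
    "coins": [
        "COINs-KGGeneration/graph_completion/checkpoints/*.tar",
        "COINs-KGGeneration/graph_completion/results/*/transe_model.tar",
    ],
    "multiproxan": ["MultiProxAn/checkpoints/*.ckpt"],
    "anomaly": ["COINs-KGGeneration/graph_generation/checkpoints/*.ckpt"],
}


def download_subset(methods, local_dir="src/research"):
    """Download only the checkpoint files belonging to the given methods."""
    # Imported here so the pattern table can be inspected without the package.
    from huggingface_hub import snapshot_download

    patterns = [p for method in methods for p in METHOD_PATTERNS[method]]
    return snapshot_download(
        repo_id="Bani57/checkpoints",
        repo_type="model",
        local_dir=local_dir,
        allow_patterns=patterns,
    )


# Example call (performs a network download of roughly 380 MB):
# download_subset(["multiproxan"])
```

Because the anomaly-correction models are conditioned on the COINs embedder, requesting `"anomaly"` alone is probably not enough to run that demo; passing `["coins", "anomaly"]` is the safer choice.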
For accelerated downloads, install `hf_transfer` and set
`HF_HUB_ENABLE_HF_TRANSFER=1`. Total payload ≈ 5.8 GB.
The weights are loaded by [`ModelRegistry`](https://huggingface.co/spaces/Bani57/website/blob/main/src/backend/api/services/registry.py)
in the website backend; lazy per-request loading keeps the working set small.
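Lazy per-request loading can be pictured as a cache that reads a checkpoint from disk only the first time a request needs it. The sketch below is not the actual `ModelRegistry`: `LazyCheckpointCache`, its `loader` argument, and the `example.ckpt` filename are hypothetical. A real loader would likely wrap `torch.load`; a stand-in is used here so the example runs without PyTorch.

```python
from pathlib import Path
from typing import Any, Callable, Dict


class LazyCheckpointCache:
    """Load each checkpoint on first access and reuse it afterwards."""

    def __init__(self, root: str, loader: Callable[[Path], Any]):
        self._root = Path(root)
        self._loader = loader
        self._loaded: Dict[str, Any] = {}

    def get(self, relative_path: str) -> Any:
        # Nothing is read from disk until a request actually needs the file.
        if relative_path not in self._loaded:
            self._loaded[relative_path] = self._loader(self._root / relative_path)
        return self._loaded[relative_path]

    def evict(self, relative_path: str) -> None:
        # Drop a model to bound memory; it is reloaded on the next request.
        self._loaded.pop(relative_path, None)


# Stand-in loader that records calls instead of reading real weights.
calls = []


def fake_loader(path: Path) -> str:
    calls.append(path)
    return path.name


cache = LazyCheckpointCache("src/research", loader=fake_loader)
first = cache.get("MultiProxAn/checkpoints/example.ckpt")
second = cache.get("MultiProxAn/checkpoints/example.ckpt")
print(first, len(calls))  # example.ckpt 1
```

The second `get` is served from memory, so only checkpoints that have been requested occupy RAM, and `evict` gives the backend a way to release them again.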
## Training
The COINs and MultiProxAn checkpoints were trained on EPFL's GPU cluster
during 2021–2025 as part of the doctoral research programme. Training
hyperparameters live in the
[research code's YAML configs](https://huggingface.co/spaces/Bani57/website/tree/main/src/research/COINs-KGGeneration/graph_completion/configs).
## Intended use
These checkpoints are released to power the interactive thesis demos
linked above. They are research artefacts; downstream production use is
neither tested nor supported.
## Citation
```bibtex
@phdthesis{janchevski_scalable_2025,
author = {Andrej Janchevski},
title = {Scalable Methods for Knowledge Graph Reasoning and Generation},
school = {{EPFL}},
year = {2025},
url = {https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c},
}
```
## License
MIT for the released weights and source. The research methods retain
their original publication terms; see the thesis.