---
license: mit
language:
- en
library_name: pytorch
tags:
- knowledge-graph
- link-prediction
- query-answering
- graph-generation
- graph-diffusion
- knowledge-graph-completion
- phd-thesis
- epfl
datasets:
- FB15k-237
- WN18RR
- NELL-995
- QM9
---

# PhD research checkpoints: Andrej Janchevski (EPFL, 2025)

PyTorch checkpoints for the three research methods presented in the thesis
_Scalable Methods for Knowledge Graph Reasoning and Generation_
([infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c)).
The repository mirrors the on-disk layout the demo backend expects, so a single
`huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", local_dir=...)`
call places every file in its final location with no extra wiring.

The interactive demos that consume these weights are deployed at
<https://bani57-website.hf.space>; the source is at
<https://huggingface.co/spaces/Bani57/website>.

## Methods and weights

### COINs: knowledge graph reasoning (thesis §3.1)
*Community-Informed Graph Embeddings.* COINs partitions each KG into Leiden
communities and learns separate community-local and global embeddings, which
are combined at scoring time. Six embedding scoring families
(TransE, DistMult, ComplEx, RotatE, Q2B, KBGAT) are trained on three KGs.

`COINs-KGGeneration/graph_completion/checkpoints/{dataset}_{algorithm}.tar`:
18 files, ~2.6 GB.
Datasets: `freebase` (FB15k-237), `wordnet` (WN18RR), `nell` (NELL-995).
Algorithms: `transe`, `distmult`, `complex`, `rotate`, `q2b`, `kbgat`.

`COINs-KGGeneration/graph_completion/results/{dataset}/transe_model.tar`:
3 files, ~185 MB.
TransE pre-initialization checkpoints used to bootstrap the KBGAT embedder.

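
The two COINs path templates expand to 21 files in total. The following is a minimal sketch that enumerates them, assuming the dataset and algorithm identifiers listed above appear verbatim in the file names. The helper `coins_checkpoint_paths` is hypothetical and not part of the research code.

```python
from itertools import product
from pathlib import PurePosixPath

# Identifiers copied from the lists above; that they appear verbatim in the
# file names is an assumption based on the path templates.
DATASETS = ("freebase", "wordnet", "nell")
ALGORITHMS = ("transe", "distmult", "complex", "rotate", "q2b", "kbgat")

ROOT = PurePosixPath("COINs-KGGeneration/graph_completion")


def coins_checkpoint_paths():
    """Return (main, pre_init) lists of repository-relative checkpoint paths."""
    main = [
        ROOT / "checkpoints" / f"{dataset}_{algorithm}.tar"
        for dataset, algorithm in product(DATASETS, ALGORITHMS)
    ]
    pre_init = [
        ROOT / "results" / dataset / "transe_model.tar" for dataset in DATASETS
    ]
    return main, pre_init


main, pre_init = coins_checkpoint_paths()
print(len(main), len(pre_init))
print(main[0])
```

Comparing such a list against the contents of the local directory is one way to spot an incomplete download.
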

### MultiProxAn: graph generation (thesis §4.3)
Discrete denoising diffusion model with the *MultiProx* outer Gibbs loop for
multi-chain refinement. It generates molecular graphs (QM9) and synthetic
community graphs (comm20).

`MultiProxAn/checkpoints/{dataset}{,_c}.ckpt`:
4 files, ~380 MB.
Discrete (`{dataset}.ckpt`) and continuous (`{dataset}_c.ckpt`) variants.

### KG anomaly correction (thesis §4.4)
DiGress-style diffusion conditioned on the COINs embedder for the same
dataset. The model either samples a fresh subgraph (`generate`) or denoises a
user-supplied subgraph (`correct`).

`COINs-KGGeneration/graph_generation/checkpoints/{dataset}{,_correct}.ckpt`:
6 files, ~2.7 GB.

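
The `{,_c}` and `{,_correct}` groups in the templates above read as shell-style brace alternatives, so `{dataset}{,_correct}.ckpt` stands for both `{dataset}.ckpt` and `{dataset}_correct.ckpt`. The following is a small sketch of that expansion. `expand_braces` is a hypothetical helper, and the dataset stems are assumed to be the same `freebase`, `wordnet` and `nell` identifiers used for COINs.

```python
import re
from itertools import product

# Matches one brace group that contains at least one comma, e.g. "{,_correct}".
# A placeholder such as "{dataset}" has no comma and is left untouched.
_BRACE_GROUP = re.compile(r"\{([^{}]*,[^{}]*)\}")


def expand_braces(pattern):
    """Expand shell-style {a,b} alternatives (no nesting) into concrete strings."""
    # re.split with one capture group alternates literal text and group bodies.
    parts = _BRACE_GROUP.split(pattern)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in product(*choices)]


TEMPLATE = "COINs-KGGeneration/graph_generation/checkpoints/{dataset}{,_correct}.ckpt"
DATASETS = ("freebase", "wordnet", "nell")  # assumed file-name stems

paths = [
    path
    for dataset in DATASETS
    for path in expand_braces(TEMPLATE.replace("{dataset}", dataset))
]
print(len(paths))
print(paths[:2])
```

The same helper applies to the MultiProxAn template once the actual dataset stems are substituted for `{dataset}`.
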

## Usage

The deployed website downloads the entire repository into its
`CHECKPOINTS_ROOT` at container startup:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Bani57/checkpoints",
    repo_type="model",
    local_dir="src/research",  # mirrors the on-disk layout
    # Copy real files instead of symlinks; recent huggingface_hub releases
    # deprecate this flag, so it mainly matters for older versions.
    local_dir_use_symlinks=False,
)
```

For accelerated downloads, install `hf_transfer` and set
`HF_HUB_ENABLE_HF_TRANSFER=1`. Total payload ≈ 5.8 GB.

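
When only part of the payload is needed, `snapshot_download` accepts an `allow_patterns` argument that restricts the download to matching paths. The sketch below fetches the COINs files for a single dataset. The helpers `coins_patterns` and `download_coins` are hypothetical, and the patterns assume the file layout described under "Methods and weights".

```python
from fnmatch import fnmatchcase


def coins_patterns(dataset):
    """Glob patterns covering the COINs files of one dataset (hypothetical helper)."""
    base = "COINs-KGGeneration/graph_completion"
    return [
        f"{base}/checkpoints/{dataset}_*.tar",
        f"{base}/results/{dataset}/transe_model.tar",
    ]


def download_coins(dataset, local_dir="src/research"):
    """Fetch only the files matching coins_patterns(dataset)."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Bani57/checkpoints",
        repo_type="model",
        local_dir=local_dir,
        allow_patterns=coins_patterns(dataset),
    )


# Offline check of the patterns against two paths from the documented layout.
patterns = coins_patterns("wordnet")
wanted = "COINs-KGGeneration/graph_completion/checkpoints/wordnet_rotate.tar"
skipped = "COINs-KGGeneration/graph_completion/checkpoints/nell_rotate.tar"
print(any(fnmatchcase(wanted, p) for p in patterns))
print(any(fnmatchcase(skipped, p) for p in patterns))
```

Calling `download_coins("wordnet")` would then transfer seven files (six scoring models plus the TransE pre-initialization checkpoint) rather than the whole repository.
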

The weights are loaded by [`ModelRegistry`](https://huggingface.co/spaces/Bani57/website/blob/main/src/backend/api/services/registry.py)
in the website backend; lazy per-request loading keeps the working set small.

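
Lazy loading means a checkpoint is read from disk only when a request first needs it and is then reused. The following is a minimal sketch of that idea, not the actual `ModelRegistry` code. The class `LazyCheckpointCache`, its `loader` argument and the `max_loaded` eviction limit are all illustrative assumptions.

```python
from collections import OrderedDict


class LazyCheckpointCache:
    """Load checkpoints on first use and keep only the most recently used ones."""

    def __init__(self, loader, max_loaded=2):
        self._loader = loader          # callable: path -> loaded model object
        self._max_loaded = max_loaded  # upper bound on resident checkpoints
        self._cache = OrderedDict()

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        model = self._loader(path)  # deferred until a request needs it
        self._cache[path] = model
        if len(self._cache) > self._max_loaded:
            self._cache.popitem(last=False)  # evict the least recently used
        return model


# Stand-in loader that records its calls; a real one might wrap torch.load.
loaded = []


def fake_loader(path):
    loaded.append(path)
    return {"path": path}


cache = LazyCheckpointCache(fake_loader, max_loaded=2)
cache.get("a.ckpt")
cache.get("a.ckpt")  # served from the cache, no second load
cache.get("b.ckpt")
cache.get("c.ckpt")  # evicts a.ckpt, the least recently used entry
print(loaded)
```

Bounding the number of resident checkpoints trades a reload on a cache miss for a smaller memory footprint, which matters when the full set is several gigabytes.
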

## Training

The COINs and MultiProxAn checkpoints were trained on EPFL's GPU cluster
during 2021–2025 as part of the doctoral research programme. Training
hyperparameters live in the
[research code's YAML configs](https://huggingface.co/spaces/Bani57/website/tree/main/src/research/COINs-KGGeneration/graph_completion/configs).

## Intended use

These checkpoints are released to power the interactive thesis demos
linked above. They are research artefacts; downstream production use is
neither tested nor supported.

## Citation

```bibtex
@phdthesis{janchevski_scalable_2025,
  author = {Andrej Janchevski},
  title = {Scalable Methods for Knowledge Graph Reasoning and Generation},
  school = {{EPFL}},
  year = {2025},
  url = {https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c},
}
```

## License

MIT for the released weights and source. The research methods retain
their original publication terms; see the thesis.