---
license: mit
language:
- en
library_name: pytorch
tags:
- knowledge-graph
- link-prediction
- query-answering
- graph-generation
- graph-diffusion
- knowledge-graph-completion
- phd-thesis
- epfl
datasets:
- FB15k-237
- WN18RR
- NELL-995
- QM9
---
# PhD research checkpoints – Andrej Janchevski (EPFL, 2025)
PyTorch checkpoint dump for the three research methods presented in the thesis
_Scalable Methods for Knowledge Graph Reasoning and Generation_
([infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c)).
The repository mirrors the on-disk layout the demo backend expects, so a single
`huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", local_dir=...)`
drops every file into its final location with no extra wiring.
The interactive demos that consume these weights are deployed at
<https://bani57-website.hf.space>; source at
<https://huggingface.co/spaces/Bani57/website>.
## Methods and weights
### COINs – knowledge graph reasoning (thesis §3.1)
*Community-Informed Graph Embeddings.* Six embedding scoring families
(TransE, DistMult, ComplEx, RotatE, Q2B, KBGAT) trained on three KGs.
The method partitions each KG into Leiden communities and learns separate
community-local and global embeddings, which are combined at scoring time.
`COINs-KGGeneration/graph_completion/checkpoints/{dataset}_{algorithm}.tar`
– 18 files, ~2.6 GB.
Datasets: `freebase` (FB15k-237), `wordnet` (WN18RR), `nell` (NELL-995).
Algorithms: `transe`, `distmult`, `complex`, `rotate`, `q2b`, `kbgat`.
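The 18-file count follows directly from the 3 × 6 dataset/algorithm grid above. A small illustrative helper (not part of the released code) that enumerates the expected relative paths from the naming scheme:

```python
from itertools import product

# Dataset keys and algorithm names as listed above; the helper itself
# is an illustration of the naming scheme, not part of the demo backend.
DATASETS = ("freebase", "wordnet", "nell")
ALGORITHMS = ("transe", "distmult", "complex", "rotate", "q2b", "kbgat")

def coins_checkpoint_paths():
    """Return the 18 relative paths following {dataset}_{algorithm}.tar."""
    root = "COINs-KGGeneration/graph_completion/checkpoints"
    return [f"{root}/{d}_{a}.tar" for d, a in product(DATASETS, ALGORITHMS)]

paths = coins_checkpoint_paths()
assert len(paths) == 18  # 3 datasets x 6 algorithms
```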
`COINs-KGGeneration/graph_completion/results/{dataset}/transe_model.tar`
– 3 files, ~185 MB.
TransE pre-init checkpoints used to bootstrap the KBGAT embedder.
### MultiProxAn – graph generation (thesis §4.3)
Discrete denoising diffusion model with the *MultiProx* outer Gibbs loop for
multi-chain refinement. Generates molecular graphs (QM9) and synthetic
community graphs (comm20).
`MultiProxAn/checkpoints/{dataset}{,_c}.ckpt`
– 4 files, ~380 MB.
Discrete (`{dataset}.ckpt`) and continuous (`{dataset}_c.ckpt`) variants.
### KG anomaly correction (thesis §4.4)
DiGress-style diffusion conditioned on the COINs embedder for the same
dataset. Either samples a fresh subgraph (`generate`) or denoises a
user-supplied subgraph (`correct`).
`COINs-KGGeneration/graph_generation/checkpoints/{dataset}{,_correct}.ckpt`
– 6 files, ~2.7 GB.
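The `{dataset}{,_correct}.ckpt` notation is shell-style brace expansion. A minimal sketch of expanding it, assuming the same three KG dataset keys as the COINs checkpoints (which matches the 6-file count, but is an assumption, not confirmed by the listing):

```python
# Expand a shell-style {dataset}{,_correct}.ckpt pattern into filenames.
# Hypothetical helper for illustration; not part of the released code.
def expand_ckpt_names(datasets, suffixes=("", "_correct")):
    return [f"{d}{s}.ckpt" for d in datasets for s in suffixes]

names = expand_ckpt_names(["freebase", "wordnet", "nell"])
assert len(names) == 6  # matches the 6-file count above
```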
## Usage
The deployed website downloads the entire repository into its
`CHECKPOINTS_ROOT` at container startup:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Bani57/checkpoints",
    repo_type="model",
    local_dir="src/research",  # mirrors the on-disk layout
    local_dir_use_symlinks=False,
)
```
For accelerated downloads, install `hf_transfer` and set
`HF_HUB_ENABLE_HF_TRANSFER=1`. Total payload ≈ 5.8 GB.
The weights are loaded by [`ModelRegistry`](https://huggingface.co/spaces/Bani57/website/blob/main/src/backend/api/services/registry.py)
in the website backend; lazy per-request loading keeps the working set small.
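Lazy per-request loading can be sketched as follows; this is a minimal illustration of the pattern, assuming nothing about the actual `ModelRegistry` API:

```python
from typing import Any, Callable, Dict

class LazyRegistry:
    """Minimal sketch of lazy per-request loading: a checkpoint's loader
    runs only on first access, and the result is cached afterwards."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._cache: Dict[str, Any] = {}

    def register(self, key: str, loader: Callable[[], Any]) -> None:
        self._loaders[key] = loader

    def get(self, key: str) -> Any:
        if key not in self._cache:  # load on first request only
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

# Usage: the (hypothetical) loader runs once, so the resident working
# set stays limited to the models that requests have actually touched.
registry = LazyRegistry()
registry.register("coins/freebase_transe", lambda: "loaded-weights")
model = registry.get("coins/freebase_transe")
```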
## Training
The COINs and MultiProxAn checkpoints were trained on EPFL's GPU cluster
during 2021–2025 as part of the doctoral research programme. Training
hyperparameters live in the
[research code's YAML configs](https://huggingface.co/spaces/Bani57/website/tree/main/src/research/COINs-KGGeneration/graph_completion/configs).
## Intended use
These checkpoints are released to power the interactive thesis demos
linked above. They are research artefacts; downstream production use is
neither tested nor supported.
## Citation
```bibtex
@phdthesis{janchevski_scalable_2025,
author = {Andrej Janchevski},
title = {Scalable Methods for Knowledge Graph Reasoning and Generation},
school = {{EPFL}},
year = {2025},
url = {https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c},
}
```
## License
MIT for the released weights and source. The research methods retain
their original publication terms; see the thesis.