Upload model card draft
README.md
CHANGED
|
@@ -1,3 +1,98 @@
|
|
| 1 |
-
---
|
| 2 |
-
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- knowledge graph
|
| 4 |
+
- graph neural networks
|
| 5 |
+
- graph ai
|
| 6 |
+
- neuroscience
|
| 7 |
+
- neurology
|
| 8 |
+
- drug repurposing
|
| 9 |
+
- alzheimer
|
| 10 |
+
- parkinson
|
| 11 |
+
- bipolar
|
| 12 |
+
license: mit
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
# PROTON Model Card
|
| 16 |
+
|
| 17 |
+
[Project website](https://zitniklab.hms.harvard.edu/PROTON)
|
| 18 |
+
[arXiv](https://arxiv.org)
|
| 19 |
+
[GitHub repository](https://github.com/mims-harvard/PROTON)
|
| 20 |
+
[Hugging Face](https://huggingface.co/ayushnoori/PROTON)
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
## Introduction
|
| 24 |
+
|
| 25 |
+
Neurological diseases are the leading global cause of disability, yet most lack disease-modifying treatments. To help address this gap, we developed PROTON, a graph AI model that generates hypotheses for neurological disease. PROTON uses a heterogeneous graph transformer contextualized to the adult human brain. PROTON generated predictions across Parkinson's disease (PD), bipolar disorder (BD), and Alzheimer's disease (AD), which we validated using three independent biological systems. In PD, PROTON linked genetic risk loci to genes essential for dopaminergic neuron survival and identified pesticides toxic to patient-derived neurons, including the insecticide Naled ranked within the top 6.75% of predictions. *In silico* PROTON screens reproduced six genome-wide alpha-synuclein experiments, including a split-ubiquitin yeast two-hybrid system (normalized enrichment score NES = 2.27, FDR-adjusted *p* < 1E-4), an ascorbate peroxidase proximity labeling assay (NES = 2.22, FDR < 1E-4), and a high-depth targeted deep sequencing study in 496 synucleinopathy patients (NES = 1.73, FDR < 1.9E-3). In BD, PROTON nominated calcitriol as a candidate drug that reversed proteomic alterations observed in cortical organoids derived from BD patients. In AD, we evaluated PROTON predictions in electronic health records from *n* = 610,524 patients at Mass General Brigham, confirming that five PROTON-predicted drugs were associated with reduced seven-year dementia risk (minimum hazard ratio = 0.63, 95% CI: 0.53–0.75, *p* < 1E-7). PROTON generated and validated mechanistic hypotheses across molecular, organoid, and clinical systems, defining a path for AI-driven discovery in neurological disease.
|
| 26 |
+
|
| 27 |
+
## Training Data
|
| 28 |
+
|
| 29 |
+
PROTON was trained on NeuroKG, a heterogeneous, undirected biomedical knowledge graph contextualized to the human brain. NeuroKG unifies 36 human datasets and ontologies, and integrates single-nucleus RNA-sequencing atlases comprising 3,756,702 cells from the adult human brain. The knowledge graph contains 147,020 nodes across 16 entity types and 7,366,745 edges across 47 relation types. NeuroKG is available via Harvard Dataverse at DOI: [10.7910/DVN/ZDLS3K](https://doi.org/10.7910/DVN/ZDLS3K). For more details, please refer to our [project website](https://zitniklab.hms.harvard.edu/PROTON).
|
| 30 |
+
|
| 31 |
+
## Model Architecture
|
| 32 |
+
|
| 33 |
+
PROTON is a 578-million-parameter heterogeneous graph transformer for neurological disease. It was trained on NeuroKG using a self-supervised link prediction objective. Through Bayesian hyperparameter optimization, we selected a model architecture that achieved high link-prediction performance (AUROC $=0.9145$; accuracy $=82.23\%$) on an independent test set.
|
| 34 |
+
|
| 35 |
+
### Model Hyperparameters
|
| 36 |
+
- `num_feat`: `1024`
|
| 37 |
+
- `num_heads`: `4`
|
| 38 |
+
- `hidden_dim`: `256`
|
| 39 |
+
- `output_dim`: `128`
|
| 40 |
+
- `num_layers`: `3`
|
| 41 |
+
- `dropout_prob`: `0.4546844003628963`
|
| 42 |
+
- `pred_threshold`: `0.5`
|
| 43 |
+
|
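The two link-prediction metrics reported above can be made concrete with a small, dependency-free sketch. It assumes binary link labels, treats a score at or above `pred_threshold` as a predicted link (this tie-handling convention is an assumption, not stated in this card), and computes AUROC as the fraction of positive/negative pairs that are ranked correctly. The scores and labels are toy values, not PROTON outputs.

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of edges whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(scores, labels):
    """Probability that a random positive edge outscores a random negative edge."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy predicted link probabilities and ground-truth labels (1 = true edge).
scores = [0.92, 0.81, 0.45, 0.60, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0]

print(f"accuracy: {accuracy_at_threshold(scores, labels, 0.5):.4f}")
print(f"AUROC:    {auroc(scores, labels):.4f}")
```

On this toy data the accuracy is 4/6 and the AUROC is 8/9. AUROC does not depend on the threshold, whereas accuracy does.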
| 44 |
+
### Files Included
|
| 45 |
+
- `model.ckpt`: PyTorch Lightning checkpoint containing model weights.
|
| 46 |
+
- `decoder.pt`: Decoder weights for link prediction (shape `[94, 512]`).
|
| 47 |
+
- `edge_types.pt`: Ordered list of 47 edge types in NeuroKG to create edge type IDs.
|
| 48 |
+
- `embeddings.pt`: Store of learned embeddings for all 147,020 nodes in NeuroKG (shape `[147020, 512]`).
|
| 49 |
+
- `embeddings.csv`: Embedding store as a CSV file.
|
| 50 |
+
|
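As a rough illustration of how these files could fit together, the sketch below builds edge type IDs from an ordered list and scores one candidate link from a head embedding, a relation vector, and a tail embedding. It assumes a DistMult-style decoder (sigmoid of a three-way elementwise product), which this card does not confirm. The edge type names and 3-dimensional vectors are hypothetical stand-ins for entries of `edge_types.pt`, rows of `decoder.pt`, and rows of `embeddings.pt`.

```python
import math

# Hypothetical edge type names; the real ordered list lives in edge_types.pt.
EDGE_TYPES = ["drug_protein", "disease_protein", "indication"]


def edge_type_ids(edge_types):
    """Map each edge type to an integer ID given by its position in the list."""
    return {name: idx for idx, name in enumerate(edge_types)}


def distmult_score(head, relation, tail):
    """Assumed DistMult-style score: sigmoid(sum_i head[i] * relation[i] * tail[i])."""
    logit = sum(h * r * t for h, r, t in zip(head, relation, tail))
    return 1.0 / (1.0 + math.exp(-logit))


ids = edge_type_ids(EDGE_TYPES)

# Toy stand-ins: one relation vector per edge type ID, and two node embeddings.
relation_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
drug = [1.0, 2.0, 0.0]
disease = [2.0, 1.0, 3.0]

score = distmult_score(drug, relation_vectors[ids["indication"]], disease)
print(f"indication score: {score:.4f}")
```

Under this assumed form the score is symmetric in head and tail, which is compatible with an undirected graph. The decoder has 94 rows, twice the 47 edge types; one possible reading is a separate vector for each direction of each relation, but the card does not say.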
| 51 |
+
For more details, please refer to our [project website](https://zitniklab.hms.harvard.edu/PROTON).
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
## Usage Instructions
|
| 55 |
+
|
| 56 |
+
To use PROTON, please clone the [GitHub repository](https://github.com/mims-harvard/PROTON) and follow the instructions in the [README.md](https://github.com/mims-harvard/PROTON/blob/main/README.md). For example, after downloading the model weights and modifying the `conf/default.config.yaml` file appropriately, you can load the model with the following code:
|
| 57 |
+
```python
|
| 58 |
+
import pytorch_lightning as pl
|
| 59 |
+
from src.config import conf
|
| 60 |
+
from src.constants import TORCH_DEVICE
|
| 61 |
+
from src.dataloaders import load_graph
|
| 62 |
+
from src.models import HGT
|
| 63 |
+
|
| 64 |
+
pl.seed_everything(conf.seed, workers=True)
|
| 65 |
+
kg = load_graph(nodes, edges)  # `nodes` and `edges` must already be loaded; this snippet does not define them
|
| 66 |
+
pretrain_model = HGT.load_from_checkpoint(
|
| 67 |
+
    checkpoint_path=str(conf.hgt.checkpoint_path),
|
| 68 |
+
    kg=kg,
|
| 69 |
+
    strict=False,
|
| 70 |
+
)
|
| 71 |
+
pretrain_model.eval()
|
| 72 |
+
pretrain_model.cache_graph(kg, overwrite=False, degree_threshold=conf.neurokg.hparams.degree_threshold)
|
| 73 |
+
pretrain_model = pretrain_model.to(TORCH_DEVICE)
|
| 74 |
+
```
|
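The loading code above expects the weights to be available locally. One way to fetch them is the `huggingface_hub` client, sketched below. The repository ID is taken from the Hugging Face link at the top of this card; the target directory name `weights` and the helper `download_proton_files` are hypothetical, and the sketch assumes the files listed under Files Included sit at the repository root.

```python
from pathlib import Path

# Repository ID from the Hugging Face URL in this card.
REPO_ID = "ayushnoori/PROTON"

# File names as listed under "Files Included" (assumed to be at the repo root).
FILES = [
    "model.ckpt",
    "decoder.pt",
    "edge_types.pt",
    "embeddings.pt",
    "embeddings.csv",
]


def download_proton_files(target_dir="weights", files=FILES):
    """Download each listed file and return a {file name: local path} mapping."""
    # Imported lazily so the module can be inspected without the dependency
    # installed (pip install huggingface_hub).
    from huggingface_hub import hf_hub_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return {
        name: hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=target_dir)
        for name in files
    }
```

Calling `download_proton_files()` returns a dictionary from file name to local path. The path for `model.ckpt` is presumably what the checkpoint setting in `conf/default.config.yaml` should point to; the repository README remains the authoritative reference for configuration.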
| 75 |
+
|
| 76 |
+
|
| 77 |
+
## License
|
| 78 |
+
|
| 79 |
+
PROTON is released under the [MIT License](https://github.com/mims-harvard/PROTON/blob/main/LICENSE).
|
| 80 |
+
|
| 81 |
+
|
| 82 |
+
## Citation
|
| 83 |
+
|
| 84 |
+
If you use PROTON, please cite:
|
| 85 |
+
```
|
| 86 |
+
@article{noori_graph_2025,
|
| 87 |
+
title={Graph AI generates neurological hypotheses validated in molecular, organoid, and clinical systems},
|
| 88 |
+
author={Noori, Ayush and Polonuer, Joaquin and Meyer, Katharina and Budnik, Bogdan and Morton, Shad and Wang, Xinyuan and Nazeem, Sumaiya and He, Yingnan and Arango, Iñaki and Vittor, Lucas and Woodworth, Matthew and Krolewski, Richard C. and Li, Michelle M. and Liu, Ninning and Kamath, Tushar and Macosko, Evan and Ritter, Dylan and Afroz, Jalwa and Henderson, Alexander B. H. and Studer, Lorenz and Rodriques, Samuel G. and White, Andrew and Dagan, Noa and Clifton, David A. and Church, George M. and Das, Sudeshna and Tam, Jenny M. and Khurana, Vikram and Zitnik, Marinka},
|
| 89 |
+
journal={arXiv preprint},
|
| 90 |
+
note={arXiv:XXXX.XXXXX (placeholder)},
|
| 91 |
+
year={2025}
|
| 92 |
+
}
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
|
| 96 |
+
## Contact
|
| 97 |
+
|
| 98 |
+
For any questions or feedback, please open an issue in the [GitHub repository](https://github.com/mims-harvard/PROTON/issues/new) or contact [Ayush Noori](mailto:ayush.noori@sjc.ox.ac.uk) and [Marinka Zitnik](mailto:marinka@hms.harvard.edu).
|