---
tags: ['napistu', 'napistu-torch', 'graph-neural-networks', 'biological-networks', 'pytorch', 'graph_conv', 'distmult', 'edge_prediction', 'relation-aware']
library_name: napistu-torch
license: mit
metrics:
- auc
- average_precision
---

# graph_conv-distmult_h128_l3_edge_prediction

This model was trained using [Napistu-Torch](https://www.shackett.org/napistu_torch/), a PyTorch framework for training graph neural networks on biological pathway networks.

The dataset used for training is the 8-source ["Octopus" human consensus network](https://www.shackett.org/octopus_network/), which integrates pathway data from STRING, OmniPath, Reactome, and others. The network encompasses ~50K genes, metabolites, and complexes connected by ~8M interactions.

## Task

This model performs **edge prediction** on biological pathway networks. Given node embeddings, 
the model predicts the likelihood of edges (interactions) between biological entities such as 
genes, proteins, and metabolites. This is useful for:

- Discovering novel biological interactions
- Validating experimentally observed interactions
- Completing incomplete pathway databases
- Predicting functional relationships between genes/proteins

The model learns to score potential edges based on learned embeddings of source and target nodes, 
optionally incorporating relation types for relation-aware prediction.
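
As a rough illustration of relation-aware scoring, the sketch below assumes the `distmult` head follows the standard DistMult formulation, in which an edge score is the elementwise product of the source embedding, a per-relation vector, and the target embedding, summed over dimensions. The function names, the toy 4-dimensional vectors, and the two relation vectors are hypothetical and are not taken from the Napistu-Torch code base.

```python
import math


def distmult_score(head, relation, tail):
    """Trilinear DistMult score: sum_i head[i] * relation[i] * tail[i]."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("head, relation, and tail must share one dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def edge_probability(head, relation, tail):
    """Squash the raw score into (0, 1) with a logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-distmult_score(head, relation, tail)))


# Toy embeddings with made-up values (the model card reports 128 hidden channels).
source = [0.5, -1.0, 0.25, 2.0]
target = [1.0, -0.5, 0.0, 1.0]

# Hypothetical relation vectors: each relation reweights the embedding dimensions.
relation_a = [1.0, 1.0, 1.0, 1.0]
relation_b = [1.0, -1.0, 1.0, -1.0]

score_a = distmult_score(source, relation_a, target)
score_b = distmult_score(source, relation_b, target)
prob_a = edge_probability(source, relation_a, target)
prob_b = edge_probability(source, relation_b, target)

print(f"relation_a: score={score_a:.2f}, probability={prob_a:.3f}")
print(f"relation_b: score={score_b:.2f}, probability={prob_b:.3f}")
```

The same node pair receives a different score under each relation vector, which is what makes the prediction relation-aware. Under this formulation the score is also symmetric: swapping source and target leaves it unchanged.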

## Model Description

- **Encoder**
  - Type: `graph_conv`
  - Hidden Channels: `128`
  - Number of Layers: `3`
  - Dropout: `0.2`
  - Edge Encoder: ✓ (dim=32)
- **Head**
  - Type: `distmult`
  - Relation-Aware: ✓

**Training Date**: 2025-12-29

For detailed experiment and training settings see this repository's `config.json` file.

## Performance

| Metric | Value |
|--------|-------|
| Validation relation-weighted AUC | 0.8644 |
| Test relation-weighted AUC | 0.8650 |
| Validation AUC | 0.8277 |
| Test AUC | 0.8279 |
| Validation AP | 0.8282 |
| Test AP | 0.8283 |
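
For readers unfamiliar with the two base metrics in the table, the sketch below shows how AUC and average precision (AP) are conventionally computed from binary labels and predicted scores. It assumes the usual edge-prediction setup of true edges (label 1) scored against sampled non-edges (label 0); the labels and scores are made-up toy values, and the relation-weighted variant is not reproduced because this card does not define its weighting.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears.

    Tied scores are ranked in sort order rather than grouped, so results can
    differ slightly from library implementations when ties are present.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


# Toy evaluation set: 1 = true edge, 0 = sampled non-edge (made-up values).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUC={auc:.4f}  AP={ap:.4f}")
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration but not for a network with millions of edges; a rank-based implementation from an established metrics library is the practical choice there.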


## Links

- 📊 [W&B Run](https://wandb.ai/napistu/napistu-experiments/runs/kjv3q37g)
- 🌐 [Napistu](https://napistu.com)
- 💻 [GitHub Repository](https://github.com/napistu/Napistu-Torch)
- 📖 [Read the Docs](https://napistu-torch.readthedocs.io/en/latest)
- 📚 [Napistu Wiki](https://github.com/napistu/napistu/wiki)

## Usage

### 1. Setup Environment

To reproduce the environment used for training, run the following commands:

```bash
pip install torch==2.8.0
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install 'napistu==0.8.5'
pip install 'napistu-torch[pyg,lightning]==0.3.2'
```

### 2. Setup Data Store

First, download the Octopus consensus network data to create a local `NapistuDataStore`:
```python
from napistu_torch.load.gcs import gcs_model_to_store

# Download data and create store
napistu_data_store = gcs_model_to_store(
    napistu_data_dir="path/to/napistu_data",
    store_dir="path/to/store",
    asset_name="human_consensus",
    # Pin to stable version for reproducibility
    asset_version="20250923"
)
```

### 3. Load Pretrained Model from HuggingFace Hub
```python
from napistu_torch.ml.hugging_face import HFModelLoader

# Load checkpoint
loader = HFModelLoader("seanhacks/relation_prediction_distmult_128e")
checkpoint = loader.load_checkpoint()

# Load config to reproduce experiment
experiment_config = loader.load_config()
```

### 4. Use Pretrained Model for Training

You can use this pretrained model as initialization for training via the CLI:
```bash
# Create a training config that uses the pretrained model
cat > my_config.yaml << EOF
name: my_finetuned_model

model:
  use_pretrained_model: true
  pretrained_model_source: huggingface
  pretrained_model_path: seanhacks/relation_prediction_distmult_128e
  pretrained_model_freeze_encoder_weights: false  # Allow fine-tuning

data:
  sbml_dfs_path: path/to/sbml_dfs.pkl
  napistu_graph_path: path/to/graph.pkl
  napistu_data_name: edge_prediction

training:
  epochs: 100
  lr: 0.001
EOF

# Train with pretrained weights
napistu-torch train my_config.yaml
```
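
When comparing several fine-tuning variants, it can be convenient to generate these config files from a script instead of a heredoc. The sketch below renders the same keys shown above from a plain string template and writes one file per variant. The helper names (`render_config`, `write_config`) and the run names are hypothetical, and it assumes that setting `pretrained_model_freeze_encoder_weights` to `true` freezes the encoder, as the key name suggests.

```python
import tempfile
from pathlib import Path

# Mirrors the YAML keys from the CLI example above; nothing new is introduced.
CONFIG_TEMPLATE = """\
name: {name}

model:
  use_pretrained_model: true
  pretrained_model_source: huggingface
  pretrained_model_path: {repo_id}
  pretrained_model_freeze_encoder_weights: {freeze}

data:
  sbml_dfs_path: path/to/sbml_dfs.pkl
  napistu_graph_path: path/to/graph.pkl
  napistu_data_name: edge_prediction

training:
  epochs: {epochs}
  lr: {lr}
"""


def render_config(name, repo_id, freeze_encoder, epochs=100, lr=0.001):
    """Fill the template; YAML booleans are lowercase, unlike Python's."""
    return CONFIG_TEMPLATE.format(
        name=name,
        repo_id=repo_id,
        freeze="true" if freeze_encoder else "false",
        epochs=epochs,
        lr=lr,
    )


def write_config(directory, **kwargs):
    """Write <name>.yaml into `directory` and return its path."""
    path = Path(directory) / f"{kwargs['name']}.yaml"
    path.write_text(render_config(**kwargs))
    return path


repo_id = "seanhacks/relation_prediction_distmult_128e"
output_dir = tempfile.mkdtemp()  # swap in a project directory for real use

finetune_path = write_config(
    output_dir, name="my_finetuned_model", repo_id=repo_id, freeze_encoder=False
)
frozen_path = write_config(
    output_dir, name="my_frozen_encoder_model", repo_id=repo_id, freeze_encoder=True
)

for path in (finetune_path, frozen_path):
    # Each file can then be passed to the CLI: napistu-torch train <path>
    print(path)
```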

## Citation

If you use this model, please cite:
```bibtex
@software{napistu_torch,
  title = {Napistu-Torch: Graph Neural Networks for Biological Pathway Analysis},
  author = {Hackett, Sean R.},
  url = {https://github.com/napistu/Napistu-Torch},
  year = {2025},
  note = {Model: graph_conv-distmult_h128_l3_edge_prediction}
}
```

## License

MIT License - See [LICENSE](https://github.com/napistu/Napistu-Torch/blob/main/LICENSE) for details.