# GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine

**Repository:** [FuhaiLiAiLab/GALAX](https://huggingface.co/FuhaiLiAiLab/GALAX)
**Authors:** Heming Zhang, Fuhai Li, Yixin Chen, *et al.*
**License:** Research-only use under [DepMap Terms](https://depmap.org).

---

## 🧩 Model Overview



**GALAX** is a graph-augmented language model that integrates:
- **LLaMA3-8B-Instruct** as the language backbone (QA-tuned).
- **Graph Attention Network (GAT)** trained on BioMedGraphica (multi-omics + knowledge graph).
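As a rough illustration of what the GAT component computes, one attention-weighted neighbor aggregation step can be sketched as follows. The feature vectors and the scoring vector are invented for illustration; they are not the trained model's weights, and a real GAT adds linear projections, a LeakyReLU, and multiple heads:

```python
import math

def gat_aggregate(h_self, h_neighbors, attn_weight):
    """One GAT-style step: score each neighbor against the center node,
    softmax the scores into attention coefficients, then aggregate."""
    def score(h_n):
        concat = h_self + h_n  # concatenation of the two feature vectors
        return sum(w, )if False else sum(w * x for w, x in zip(attn_weight, concat))

    scores = [score(h_n) for h_n in h_neighbors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]       # numerically stable softmax
    alphas = [e / sum(exps) for e in exps]         # attention coefficients
    dim = len(h_self)
    return [sum(a * h_n[i] for a, h_n in zip(alphas, h_neighbors))
            for i in range(dim)]

# Toy 2-d features for one node and two neighbors
out = gat_aggregate([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]], [0.5, 0.5, 0.5, 0.5])
```

The neighbor whose concatenated features score higher receives the larger attention coefficient, which is the mechanism the edge-masked GAT uses to weight knowledge-graph neighbors.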
```python
if os.path.exists(combined_model_path):
    # ...
    print("Loaded best_combined_model.pt successfully")
```
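A fuller version of the guarded load above might look like the following sketch. It assumes a plain PyTorch `state_dict` checkpoint; the helper name and the `map_location` choice are illustrative, not the repository's exact loader:

```python
import os

import torch

def load_combined_checkpoint(model, combined_model_path):
    """Load best_combined_model.pt into `model` if the file exists.

    Returns True on a successful load, False when no checkpoint is found.
    """
    if os.path.exists(combined_model_path):
        # map_location="cpu" keeps the load working on machines without
        # the training GPUs; move the model to the target device afterwards.
        state = torch.load(combined_model_path, map_location="cpu")
        model.load_state_dict(state)
        print("Loaded best_combined_model.pt successfully")
        return True
    print(f"No checkpoint at {combined_model_path}; starting from scratch")
    return False
```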

---

## ⚙️ Experimental Setup

- **Backbone LM:** LLaMA3-8B-Instruct (QA-tuned).
- **Graph Encoder:** BioBERT-v1.1 embeddings + GAT with edge masking.
- **Training:** Adam optimizer on 2× NVIDIA H100 (80GB).
- **Top features per omics modality:** K = 10.
- **Subgraph rollout depth:** L = 5, candidate nodes η = 20.
- **Evaluation:** Precision, Recall, F1, Jaccard, Hit@5, Hit@10.
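The rollout bullets above can be read as: at each of L = 5 steps, score up to η = 20 candidate nodes and add the best one to the growing subgraph. A toy version of that loop, where the scoring function is a stand-in for the model's learned policy and the graph is a made-up number line:

```python
def greedy_rollout(seed_nodes, candidate_fn, score_fn, depth=5, eta=20):
    """Grow a subgraph for `depth` steps, each step picking the
    best-scoring node among at most `eta` unseen candidates."""
    subgraph = list(seed_nodes)
    for _ in range(depth):
        candidates = [n for n in candidate_fn(subgraph) if n not in subgraph][:eta]
        if not candidates:
            break  # frontier exhausted before reaching full depth
        subgraph.append(max(candidates, key=score_fn))
    return subgraph

# Toy graph: node i neighbors i-1 and i+1; the score prefers larger node ids
result = greedy_rollout(
    [0],
    candidate_fn=lambda sg: sorted({n + d for n in sg for d in (-1, 1)}),
    score_fn=lambda n: n,
    depth=5,
    eta=20,
)
```

In GALAX the scoring is reward-guided rather than greedy-by-id, but the depth/candidate budget shapes the search space exactly as the L and η settings describe.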

---

## 📊 Baselines & Ablations

- **M2T (Multiomic2Target):** Only omics → poor performance.
- **L3+Omics:** No QA finetuning → weak results.
- **L3-FT(QA)+Omics:** Large gain from QA finetuning.
- **GAT / G-Retriever + pre-GAT:** Partial improvements, unstable.
- **+ Static KG:** Minimal or negative gains.
- **GALAX (QA + KG + RL):** Consistent cross-dataset gains (2–5% absolute).

---

## 📈 Results

GALAX consistently outperforms baselines and ablation variants.

- **Hit@10:** 0.8815
- **Hit@5:** 0.9249
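The numbers above and in the tables below are set- and rank-based metrics over predicted versus ground-truth target genes. A minimal sketch of how such metrics are computed (the gene lists are invented examples, not benchmark data):

```python
def precision_recall_jaccard(predicted, true):
    """Set-overlap metrics between predicted and ground-truth target sets."""
    p, t = set(predicted), set(true)
    tp = len(p & t)                                   # true positives
    precision = tp / len(p) if p else 0.0
    recall = tp / len(t) if t else 0.0
    jaccard = tp / len(p | t) if (p | t) else 0.0
    return precision, recall, jaccard

def hit_at_k(ranked, true, k):
    """1.0 if any ground-truth target appears in the top-k of the ranking."""
    return 1.0 if set(ranked[:k]) & set(true) else 0.0

# Illustrative gene symbols only
prec, rec, jac = precision_recall_jaccard(
    ["EGFR", "KRAS", "TP53"], ["EGFR", "TP53", "MET", "ALK"]
)
top2_hit = hit_at_k(["MET", "EGFR", "KRAS"], ["EGFR"], k=2)
```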

**Table 1. Precision and Recall across datasets**

| Model | Overall Precision ↑ | Overall Recall ↑ | LUAD Precision ↑ | LUAD Recall ↑ | BRCA Precision ↑ | BRCA Recall ↑ |
|---|---|---|---|---|---|---|
| G-Retriever + pre-GAT | 0.4763 ± 0.0004 | 0.3929 ± 0.0063 | 0.4642 ± 0.0181 | 0.3881 ± 0.0264 | 0.4414 ± 0.0099 | 0.3772 ± 0.0010 |
| **GALAX** | **0.5472 ± 0.0053** | **0.5332 ± 0.0031** | **0.5345 ± 0.0185** | **0.5157 ± 0.0043** | **0.5608 ± 0.0031** | **0.5533 ± 0.0033** |

**Table 2. Hit@10 and Hit@5 across datasets**

| Model | Overall Hit@10 ↑ | Overall Hit@5 ↑ | LUAD Hit@10 ↑ | LUAD Hit@5 ↑ | BRCA Hit@10 ↑ | BRCA Hit@5 ↑ |
|---|---|---|---|---|---|---|

---

## 🔬 Intended Uses

- **Research use only**
- Target prioritization in **cancer biology**
- Benchmarking **graph-language foundation models** in target prioritization

---
|
|
|
|
If you use this model, please cite:

```bibtex
@article{zhang2025galax,
  title={GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine},
  author={Zhang, Heming and Li, Fuhai and Chen, Yixin and others},
  year={2025},
  journal={Preprint}
}
```