HemingZhang committed on
Commit
bd29d7c
·
verified ·
1 Parent(s): 3591ec1

Update README.md

  1. README.md +31 -37
README.md CHANGED
@@ -1,13 +1,15 @@
1
- # GALAX: Graph-Augmented Language Model with Explainability for CRISPR Target Prioritization
2
 
3
  **Repository:** [FuhaiLiAiLab/GALAX](https://huggingface.co/FuhaiLiAiLab/GALAX)
4
  **Authors:** Heming Zhang, Fuhai Li, Yixin Chen, *et al.*
5
- **License:** Research-only use under [DepMap Terms](https://depmap.org/portal/termsOfUse).
6
 
7
  ---
8
 
9
  ## 🧩 Model Overview
10
 
11
  **GALAX** is a graph-augmented language model that integrates:
12
  - **LLaMA3-8B-Instruct** as the language backbone (QA-tuned).
13
  - **Graph Attention Network (GAT)** trained on BioMedGraphica (multi-omics + knowledge graph).
@@ -45,6 +47,31 @@ if os.path.exists(combined_model_path):
45
  print("Loaded best_combined_model.pt successfully")
46
  ```
47
 
48
  ## 📊 Results
49
 
50
  GALAX consistently outperforms baselines and ablation variants.
@@ -54,15 +81,6 @@ GALAX consistently outperforms baselines and ablation variants.
54
  - **Hit@10:** 0.8815
55
  - **Hit@5:** 0.9249
56
 
57
- <figure>
58
- <img src="https://huggingface.co/FuhaiLiAiLab/GALAX/resolve/main/Figure4.pdf" width="100%">
59
- <figcaption><b>Figure:</b> Performance across metrics and example explainable subgraph for LUAD (ACH-000860).</figcaption>
60
- </figure>
61
-
62
- ---
63
-
64
- ### Performance Tables
65
-
66
  **Table 1. Precision and Recall across datasets**
67
 
68
  | Model | Overall Precision ↑ | Overall Recall ↑ | LUAD Precision ↑ | LUAD Recall ↑ | BRCA Precision ↑ | BRCA Recall ↑ |
@@ -78,8 +96,6 @@ GALAX consistently outperforms baselines and ablation variants.
78
  | G-Retriever + pre-GAT | 0.4763 ± 0.0004 | 0.3929 ± 0.0063 | 0.4642 ± 0.0181 | 0.3881 ± 0.0264 | 0.4414 ± 0.0099 | 0.3772 ± 0.0010 |
79
  | **GALAX** | **0.5472 ± 0.0053** | **0.5332 ± 0.0031** | **0.5345 ± 0.0185** | **0.5157 ± 0.0043** | **0.5608 ± 0.0031** | **0.5533 ± 0.0033** |
80
 
81
- ---
82
-
83
  **Table 2. Hit@10 and Hit@5 across datasets**
84
 
85
  | Model | Overall Hit@10 ↑ | Overall Hit@5 ↑ | LUAD Hit@10 ↑ | LUAD Hit@5 ↑ | BRCA Hit@10 ↑ | BRCA Hit@5 ↑ |
@@ -97,33 +113,11 @@ GALAX consistently outperforms baselines and ablation variants.
97
 
98
  ---
99
 
100
- ## ⚙️ Experimental Setup
101
-
102
- - **Backbone LM:** LLaMA3-8B-Instruct (QA-tuned).
103
- - **Graph Encoder:** BioBERT-v1.1 embeddings + GAT with edge masking.
104
- - **Training:** Adam optimizer on 2× NVIDIA H100 (80GB).
105
- - **Top features per omics modality:** K = 10.
106
- - **Subgraph rollout depth:** L = 5, candidate nodes η = 20.
107
- - **Evaluation:** Precision, Recall, F1, Jaccard, Hit@5, Hit@10.
108
-
109
- ---
110
-
111
- ## 📉 Baselines & Ablations
112
-
113
- - **M2T (Multiomic2Target):** Only omics → poor performance.
114
- - **L3+Omics:** No QA finetuning → weak results.
115
- - **L3-FT(QA)+Omics:** Large gain from QA finetuning.
116
- - **GAT / G-Retriever+pre-GAT:** Partial improvements, unstable.
117
- - **+ Static KG:** Minimal or negative gains.
118
- - **GALAX (QA + KG + RL):** Consistent cross-dataset gains (2–5% absolute).
119
-
120
- ---
121
-
122
  ## 🔬 Intended Uses
123
 
124
  - **Research use only**
125
  - Target prioritization in **cancer biology**
126
- - Benchmarking **graph-language foundation models** in bioinformatics
127
 
128
  ---
129
 
@@ -141,7 +135,7 @@ If you use this model, please cite:
141
 
142
  ```bibtex
143
  @article{zhang2025galax,
144
- title={GALAX: Graph-Augmented Language Model with Explainability for CRISPR Target Prioritization},
145
  author={Zhang, Heming and Li, Fuhai and Chen, Yixin and others},
146
  year={2025},
147
  journal={Preprint}
 
1
+ # GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
2
 
3
  **Repository:** [FuhaiLiAiLab/GALAX](https://huggingface.co/FuhaiLiAiLab/GALAX)
4
  **Authors:** Heming Zhang, Fuhai Li, Yixin Chen, *et al.*
5
+ **License:** Research-only use under [DepMap Terms](https://depmap.org).
6
 
7
  ---
8
 
9
  ## 🧩 Model Overview
10
 
11
+ ![GALAX Overall Architecture](./Figure2.png)
12
+
13
  **GALAX** is a graph-augmented language model that integrates:
14
  - **LLaMA3-8B-Instruct** as the language backbone (QA-tuned).
15
  - **Graph Attention Network (GAT)** trained on BioMedGraphica (multi-omics + knowledge graph).
 
47
  print("Loaded best_combined_model.pt successfully")
48
  ```
49
 
50
+ ---
51
+
52
+ ## ⚙️ Experimental Setup
53
+
54
+ - **Backbone LM:** LLaMA3-8B-Instruct (QA-tuned).
55
+ - **Graph Encoder:** BioBERT-v1.1 embeddings + GAT with edge masking.
56
+ - **Training:** Adam optimizer on 2× NVIDIA H100 (80GB).
57
+ - **Top features per omics modality:** K = 10.
58
+ - **Subgraph rollout depth:** L = 5, candidate nodes η = 20.
59
+ - **Evaluation:** Precision, Recall, F1, Jaccard, Hit@5, Hit@10.
60
+
61
+ ---
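The README lists Precision, Recall, F1, Jaccard, Hit@5, and Hit@10 as evaluation metrics but does not define them. The sketch below shows one plausible way to compute them for a single sample, assuming the standard set-based definitions over predicted and reference gene targets. The function names `set_metrics` and `hit_at_k`, the example gene lists, and in particular the reading of Hit@k as the fraction of the top-k predictions found in the reference set are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Dict, Sequence, Set


def set_metrics(predicted: Sequence[str], reference: Set[str]) -> Dict[str, float]:
    """Set-overlap metrics between predicted targets and a reference set.

    Hypothetical helper: assumes duplicates in `predicted` count once.
    """
    pred_set = set(predicted)
    overlap = len(pred_set & reference)
    union = len(pred_set | reference)

    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(reference) if reference else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) > 0
        else 0.0
    )
    jaccard = overlap / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


def hit_at_k(predicted: Sequence[str], reference: Set[str], k: int) -> float:
    """Fraction of the first k ranked predictions that are in the reference set.

    Assumed definition; the README does not specify how Hit@k is computed.
    """
    top_k = list(predicted)[:k]
    if not top_k:
        return 0.0
    return sum(1 for gene in top_k if gene in reference) / len(top_k)


# Illustrative inputs only; these are not results from the model.
example_predicted = ["KRAS", "EGFR", "TP53", "MYC"]
example_reference = {"KRAS", "TP53", "BRAF"}

print(set_metrics(example_predicted, example_reference))
print(hit_at_k(example_predicted, example_reference, k=2))
```

Under this assumed definition Hit@k is not monotone in k, so a Hit@5 value above Hit@10 is possible; a "any hit within top-k" definition would behave differently.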
62
+
63
+ ## 📉 Baselines & Ablations
64
+
65
+ - **M2T (Multiomic2Target):** Only omics → poor performance.
66
+ - **L3+Omics:** No QA finetuning → weak results.
67
+ - **L3-FT(QA)+Omics:** Large gain from QA finetuning.
68
+ - **GAT / G-Retriever+pre-GAT:** Partial improvements, unstable.
69
+ - **+ Static KG:** Minimal or negative gains.
70
+ - **GALAX (QA + KG + RL):** Consistent cross-dataset gains (2–5% absolute).
71
+
72
+ ---
73
+
74
+
75
  ## 📊 Results
76
 
77
  GALAX consistently outperforms baselines and ablation variants.
 
81
  - **Hit@10:** 0.8815
82
  - **Hit@5:** 0.9249
83
 
84
  **Table 1. Precision and Recall across datasets**
85
 
86
  | Model | Overall Precision ↑ | Overall Recall ↑ | LUAD Precision ↑ | LUAD Recall ↑ | BRCA Precision ↑ | BRCA Recall ↑ |
 
96
  | G-Retriever + pre-GAT | 0.4763 ± 0.0004 | 0.3929 ± 0.0063 | 0.4642 ± 0.0181 | 0.3881 ± 0.0264 | 0.4414 ± 0.0099 | 0.3772 ± 0.0010 |
97
  | **GALAX** | **0.5472 ± 0.0053** | **0.5332 ± 0.0031** | **0.5345 ± 0.0185** | **0.5157 ± 0.0043** | **0.5608 ± 0.0031** | **0.5533 ± 0.0033** |
98
 
99
  **Table 2. Hit@10 and Hit@5 across datasets**
100
 
101
  | Model | Overall Hit@10 ↑ | Overall Hit@5 ↑ | LUAD Hit@10 ↑ | LUAD Hit@5 ↑ | BRCA Hit@10 ↑ | BRCA Hit@5 ↑ |
 
113
 
114
  ---
115
 
116
  ## 🔬 Intended Uses
117
 
118
  - **Research use only**
119
  - Target prioritization in **cancer biology**
120
+ - Benchmarking **graph-language foundation models** in target prioritization
121
 
122
  ---
123
 
 
135
 
136
  ```bibtex
137
  @article{zhang2025galax,
138
+ title={GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine},
139
  author={Zhang, Heming and Li, Fuhai and Chen, Yixin and others},
140
  year={2025},
141
  journal={Preprint}