GALAX / README.md

Update README.md

66a97d5 verified 4 months ago

9.19 kB

	---
	language: en
	license: mit
	tags:
	- graph-ml
	- bioinformatics
	- precision-medicine
	- explainable-ai
	- reinforcement-learning
	datasets:
	- FuhaiLiAiLab/Target-QA
	library_name: transformers
	pipeline_tag: text-generation
	model-index:
	- name: GALAX
	results:
	- task:
	type: text-generation
	name: Target Prioritization
	dataset:
	name: Target-QA
	type: FuhaiLiAiLab/Target-QA
	metrics:
	- type: precision
	value: 0.5472
	- type: recall
	value: 0.5332
	- type: hit@10
	value: 0.8815
	- type: hit@5
	value: 0.9249
	---

	# GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine

	<div align="center">
	<img src="https://github.com/FuhaiLiAiLab/GALAX/blob/main/Figures/GALAX-logo.png?raw=true" width="40%" alt="GALAX" />
	</div>

	<div align="center" style="line-height: 1;">
	<!-- GitHub -->
	<a href="https://github.com/FuhaiLiAiLab/GALAX" target="_blank" style="margin: 2px;">
	<img alt="GitHub" src="https://img.shields.io/badge/GitHub-GALAX%20Code-181717?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>

	<!-- Hugging Face Model -->
	<a href="https://huggingface.co/FuhaiLiAiLab/GALAX" target="_blank" style="margin: 2px;">
	<img alt="Hugging Face Model" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GALAX%20Model-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>

	<!-- Hugging Face Dataset -->
	<a href="https://huggingface.co/datasets/FuhaiLiAiLab/Target-QA" target="_blank" style="margin: 2px;">
	<img alt="Hugging Face Dataset" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Target--QA%20Dataset-ff6f61?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>
	</div>

	<div align="center" style="line-height: 1;">
	<!-- arXiv -->
	<a href="https://arxiv.org/abs/2509.20935" target="_blank" style="margin: 2px;">
	<img alt="arXiv" src="https://img.shields.io/badge/arXiv-GALAX%20Paper-b31b1b?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>

	<!-- License -->
	<a href="LICENSE" style="margin: 2px;">
	<img alt="License" src="https://img.shields.io/badge/License-MIT-0a4d92?logo=open-source-initiative&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
	</a>
	</div>

	---

	## 🧩 Model Overview

	![GALAX Overall Architecture](https://github.com/FuhaiLiAiLab/GALAX/blob/main/Figures/Figure3.png?raw=true)

	GALAX is a graph-augmented language model designed for explainable target prioritization in precision medicine. It combines three key components:
	- LLaMA3-8B-Instruct as the language backbone, further adapted with the BioMedGraphica corpus and fine-tuned on Target-QA.
	- Graph Attention Network (GAT) pretrained on integrated multi-omics data and BioMedGraphica knowledge graphs.
	- A reinforcement-guided subgraph generator that enables interpretable reasoning by constructing biologically meaningful subgraphs from multi-omics and knowledge graph signals.

	By jointly leveraging multi-omics features, protein–protein interactions, and disease–target associations, GALAX provides an interpretable framework for CRISPR target prioritization across diverse cancer cell lines. To support benchmarking and reproducibility, we also introduce the [Target-QA dataset](https://huggingface.co/datasets/FuhaiLiAiLab/Target-QA).

	---

	## 🚀 How to Use

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from huggingface_hub import snapshot_download
	import os, torch

	# 1. Load GALAX language model
	model_id = "FuhaiLiAiLab/GALAX"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	lm_model = AutoModelForCausalLM.from_pretrained(
	model_id,
	device_map="auto",
	torch_dtype="auto"
	)

	# 2. Access graph foundation model
	repo_path = snapshot_download(model_id)
	combined_model_path = os.path.join(repo_path, "best_combined_model.pt")
	device = "cuda" if torch.cuda.is_available() else "cpu"
	best_combined_model = torch.load(combined_model_path, map_location=device)
	```

	---

	## ⚙️ Experimental Setup

	- Backbone LM: LLaMA3-8B-Instruct (QA-tuned).
	- Graph Encoder: BioBERT-v1.1 embeddings + GAT with edge masking.
	- Training: Adam optimizer on 2× NVIDIA H100 (80GB).
	- Top features per omics modality: K = 10.
	- Subgraph rollout depth: L = 5, candidate nodes η = 20.
	- Evaluation: Precision, Recall, F1, Jaccard, Hit@5, Hit@10.

	---


	## 📊 Results

	GALAX consistently outperforms baselines and ablation variants.

	- Overall Precision: 0.5472
	- Overall Recall: 0.5332
	- Hit@10: 0.8815
	- Hit@5: 0.9249

	Table 1. Precision and Recall across datasets

	\| Model \| Overall Precision ↑ \| Overall Recall ↑ \| LUAD Precision ↑ \| LUAD Recall ↑ \| BRCA Precision ↑ \| BRCA Recall ↑ \|
	\|-------------------------\|---------------------\|------------------\|------------------\|---------------\|------------------\|---------------\|
	\| M2T \| 0.0016 \| 0.0011 \| 0.0020 \| 0.0014 \| 0.0000 \| 0.0000 \|
	\| GAT \| 0.0006 ± 0.0000 \| 0.0006 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0033 ± 0.0000 \| 0.0033 ± 0.0000 \|
	\| L3 + Omics \| 0.0071 ± 0.0032 \| 0.0013 ± 0.0002 \| 0.0079 ± 0.0137 \| 0.0005 ± 0.0008 \| 0.0020 ± 0.0035 \| 0.0017 ± 0.0029 \|
	\| L3 + Omics + KG \| 0.0125 ± 0.0032 \| 0.0029 ± 0.0003 \| 0.0014 ± 0.0025 \| 0.0010 ± 0.0017 \| 0.0073 ± 0.0068 \| 0.0033 ± 0.0029 \|
	\| L3-FT(Med) + Omics \| 0.0179 ± 0.0045 \| 0.0133 ± 0.0064 \| 0.0091 ± 0.0018 \| 0.0105 ± 0.0044 \| 0.0110 ± 0.0086 \| 0.0106 ± 0.0075 \|
	\| L3-FT(Med) + Omics + KG \| 0.0158 ± 0.0030 \| 0.0058 ± 0.0011 \| 0.0081 ± 0.0071 \| 0.0024 ± 0.0017 \| 0.0149 ± 0.0057 \| 0.0050 ± 0.0000 \|
	\| L3-FT(QA) + Omics \| 0.5250 ± 0.0282 \| 0.4959 ± 0.0435 \| 0.5201 ± 0.0408 \| 0.4905 ± 0.0532 \| 0.5074 ± 0.0498 \| 0.4856 ± 0.0570 \|
	\| L3-FT(QA) + Omics + KG \| 0.5185 ± 0.0240 \| 0.4908 ± 0.0402 \| 0.5214 ± 0.0242 \| 0.4952 ± 0.0432 \| 0.4856 ± 0.0395 \| 0.4656 ± 0.0436 \|
	\| G-Retriever + pre-GAT \| 0.4763 ± 0.0004 \| 0.3929 ± 0.0063 \| 0.4642 ± 0.0181 \| 0.3881 ± 0.0264 \| 0.4414 ± 0.0099 \| 0.3772 ± 0.0010 \|
	\| GALAX \| 0.5472 ± 0.0053 \| 0.5332 ± 0.0031 \| 0.5345 ± 0.0185 \| 0.5157 ± 0.0043 \| 0.5608 ± 0.0031 \| 0.5533 ± 0.0033 \|

	Table 2. Hit@10 and Hit@5 across datasets

	\| Model \| Overall Hit@10 ↑ \| Overall Hit@5 ↑ \| LUAD Hit@10 ↑ \| LUAD Hit@5 ↑ \| BRCA Hit@10 ↑ \| BRCA Hit@5 ↑ \|
	\|-------------------------\|------------------\|-----------------\|---------------\|--------------\|---------------\|--------------\|
	\| M2T \| 0.0029 \| 0.0000 \| 0.0000 \| 0.0000 \| 0.0000 \| 0.0000 \|
	\| GAT \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \|
	\| L3 + Omics \| 0.0021 ± 0.0037 \| 0.0032 ± 0.0055 \| 0.0048 ± 0.0082 \| 0.0095 ± 0.0165 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \|
	\| L3 + Omics + KG \| 0.0122 ± 0.0033 \| 0.0085 ± 0.0037 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0056 ± 0.0096 \| 0.0111 ± 0.0192 \|
	\| L3-FT(Med) + Omics \| 0.0122 ± 0.0072 \| 0.0116 ± 0.0097 \| 0.0000 ± 0.0000 \| 0.0000 ± 0.0000 \| 0.0111 ± 0.0192 \| 0.0000 ± 0.0000 \|
	\| L3-FT(Med) + Omics + KG \| 0.0132 ± 0.0040 \| 0.0106 ± 0.0048 \| 0.0048 ± 0.0082 \| 0.0095 ± 0.0165 \| 0.0111 ± 0.0192 \| 0.0000 ± 0.0000 \|
	\| L3-FT(QA) + Omics \| 0.8693 ± 0.0157 \| 0.8889 ± 0.0168 \| 0.8667 ± 0.0218 \| 0.8476 ± 0.0165 \| 0.8389 ± 0.0096 \| 0.8889 ± 0.0509 \|
	\| L3-FT(QA) + Omics + KG \| 0.8529 ± 0.0153 \| 0.8794 ± 0.0114 \| 0.8048 ± 0.0541 \| 0.7905 ± 0.0436 \| 0.8222 ± 0.0347 \| 0.8778 ± 0.0192 \|
	\| G-Retriever + pre-GAT \| 0.8550 ± 0.0046 \| 0.8804 ± 0.0037 \| 0.8524 ± 0.0165 \| 0.8857 ± 0.0000 \| 0.8667 ± 0.0000 \| 0.8667 ± 0.0000 \|
	\| GALAX \| 0.8815 ± 0.0033 \| 0.9249 ± 0.0048 \| 0.8810 ± 0.0082 \| 0.9238 ± 0.0436 \| 0.8500 ± 0.0441 \| 0.8889 ± 0.0839 \|

	---

	## 🔬 Intended Uses

	- Research use only
	- Benchmarking graph-language foundation models in target priorization
	- Target prioritization in cancer biology

	---

	## 📜 Citation

	If you use this model, please cite:

	```bibtex
	@article{zhang2025galax,
	title = {GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine},
	author = {Zhang, Heming and Huang, Di and Li, Wenyu and Province, Michael and Chen, Yixin and Payne, Philip and Li, Fuhai},
	journal = {arXiv preprint arXiv:2509.20935},
	year = {2025},
	doi = {10.48550/arXiv.2509.20935},
	url = {https://arxiv.org/abs/2509.20935}
	}