---
license: mit
language:
- en
metrics:
- accuracy
- precision
pipeline_tag: image-classification
tags:
- biology
- cancer
- glioblastoma
- brain
- multimodal
- radiogenomics
- radiomics
- immune
- classifier
---
<p align="center"> <b> Predictive Radiomics for Evaluation of Cancer Immune SignaturE in Glioblastoma | PRECISE-GBM </b> </p>

<p align="center">
  <img src="PRECISE-GBM_GUI_logo%20(1).png" alt="PRECISE-GBM Logo">
</p>

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This repository contains an AI-based training and retraining pipeline for Predictive Radiomics for Evaluation of Cancer Immune SignaturE in Glioblastoma (PRECISE-GBM), a multimodal radiogenomic framework that integrates MRI radiomics, genomics, and immune signatures for patient stratification.

<b> Project: PRECISE-GBM - Model training & retraining helpers </b>

## πŸ“ Overview

This repository contains code to train models (Gaussian Mixture labelling + SVM and ensemble classifiers) and to persist all artifacts required to reproduce or retrain models on new data. It includes:

- `Scenario_heldout_final_PRECISE.py`: training pipeline producing `.joblib` models and metadata JSONs (selected features, best params, CV results).
- `retrain_helper.py`: CLI utility to rebuild pipelines, set best params, and retrain using saved selected-features and params JSONs. Supports JSON/YAML config files and auto-detection of model type.
- `README_RETRAIN.md`: detailed retrain examples and a notebook cell.

This repo also includes helper files to make it ready for GitHub:
- `requirements.txt`: Python dependencies
- `.gitignore`: recommended ignores (models, caches, logs)
- `LICENSE`: MIT license
- GitHub Actions workflow for CI (pytest smoke test)
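The two-stage idea named in the overview (Gaussian Mixture labelling followed by an SVM classifier) can be illustrated with a minimal sketch. This is not the code in `Scenario_heldout_final_PRECISE.py`: the synthetic data, the two-component mixture, the scaler step, and the SVM settings are all assumptions chosen only to show the pattern with scikit-learn.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): one immune-signature score per patient
# and a small radiomic feature matrix loosely related to that score.
immune_score = np.concatenate([rng.normal(-2.0, 0.5, 60), rng.normal(2.0, 0.5, 60)])
radiomics = np.column_stack([
    immune_score + rng.normal(0.0, 0.5, 120),  # informative feature
    rng.normal(0.0, 1.0, 120),                 # pure-noise feature
])

# Stage 1: unsupervised labelling with a two-component Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0)
labels = gmm.fit_predict(immune_score.reshape(-1, 1))

# Stage 2: supervised classifier that predicts the mixture labels from radiomics.
clf = Pipeline([("scaler", StandardScaler()), ("svm", SVC(kernel="rbf", C=1.0))])
clf.fit(radiomics, labels)
train_accuracy = clf.score(radiomics, labels)
print(f"classes: {np.unique(labels)}, training accuracy: {train_accuracy:.2f}")
```

Training accuracy is printed here only to confirm that the sketch runs; it says nothing about the performance of the actual models.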

## πŸ“ Getting started (Windows PowerShell)

1) Create and activate a virtual environment

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

2) Install dependencies

```powershell
pip install --upgrade pip
pip install -r requirements.txt
```

3) Run training. Note that the training script reads data from absolute paths configured in the script; adjust them, or run from an environment where those files are present.

```powershell
python Scenario_heldout_final_PRECISE.py
```

The training script will create model files under `models_LM22/` and `models_GBM/` and write metadata JSONs next to each joblib model (selected features, params, cv results) as well as group-level JSON summaries.
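As a rough illustration of the "metadata next to each model" layout, the sketch below writes one JSON file per artifact beside a model prefix. Only the `_svm_params.json` suffix appears in this README; the `save_metadata` helper, the other two suffixes, and all values are hypothetical. The model itself would be saved separately (for example with `joblib.dump`), which is omitted here.

```python
import json
import tempfile
from pathlib import Path


def save_metadata(prefix, selected_features, best_params, cv_results):
    """Write one JSON sidecar per artifact next to the model at `prefix`."""
    prefix = Path(prefix)
    prefix.parent.mkdir(parents=True, exist_ok=True)
    payloads = {
        "selected_features": selected_features,  # hypothetical suffix
        "svm_params": best_params,               # suffix mentioned in the Notes
        "cv_results": cv_results,                # hypothetical suffix
    }
    written = []
    for suffix, payload in payloads.items():
        path = prefix.parent / f"{prefix.name}_{suffix}.json"
        path.write_text(json.dumps(payload, indent=2))
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = save_metadata(
        Path(tmp) / "models_GBM" / "scenario_1" / "GBM_scen1_Tcell",
        selected_features=["feature_a", "feature_b"],  # made-up names
        best_params={"svm__C": 1.0},                   # made-up value
        cv_results={"mean_accuracy": 0.5},             # placeholder, not a real result
    )
    written_names = sorted(p.name for p in paths)
    reloaded = json.loads(paths[1].read_text())
    print(written_names)
```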

## πŸ“ Retraining

See `README_RETRAIN.md` for detailed CLI and notebook examples. Short example:

```powershell
python retrain_helper.py `
  --model-prefix "models_GBM/scenario_1/GBM_scen1_Tcell" `
  --train-csv "data\new_train.csv" `
  --label-col "label"
```
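Conceptually, retraining from saved artifacts amounts to selecting the saved feature columns, rebuilding the pipeline skeleton, restoring the saved hyperparameters, and fitting on the new data. The sketch below shows that flow with scikit-learn under stated assumptions: the pipeline steps (`scaler`, `svm`), the parameter names, the feature names, and the inline CSV are all made up and need not match what `retrain_helper.py` does internally.

```python
import csv
import io
import json

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-ins for the saved JSON artifacts and the new training CSV (all made up).
selected_features = json.loads('["feat_a", "feat_c"]')
best_params = json.loads('{"svm__C": 10.0, "svm__kernel": "linear"}')
train_csv = io.StringIO(
    "feat_a,feat_b,feat_c,label\n"
    "0.1,5.0,0.2,0\n"
    "0.2,4.0,0.1,0\n"
    "0.9,5.0,1.1,1\n"
    "1.0,4.0,0.9,1\n"
)

# Keep only the columns that were selected at training time.
rows = list(csv.DictReader(train_csv))
X = [[float(row[name]) for name in selected_features] for row in rows]
y = [int(row["label"]) for row in rows]

# Rebuild the pipeline skeleton, then restore the saved hyperparameters.
pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])
pipeline.set_params(**best_params)
pipeline.fit(X, y)

predictions = pipeline.predict(X).tolist()
print(predictions)
```

Using `set_params` with `step__parameter` keys means the saved JSON can be applied to a freshly built pipeline without reconstructing each estimator by hand.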

## πŸ“Notes

- The training script contains hard-coded absolute paths to data files. Before running on another machine, update the `scenarios_*` file paths or place the datasets in the same paths.
- Retrain helper auto-detects model type when `--model-type` is omitted by looking for `{prefix}_svm_params.json` or `{prefix}_ens_params.json`.
- YAML config support for retrain requires PyYAML (`pip install pyyaml`).
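The auto-detection and optional-YAML behaviour described in these notes could look roughly like the following. Both function names, the returned strings (`"svm"`, `"ens"`), the preference for SVM when both files exist, and the config keys are assumptions for illustration; only the two file-name patterns and the PyYAML requirement come from this README.

```python
import json
import tempfile
from pathlib import Path


def detect_model_type(prefix):
    """Return "svm" or "ens" depending on which params JSON exists for `prefix`."""
    for model_type in ("svm", "ens"):  # assumption: SVM wins if both exist
        if Path(f"{prefix}_{model_type}_params.json").is_file():
            return model_type
    raise FileNotFoundError(f"No params JSON found for prefix {prefix!r}")


def load_config(path):
    """Load a JSON or YAML config file; YAML needs the optional PyYAML package."""
    path = Path(path)
    text = path.read_text()
    if path.suffix.lower() in {".yaml", ".yml"}:
        try:
            import yaml  # provided by PyYAML
        except ImportError as exc:
            raise RuntimeError("YAML configs require PyYAML: pip install pyyaml") from exc
        return yaml.safe_load(text)
    return json.loads(text)


with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "GBM_scen1_Tcell")
    Path(f"{prefix}_ens_params.json").write_text("{}")
    detected = detect_model_type(prefix)

    config_path = Path(tmp) / "retrain.json"
    config_path.write_text(json.dumps({"model_prefix": prefix, "label_col": "label"}))
    config = load_config(config_path)
    print(detected, config["label_col"])
```

Importing `yaml` lazily keeps PyYAML optional for users who only work with JSON configs.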

## πŸ“ CI

A basic GitHub Actions workflow runs a pytest smoke test to ensure that the retrain helper imports and that basic pipeline construction works. It does not run heavy training.

## πŸ“ Contributing

See `CONTRIBUTING.md` for guidance on opening issues and PRs.

## πŸ“ License

This project is released under the MIT License β€” see `LICENSE`. [MIT License](https://opensource.org/licenses/MIT).

## πŸ“ Citation
Please use the following citations when using the repository.

2025

> **Ghimire P, Modat M, Booth T**. *Predictive radiogenomic AI Model for patient stratification in brain tumor immunotherapy trials. Neuro-oncology. Oct 2025; 26(Suppl_3): iii58–iii59. doi: https://doi.org/10.1093/neuonc/noaf193.188*

> **Ghimire P, Modat M, Booth T**. *Radiogenomic AI model predicts immune status in IDH wildtype glioblastoma: PRECISE-GBM study. RCR open. Jan 2025; 3(1): 100234*

2024

> **Ghimire P, Modat M, Booth T**. *A machine learning based predictive radiomics for evaluation of cancer immune signature in glioblastoma: the PRECISE-GBM study. Neuro-Oncology. Oct 2024; 26(suppl_5): v25.*

> **Ghimire P, Modat M, Booth T**. *A radiogenomic machine learning based study to identify Predictive Radiomics for Evaluation of Cancer Immune SignaturE in IDHw Glioblastoma. Neuro-Oncology. Oct 2024; 26(suppl_7): vii3*


**Contact**:

**Dr Prajwal Ghimire**

**MBBS MRCSEd MSc PhD'26**

School of Biomedical Engineering & Imaging Sciences, King's College London

Email: [prajwal.1.ghimire@kcl.ac.uk](mailto:prajwal.1.ghimire@kcl.ac.uk)