---
license: apache-2.0
---

# Can-SAVE: *Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR*

[arXiv:2309.15039](https://arxiv.org/abs/2309.15039) · [KDD 2026](https://kdd2026.kdd.org/) · [Python](https://www.python.org/downloads/) · [License](LICENSE)

This repository contains the source code for the feature engineering step of the Can-SAVE method.

## Installation

```bash
git clone https://huggingface.co/ai-lab/Can-SAVE
cd Can-SAVE
pip install -r requirements.txt
```

## requirements.txt

```text
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
```
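
Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches the pins before running the scripts. The snippet below is a minimal sketch using only the standard library; `parse_pins` and `find_mismatches` are hypothetical helper names that are not part of this repository, and the parser assumes every line has the simple `name==version` form shown above.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed or None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example usage (assumes the current directory is the repository root):
#     with open("requirements.txt") as fh:
#         print(find_mismatches(parse_pins(fh.read())))
```

An empty dictionary means that every pinned package is installed at exactly the pinned version.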

## Repository Structure

- Can-SAVE/: Core implementation
- EHR/: Simulated sample of EHR data
- survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)

```text
Can-SAVE/
├── EHR/
│   └── id_26.csv
├── survival_models/
│   ├── kaplan_meier_both.pkl
│   ├── kaplan_meier_males.pkl
│   ├── kaplan_meier_females.pkl
│   └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md
```
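
The `.pkl` extension suggests that the fitted models in `survival_models/` are standard Python pickle files, so they can be inspected on their own. The helper below is a minimal sketch under that assumption and is not part of the repository: `load_pickled_models` is a hypothetical name, and unpickling only succeeds when the classes that produced the files (presumably those from `KaplanMeierEstimator.py` and `lifelines`) are importable in the current environment.

```python
import pickle
from pathlib import Path


def load_pickled_models(model_dir):
    """Return {file stem: unpickled object} for every *.pkl file in model_dir."""
    models = {}
    for path in sorted(Path(model_dir).glob("*.pkl")):
        with path.open("rb") as fh:
            # Only unpickle files you trust: pickle can execute arbitrary code.
            models[path.stem] = pickle.load(fh)
    return models


# Example usage (assumes the current directory is the repository root):
#     for name, model in load_pickled_models("./survival_models").items():
#         print(name, type(model).__name__)
```

Keying the result by file stem keeps the names from the tree above (`kaplan_meier_both`, `aft`, and so on) available for lookup.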

## Quick Start

### 1) How to Train Survival Models

```bash
$ python Example_How_To_Train_Survival_Models.py
```
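
The fitted models listed for `survival_models/` include Kaplan-Meier estimators. For readers unfamiliar with them, the sketch below shows what such an estimator computes: the product-limit estimate, which multiplies the factor `1 - d_i / n_i` over all event times `t_i <= t`, where `d_i` is the number of events at `t_i` and `n_i` is the number of subjects still at risk. It is a plain-Python illustration on made-up data, not the code in `KaplanMeierEstimator.py`; the function name `kaplan_meier` and the toy durations are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    durations: follow-up time per subject
    observed:  1 if the event occurred at that time, 0 if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Toy data (hypothetical): follow-up in weeks, 0 marks a censored subject.
durations = [5, 8, 8, 12, 15, 20]
observed = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(durations, observed):
    print(f"S({t}) = {s:.3f}")
```

Censored subjects never lower the curve directly; they only shrink the risk set, which is why the censored subject at week 15 produces no step of its own.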

### 2) How to Do Feature Engineering for Can-SAVE

#### Terminal

```bash
$ python CanSave.py
```

#### Python

```python
# Required libraries
import numpy as np
import pandas as pd

from CanSave import CanSave

# Entry point
if __name__ == '__main__':
    # Create the feature engineering object
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Build the features for the risk prediction
    features = cs.feature_engineering(
        sex=sex,                 # sex of the patient
        birth_date=birth_date,   # birth date of the patient
        ehr=ehr,                 # Electronic Health Records of the patient
        date_pred='2022-01-01',  # date of the risk estimation
        deep_weeks=108           # depth of the EHR history (in weeks)
    )
```
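
The `date_pred` and `deep_weeks` arguments together describe a look-back window over the patient's history. How `feature_engineering` applies them internally is not shown here, so the following is only a sketch of the date arithmetic: it assumes a half-open window that ends just before `date_pred`, and `history_window`, `records_in_window`, and the `date` field name are hypothetical.

```python
from datetime import date, timedelta


def history_window(date_pred, deep_weeks):
    """Return (start, end) dates of the look-back window ending at date_pred."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end


def records_in_window(records, date_pred, deep_weeks, date_key="date"):
    """Keep records dated within [start, end); date_key is a hypothetical field name."""
    start, end = history_window(date_pred, deep_weeks)
    return [r for r in records if start <= date.fromisoformat(r[date_key]) < end]


# With the values from the example above, the window starts 108 weeks
# (756 days) before the prediction date.
start, end = history_window("2022-01-01", 108)
print(start, end)
```

With `date_pred='2022-01-01'` and `deep_weeks=108`, the window under these assumptions runs from 2019-12-07 up to, but not including, 2022-01-01.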

## Citation

If you find this work useful, please cite it:

```bibtex
@misc{philonenko2025,
  title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer
         Screening via Survival Analysis Variables and EHR},
  author={Petr Philonenko and Vladimir Kokh and Pavel Blinov},
  year={2025},
  eprint={2309.15039},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2309.15039},
}
```