	---
	license: apache-2.0
	---
	# Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR

	[![arXiv](https://img.shields.io/badge/arXiv-2309.15039-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2309.15039)
	[![KDD 2026](https://img.shields.io/badge/KDD%202026-Accepted-2ea44f?logo=acm)](https://kdd2026.kdd.org/)
	[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg?logo=python)](https://www.python.org/downloads/)
	[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

	This repository contains the source code for the feature engineering step of the Can-SAVE method.

	## Installation
	```bash
	git clone https://huggingface.co/ai-lab/Can-SAVE
	cd Can-SAVE
	pip install -r requirements.txt
	```

	## requirements.txt
	```text
	pandas==1.5.3
	numpy==1.23.2
	lifelines==0.27.4
	scikit-learn==1.1.3
	scipy==1.10.0
	PyYAML==6.0
	openpyxl==3.0.10
	```
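Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches before running anything. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from requirements.txt above.
PINS = """
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_pins(parse_pins(PINS)).items():
        print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

An empty output means the environment matches all seven pins.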

	## Repository Structure
	- Can-SAVE/: Core implementation
	- EHR/: Simulated sample of EHR data
	- survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)

	```text
	Can-SAVE/
	├── EHR/
	│   └── id_26.csv
	├── survival_models/
	│   ├── kaplan_meier_both.pkl
	│   ├── kaplan_meier_males.pkl
	│   ├── kaplan_meier_females.pkl
	│   └── aft.pkl
	├── CanSave.py
	├── Example_How_To_Train_Survival_Models.py
	├── KaplanMeierEstimator.py
	├── CONFIG_CanSave.yaml
	├── icd10_groups.xlsx
	├── requirements.txt
	├── LICENSE
	└── README.md
	```
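To check which fitted models are present in `survival_models/`, something like the sketch below may be useful. It assumes the `.pkl` files are ordinary `pickle` files, which this README does not state explicitly, and `load_survival_models` is a hypothetical helper, not a function from the repository. Unpickling the real files also requires the classes they reference (for example from `lifelines` and `KaplanMeierEstimator.py`) to be importable, and pickle files should only be loaded from a trusted source.

```python
import pickle
from pathlib import Path

# File names taken from the repository tree above.
MODEL_FILES = (
    "kaplan_meier_both.pkl",
    "kaplan_meier_males.pkl",
    "kaplan_meier_females.pkl",
    "aft.pkl",
)


def load_survival_models(model_dir="survival_models"):
    """Load every expected .pkl file that exists; list the rest as missing.

    Assumption: the files were written with the standard pickle module.
    """
    loaded, missing = {}, []
    for name in MODEL_FILES:
        path = Path(model_dir) / name
        if path.is_file():
            with path.open("rb") as fh:
                loaded[path.stem] = pickle.load(fh)
        else:
            missing.append(name)
    return loaded, missing


if __name__ == "__main__":
    models, missing = load_survival_models()
    print("loaded:", sorted(models))
    print("missing:", missing)
```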

	## Quick Start

	### 1) How to Train Survival Models
	```bash
	$ python Example_How_To_Train_Survival_Models.py
	```
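For readers unfamiliar with the Kaplan-Meier estimator that this step fits, the product-limit formula can be sketched in a few lines of plain Python. This is an illustration only: it is not the code in `KaplanMeierEstimator.py`, and the toy durations and event flags are made up. The output file names suggest that the repository fits separate estimators for both sexes combined, for males, and for females.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.

    durations: follow-up time per patient
    events:    1 if the event was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d / n_at_risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (made up for illustration): times in months, 0 = censored.
durations = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(durations, events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients lower the number at risk without triggering a drop in the curve, which is what distinguishes this estimate from a simple fraction of survivors.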

	### 2) How to Do Feature Engineering for Can-SAVE
	#### Terminal
	```bash
	$ python CanSave.py
	```

	#### Python
	```python
	# Required libraries
	import pandas as pd

	from CanSave import CanSave

	# Entry point
	if __name__ == '__main__':
	    # Create the feature engineering object
	    config_path = './CONFIG_CanSave.yaml'
	    cs = CanSave(CONFIG_PATH=config_path)
	    help(cs)  # help() prints the documentation itself and returns None

	    # Load the patient's EHR
	    path_ehr = './EHR/id_26.csv'
	    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
	    sex = ehr['sex'].iloc[0]
	    birth_date = ehr['birth_date'].iloc[0]

	    # Run feature engineering for the risk prediction
	    features = cs.feature_engineering(
	        sex=sex,                 # sex of the patient
	        birth_date=birth_date,   # birth date of the patient
	        ehr=ehr,                 # Electronic Health Records of the patient
	        date_pred='2022-01-01',  # date of the risk estimation
	        deep_weeks=108,          # depth of the EHR history (in weeks)
	    )
	```
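The `date_pred` and `deep_weeks` arguments together describe how far back the EHR history reaches. As a rough illustration, the sketch below computes the look-back window for the values used above. It assumes the window ends at `date_pred` (exclusive) and spans exactly `deep_weeks` weeks; how `CanSave.feature_engineering` treats the boundaries internally is not documented here, and `history_window` and `in_window` are hypothetical helpers.

```python
from datetime import date, timedelta


def history_window(date_pred, deep_weeks):
    """Return (start, end) of the look-back window ending at date_pred."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end


def in_window(record_date, start, end):
    """True if an ISO-formatted record date falls inside [start, end)."""
    return start <= date.fromisoformat(record_date) < end


start, end = history_window("2022-01-01", 108)
print(start, end)  # 2019-12-07 2022-01-01

# Hypothetical record dates, not taken from EHR/id_26.csv.
records = ["2019-11-30", "2020-06-15", "2021-12-31", "2022-01-01"]
print([d for d in records if in_window(d, start, end)])
```

With `deep_weeks=108`, the window covers a little over two years of records before the prediction date.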

	## Citation

	If you find this work useful, please cite it:

	```bibtex
	@misc{philonenko2025,
	  title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR},
	  author={Petr Philonenko and Vladimir Kokh and Pavel Blinov},
	  year={2025},
	  eprint={2309.15039},
	  archivePrefix={arXiv},
	  primaryClass={cs.LG},
	  url={https://arxiv.org/abs/2309.15039},
	}
	```