--- license: apache-2.0 --- # Can-SAVE: *Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR* [![arXiv](https://img.shields.io/badge/arXiv-2309.15039-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2309.15039) [![KDD 2026](https://img.shields.io/badge/KDD%202026-Accepted-2ea44f?logo=acm)](https://kdd2026.kdd.org/) [![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg?logo=python)](https://www.python.org/downloads/) [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE) The source code to implement the feature engineering step of the Can-SAVE method. ## Installation ```bash git clone https://huggingface.co/ai-lab/Can-SAVE cd CanSave pip install -r requirements.txt ``` ## requirements.txt ```bash pandas==1.5.3 numpy==1.23.2 lifelines==0.27.4 scikit-learn==1.1.3 scipy==1.10.0 PyYAML==6.0 openpyxl==3.0.10 ``` ## Repository Structure - Can-SAVE/: Core implementation - EHR/: Simulated sample of EHR data - survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model) ```bash Can-SAVE/ ├── EHR/ │ └── id_26.csv ├── survival_models/ │ ├── kaplan_meier_both.pkl │ ├── kaplan_meier_males.pkl │ ├── kaplan_meier_females.pkl │ └── aft.pkl ├── CanSave.py ├── Example_How_To_Train_Survival_Models.py ├── KaplanMeierEstimator.py ├── CONFIG_CanSave.yaml ├── icd10_groups.xlsx ├── requirements.txt ├── LICENSE └── README.md ``` ## Quick Start ### 1) How to Train Survival Models ```bash $ python Example_How_To_Train_Survival_Models.py ``` ### 2) How to Do Feature Engineering for Can-SAVE #### Terminal ```bash $ python CanSave.py ``` #### Python ```python # required libraries import numpy as np import pandas as pd from CanSave import CanSave # entry point if __name__ == '__main__': # Make new object for feature engineering config_path = './CONFIG_CanSave.yaml' cs = CanSave(CONFIG_PATH=config_path) print(help(cs)) # Load the patient's EHR path_ehr = './EHR/id_26.csv' ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id') sex = ehr['sex'].iloc[0] birth_date = ehr['birth_date'].iloc[0] # Make feature engineering for the risk prediction features = cs.feature_engineering( sex = sex, # sex of the patient birth_date = birth_date, # birth date of the patient ehr = ehr, # Electronic Health Records of the patient date_pred = '2022-01-01', # date of the risk estimation deep_weeks = 108 # deep of the EHR's history (in weeks) ) ``` ## Citation If you find the work useful, please cite our work: ```bibtex @misc{philonenko2025, title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR}, author={Petr Philonenko and Vladimir Kokh and Pavel Blinov}, year={2025}, eprint={2309.15039}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2309.15039}, } ```