---
license: apache-2.0
---
# Can-SAVE: *Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR*
[![arXiv](https://img.shields.io/badge/arXiv-2309.15039-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2309.15039)
[![KDD 2026](https://img.shields.io/badge/KDD%202026-Accepted-2ea44f?logo=acm)](https://kdd2026.kdd.org/)
[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg?logo=python)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
This repository contains the source code for the feature engineering step of the Can-SAVE method.
## Installation
```bash
git clone https://huggingface.co/ai-lab/Can-SAVE
cd Can-SAVE
pip install -r requirements.txt
```
## requirements.txt
```bash
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
```
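Because the dependencies are pinned to exact versions, it can help to confirm that the active environment matches them before running the scripts. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical and not part of this repository.

```python
from importlib import metadata

# Pins copied from requirements.txt above.
PINS = """
pandas==1.5.3
numpy==1.23.2
lifelines==0.27.4
scikit-learn==1.1.3
scipy==1.10.0
PyYAML==6.0
openpyxl==3.0.10
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_pins(parse_pins(PINS)).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty result from `check_pins` means every pinned package is installed at the pinned version.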
## Repository Structure
- Can-SAVE/: Core implementation
- EHR/: Simulated sample of EHR data
- survival_models/: Output directory for fitted models (Kaplan-Meier estimators and AFT model)
```bash
Can-SAVE/
├── EHR/
│   └── id_26.csv
├── survival_models/
│   ├── kaplan_meier_both.pkl
│   ├── kaplan_meier_males.pkl
│   ├── kaplan_meier_females.pkl
│   └── aft.pkl
├── CanSave.py
├── Example_How_To_Train_Survival_Models.py
├── KaplanMeierEstimator.py
├── CONFIG_CanSave.yaml
├── icd10_groups.xlsx
├── requirements.txt
├── LICENSE
└── README.md
```
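To inspect the fitted models outside of `CanSave`, the `.pkl` files can be loaded directly. This is a sketch under two assumptions: that the files are ordinary `pickle` dumps, and that the classes they contain (for example from `lifelines` or `KaplanMeierEstimator.py`) are importable in the current environment. The helper `load_survival_models` is hypothetical and not part of the repository.

```python
import pickle
from pathlib import Path


def load_survival_models(model_dir="./survival_models"):
    """Unpickle every *.pkl file in model_dir, keyed by file name without extension."""
    models = {}
    for path in sorted(Path(model_dir).glob("*.pkl")):
        with path.open("rb") as fh:
            # Only unpickle files from a trusted source: pickle can execute code.
            models[path.stem] = pickle.load(fh)
    return models


if __name__ == "__main__":
    for name, model in load_survival_models().items():
        print(name, type(model).__name__)
```

With the directory layout above, the returned dictionary would be keyed by `aft`, `kaplan_meier_both`, `kaplan_meier_females`, and `kaplan_meier_males`.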
## Quick Start
### 1) How to Train Survival Models
```bash
$ python Example_How_To_Train_Survival_Models.py
```
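For readers unfamiliar with the Kaplan-Meier estimator mentioned above, the product-limit formula it is based on can be sketched in plain Python. This illustrates the general technique only; it is not the code in `KaplanMeierEstimator.py`, and the function name `kaplan_meier` and the toy data are assumptions.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: observed time for each subject (event or censoring time)
    events:    1/True if the event was observed, 0/False if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each distinct time.
    deaths = Counter(t for t, e in zip(durations, events) if e)

    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Product-limit step: S(t) = S(t-) * (1 - d_t / n_t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Toy data: five subjects, two of them censored (event flag 0).
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

On the toy data the survival estimate drops to roughly 0.8, 0.6, and 0.3 at times 1, 2, and 3. Censored subjects never trigger a drop themselves, but they count toward the at-risk set until their censoring time.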
### 2) How to Do Feature Engineering for Can-SAVE
#### Terminal
```bash
$ python CanSave.py
```
#### Python
```python
# Required libraries
import numpy as np
import pandas as pd
from CanSave import CanSave

# Entry point
if __name__ == '__main__':
    # Create the feature engineering object from the YAML config
    config_path = './CONFIG_CanSave.yaml'
    cs = CanSave(CONFIG_PATH=config_path)
    help(cs)  # help() prints the documentation itself and returns None

    # Load the patient's EHR
    path_ehr = './EHR/id_26.csv'
    ehr = pd.read_csv(path_ehr, sep=';').set_index('patient_id')
    sex = ehr['sex'].iloc[0]
    birth_date = ehr['birth_date'].iloc[0]

    # Run feature engineering for the risk prediction
    features = cs.feature_engineering(
        sex=sex,                 # sex of the patient
        birth_date=birth_date,   # birth date of the patient
        ehr=ehr,                 # Electronic Health Records of the patient
        date_pred='2022-01-01',  # date of the risk estimation
        deep_weeks=108           # depth of the EHR history (in weeks)
    )
```
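The `date_pred` and `deep_weeks` arguments together describe a look-back window over the patient's history. The sketch below shows one plausible reading of them: records dated within `deep_weeks` weeks before `date_pred` are kept, and the prediction date itself is excluded. Whether `CanSave` treats the boundaries exactly this way is an assumption, and the helpers `history_window` and `records_in_window` and the `date` field name are hypothetical.

```python
from datetime import date, timedelta


def history_window(date_pred, deep_weeks):
    """Return (start, end) of the look-back window as date objects."""
    end = date.fromisoformat(date_pred)
    start = end - timedelta(weeks=deep_weeks)
    return start, end


def records_in_window(records, date_pred, deep_weeks, date_key="date"):
    """Keep records with start <= record date < end (assumed boundaries)."""
    start, end = history_window(date_pred, deep_weeks)
    return [r for r in records if start <= date.fromisoformat(r[date_key]) < end]


if __name__ == "__main__":
    start, end = history_window("2022-01-01", 108)
    print(start, end)
```

With the values from the example above, 108 weeks is 756 days, so the window would start on 2019-12-07 and end on 2022-01-01.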
## Citation
If you find this work useful, please cite it:
```bibtex
@misc{philonenko2025,
      title={Can-SAVE: Deploying Low-Cost and Population-Scale Cancer Screening via Survival Analysis Variables and EHR},
      author={Petr Philonenko and Vladimir Kokh and Pavel Blinov},
      year={2025},
      eprint={2309.15039},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2309.15039},
}
```