# Reduction
This folder contains scripts for combining, reducing, filling and scaling processed EHR data for modelling. The scripts must be run in the following order:
1. `combine.py` - combine datasets and perform any post-processing
2. `post_prod_reduction.py` - combine columns to reduce the number of 0 values
3. `remove_ids.py` - remove receiver, scale up and test IDs
4. `clean_and_scale_train.py` - impute nulls and min-max scale training data
5. `clean_and_scale_test.py` - impute nulls and min-max scale testing data
_NB: The `data_type` in `clean_and_scale_test.py` can be set to `rec`, `sup`, `val` or `test`._
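
The impute-then-scale step in scripts 4 and 5 can be illustrated with a minimal sketch. It is not the code from `clean_and_scale_train.py` or `clean_and_scale_test.py`: the function names `fit_params` and `transform` are hypothetical, median imputation is an assumption (this README does not state the imputation strategy), and reusing the training statistics on the other datasets is assumed here because it is common practice, not because the README confirms it.

```python
from statistics import median


def fit_params(train_column):
    """Learn the fill value and min/max from the training column only."""
    observed = [v for v in train_column if v is not None]
    return {
        "fill": median(observed),  # assumption: nulls are filled with the median
        "min": min(observed),
        "max": max(observed),
    }


def transform(column, params):
    """Impute nulls, then min-max scale using the training parameters."""
    span = params["max"] - params["min"]
    scaled = []
    for v in column:
        v = params["fill"] if v is None else v
        # A constant training column has zero span; map it to 0.0.
        scaled.append(0.0 if span == 0 else (v - params["min"]) / span)
    return scaled


# Toy columns standing in for one EHR feature (None marks a null).
train = [2.0, None, 4.0, 10.0]
test = [6.0, None, 12.0]

params = fit_params(train)                # fitted on training data only
train_scaled = transform(train, params)
test_scaled = transform(test, params)     # same parameters reused
```

Under these assumptions, values in the non-training datasets can fall outside the range 0 to 1 when they exceed the training minimum or maximum, which is one reason to run the training script before the test script.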