Upload 34 files
- README.md +153 -0
- configurations/__pycache__/fastai_configs.cpython-310.pyc +0 -0
- configurations/__pycache__/fastai_configs.cpython-39.pyc +0 -0
- configurations/fastai_configs.py +137 -0
- configurations/wavelet_configs.py +17 -0
- evaluation/Model_Evaluation.ipynb +0 -0
- experiments/__pycache__/scp_experiment.cpython-310.pyc +0 -0
- experiments/__pycache__/scp_experiment.cpython-39.pyc +0 -0
- experiments/scp_experiment.py +227 -0
- exploratory_data_analysis/AutoECG_EDA.ipynb +0 -0
- main.py +30 -0
- models/__pycache__/base_model.cpython-39.pyc +0 -0
- models/__pycache__/basicconv1d.cpython-39.pyc +0 -0
- models/__pycache__/fastaiModel.cpython-310.pyc +0 -0
- models/__pycache__/fastaiModel.cpython-39.pyc +0 -0
- models/__pycache__/inception1d.cpython-39.pyc +0 -0
- models/__pycache__/resnet1d.cpython-39.pyc +0 -0
- models/__pycache__/rnn1d.cpython-39.pyc +0 -0
- models/__pycache__/wavelet.cpython-39.pyc +0 -0
- models/__pycache__/xresnet1d.cpython-39.pyc +0 -0
- models/base_model.py +10 -0
- models/basicconv1d.py +240 -0
- models/fastaiModel.py +513 -0
- models/inception1d.py +137 -0
- models/resnet1d.py +299 -0
- models/rnn1d.py +67 -0
- models/wavelet.py +158 -0
- models/xresnet1d.py +239 -0
- requirements.txt +23 -0
- utilities/__pycache__/timeseries_utils.cpython-39.pyc +0 -0
- utilities/__pycache__/utils.cpython-39.pyc +0 -0
- utilities/stratify.py +173 -0
- utilities/timeseries_utils.py +649 -0
- utilities/utils.py +509 -0
README.md
ADDED
|
@@ -0,0 +1,153 @@
# Automated ECG Interpretation

[![Contributors][contributors-shield]][contributors-url]
[![Forks](https://img.shields.io/github/forks/AutoECG/Automated-ECG-Interpretation.svg?style=flat-square)](https://github.com/AutoECG/Automated-ECG-Interpretation/network)
[![Stargazers](https://img.shields.io/github/stars/AutoECG/Automated-ECG-Interpretation.svg?style=flat-square)](https://github.com/AutoECG/Automated-ECG-Interpretation/stargazers)
[![Issues](https://img.shields.io/github/issues/AutoECG/Automated-ECG-Interpretation.svg?style=flat-square)](https://github.com/AutoECG/Automated-ECG-Interpretation/issues)

<br>

<div align="center">
<img src="https://user-images.githubusercontent.com/46399191/191921241-495090db-a088-46b6-bd09-0f7f21170b0a.png" height="350"/>
</div>

## Summary

Electrocardiography (ECG) is a key diagnostic tool for assessing a patient's cardiac condition. Automatic ECG interpretation algorithms, used as diagnosis support systems, promise significant relief for medical personnel, if only because of the number of ECGs that are routinely taken. However, developing such algorithms requires large training datasets and clear benchmark procedures.

## Data Description

The [PTB-XL ECG dataset](https://physionet.org/content/ptb-xl/1.0.1/) is a large dataset of 21837 clinical 12-lead ECGs of 10 seconds length from 18885 patients. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. In total, 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. Combined with the extensive annotation, this makes the dataset a rich resource for training and evaluating automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements, and annotated signal properties.

In general, the dataset is organized as follows:

```
ptbxl
├── ptbxl_database.csv
├── scp_statements.csv
├── records100
│   ├── 00000
│   │   ├── 00001_lr.dat
│   │   ├── 00001_lr.hea
│   │   ├── ...
│   │   ├── 00999_lr.dat
│   │   └── 00999_lr.hea
│   ├── ...
│   └── 21000
│       ├── 21001_lr.dat
│       ├── 21001_lr.hea
│       ├── ...
│       ├── 21837_lr.dat
│       └── 21837_lr.hea
└── records500
    ├── 00000
    │   ├── 00001_hr.dat
    │   ├── 00001_hr.hea
    │   ├── ...
    │   ├── 00999_hr.dat
    │   └── 00999_hr.hea
    ├── ...
    └── 21000
        ├── 21001_hr.dat
        ├── 21001_hr.hea
        ├── ...
        ├── 21837_hr.dat
        └── 21837_hr.hea
```
The dataset comprises 21837 clinical 12-lead ECG records of 10 seconds length from 18885 patients, 52% male and 48% female, with ages covering the whole range from 0 to 95 years (median 62, interquartile range 22). The value of the dataset results from the comprehensive collection of many different co-occurring pathologies, but also from a large proportion of healthy control samples.

| Records | Superclass | Description |
|:---|:---|:---|
| 9528 | NORM | Normal ECG |
| 5486 | MI | Myocardial Infarction |
| 5250 | STTC | ST/T Change |
| 4907 | CD | Conduction Disturbance |
| 2655 | HYP | Hypertrophy |

The waveform files are stored in WaveForm DataBase (WFDB) format with 16-bit precision at a resolution of 1μV/LSB and a sampling frequency of 500Hz (`records500/`), alongside downsampled versions of the waveform data at a sampling frequency of 100Hz (`records100/`).

All relevant metadata is stored in `ptbxl_database.csv`, with one row per record identified by `ecg_id`; it contains 28 columns.

All information related to the annotation scheme used is stored in a dedicated `scp_statements.csv`, which was enriched with mappings to other annotation standards.
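As a sketch of how these annotations can be consumed: the `scp_codes` column maps SCP-ECG statement codes to annotation likelihoods and is stored as a string, so it must be parsed before use. The sample rows below are illustrative, not real records:

```python
import ast

# Illustrative rows mimicking the scp_codes column of ptbxl_database.csv:
# each entry maps SCP-ECG statement codes to annotation likelihoods (0-100).
rows = [
    "{'NORM': 100.0, 'SR': 0.0}",
    "{'IMI': 80.0, 'SR': 0.0}",
]

# scp_codes is stored as a string; parse it into a dict per record.
parsed = [ast.literal_eval(r) for r in rows]

# Keep only statements annotated with likelihood >= 50.
confident = [{code for code, lik in r.items() if lik >= 50} for r in parsed]
print(confident)  # [{'NORM'}, {'IMI'}]
```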
## Setup

### Install dependencies

Install the dependencies (wfdb, pytorch, torchvision, cudatoolkit, fastai, fastprogress) by creating a conda environment:

```
conda env create -f requirements.yml
conda activate autoecg_env
```

### Get data

Download the dataset (PTB-XL) via the following bash script:

```
get_dataset.sh
```

This script first downloads [PTB-XL from PhysioNet](https://physionet.org/content/ptb-xl/) and stores it in `data/ptbxl/`.
## Usage

```
python main.py
```

This will perform all experiments for inception1d.
Depending on the execution environment, this can take up to several hours.
Once finished, all trained models, predictions, and results are stored in `output/`,
where a sub-folder is created for each experiment, each with `data/`, `models/`, and `results/` sub-folders.

| Model | AUC ↓ | Experiment |
|:---|:---|:---|
| inception1d | 0.927(00) | All statements |
| inception1d | 0.929(00) | Diagnostic statements |
| inception1d | 0.926(00) | Diagnostic subclasses |
| inception1d | 0.919(00) | Diagnostic superclasses |
| inception1d | 0.883(00) | Form statements |
| inception1d | 0.949(00) | Rhythm statements |

### Download model and results

We also provide a [compressed zip archive](https://drive.google.com/drive/folders/17za6IanRm7rpb1ZGHLQ80mJvBj_53LXJ?usp=sharing) containing the `output` folder corresponding to our runs, including trained models and predictions.
## Results for Inception1d Model

| Experiment name | Accuracy | Precision | Recall | F1_Score | Specificity |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| All | 0.9792 | 0.8949 | 0.1408 | 0.4824 | 0.9921 |
| Diagnostic | 0.9806 | 0.8440 | 0.1556 | 0.4746 | 0.9952 |
| Sub-Diagnostic | 0.9660 | 0.8315 | 0.3021 | 0.5119 | 0.9887 |
| Super-Diagnostic | 0.8847 | 0.7938 | 0.6757 | 0.7157 | 0.9251 |
| Form | 0.9452 | 0.5619 | 0.1420 | 0.3843 | 0.9916 |
| Rhythm | 0.9844 | 0.7676 | 0.4489 | 0.7290 | 0.9722 |

For more evaluation information and visualizations (confusion matrix, ROC curve), visit: [Model Evaluation](https://github.com/AutoECG/Automated-ECG-Interpretation/blob/main/evaluation/Model_Evaluation.ipynb)
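The metrics in this table can be reproduced from binarized multi-label predictions. A minimal numpy sketch, with made-up labels and predictions and micro-averaged definitions assumed (the repository's exact averaging may differ):

```python
import numpy as np

# Toy multi-label ground truth and thresholded predictions (3 samples, 4 classes).
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [1, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 0]])

# Count outcomes over all (sample, class) slots.
tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, specificity, f1)
```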
## Contribution

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!

1. [Fork the Project](https://github.com/AutoECG/Automated-ECG-Interpretation/fork)
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## Future Works

1. Model deployment.
2. Continue preprocessing new ECG data from hospitals to test model reliability and accuracy.
3. Work out different parsing options for XML ECG files from different ECG machine versions.

## Contact

Feel free to reach out to us:
- DM [Zaki Kurdya](https://twitter.com/ZakiKurdya)
- DM [Zeina Saadeddin](https://twitter.com/jszeina)
- DM [Salam Thabit](https://twitter.com/salamThabetDo)

<!-- MARKDOWN LINKS -->
[contributors-shield]: https://img.shields.io/github/contributors/AutoECG/Automated-ECG-Interpretation.svg?style=flat-square&color=blue
[contributors-url]: https://github.com/AutoECG/Automated-ECG-Interpretation/graphs/contributors
configurations/__pycache__/fastai_configs.cpython-310.pyc
ADDED
|
Binary file (3.67 kB). View file
|
|
|
configurations/__pycache__/fastai_configs.cpython-39.pyc
ADDED
|
Binary file (3.66 kB). View file
|
|
|
configurations/fastai_configs.py
ADDED
|
@@ -0,0 +1,137 @@
conf_fastai_resnet1d18 = {'model_name': 'fastai_resnet1d18', 'model_type': 'FastaiModel',
                          'parameters': dict()}

conf_fastai_resnet1d34 = {'model_name': 'fastai_resnet1d34', 'model_type': 'FastaiModel',
                          'parameters': dict()}

conf_fastai_resnet1d50 = {'model_name': 'fastai_resnet1d50', 'model_type': 'FastaiModel',
                          'parameters': dict()}

conf_fastai_resnet1d101 = {'model_name': 'fastai_resnet1d101', 'model_type': 'FastaiModel',
                           'parameters': dict()}

conf_fastai_resnet1d152 = {'model_name': 'fastai_resnet1d152', 'model_type': 'FastaiModel',
                           'parameters': dict()}

conf_fastai_resnet1d_wang = {'model_name': 'fastai_resnet1d_wang', 'model_type': 'FastaiModel',
                             'parameters': dict()}

conf_fastai_wrn1d_22 = {'model_name': 'fastai_wrn1d_22', 'model_type': 'FastaiModel',
                        'parameters': dict()}

conf_fastai_xresnet1d18 = {'model_name': 'fastai_xresnet1d18', 'model_type': 'FastaiModel',
                           'parameters': dict()}

conf_fastai_xresnet1d34 = {'model_name': 'fastai_xresnet1d34', 'model_type': 'FastaiModel',
                           'parameters': dict()}

conf_fastai_xresnet1d50 = {'model_name': 'fastai_xresnet1d50', 'model_type': 'FastaiModel',
                           'parameters': dict()}

# more xresnet50s
conf_fastai_xresnet1d50_ep30 = {'model_name': 'fastai_xresnet1d50_ep30', 'model_type': 'FastaiModel',
                                'parameters': dict(epochs=30)}

conf_fastai_xresnet1d50_validloss_ep30 = {'model_name': 'fastai_xresnet1d50_validloss_ep30',
                                          'model_type': 'FastaiModel',
                                          'parameters': dict(early_stopping="valid_loss", epochs=30)}

conf_fastai_xresnet1d50_macroauc_ep30 = {'model_name': 'fastai_xresnet1d50_macroauc_ep30', 'model_type': 'FastaiModel',
                                         'parameters': dict(early_stopping="macro_auc", epochs=30)}

conf_fastai_xresnet1d50_fmax_ep30 = {'model_name': 'fastai_xresnet1d50_fmax_ep30', 'model_type': 'FastaiModel',
                                     'parameters': dict(early_stopping="fmax", epochs=30)}

conf_fastai_xresnet1d50_ep50 = {'model_name': 'fastai_xresnet1d50_ep50', 'model_type': 'FastaiModel',
                                'parameters': dict(epochs=50)}

conf_fastai_xresnet1d50_validloss_ep50 = {'model_name': 'fastai_xresnet1d50_validloss_ep50',
                                          'model_type': 'FastaiModel',
                                          'parameters': dict(early_stopping="valid_loss", epochs=50)}

conf_fastai_xresnet1d50_macroauc_ep50 = {'model_name': 'fastai_xresnet1d50_macroauc_ep50', 'model_type': 'FastaiModel',
                                         'parameters': dict(early_stopping="macro_auc", epochs=50)}

conf_fastai_xresnet1d50_fmax_ep50 = {'model_name': 'fastai_xresnet1d50_fmax_ep50', 'model_type': 'FastaiModel',
                                     'parameters': dict(early_stopping="fmax", epochs=50)}

conf_fastai_xresnet1d101 = {'model_name': 'fastai_xresnet1d101', 'model_type': 'FastaiModel',
                            'parameters': dict()}

conf_fastai_xresnet1d152 = {'model_name': 'fastai_xresnet1d152', 'model_type': 'FastaiModel',
                            'parameters': dict()}

conf_fastai_xresnet1d18_deep = {'model_name': 'fastai_xresnet1d18_deep', 'model_type': 'FastaiModel',
                                'parameters': dict()}

conf_fastai_xresnet1d34_deep = {'model_name': 'fastai_xresnet1d34_deep', 'model_type': 'FastaiModel',
                                'parameters': dict()}

conf_fastai_xresnet1d50_deep = {'model_name': 'fastai_xresnet1d50_deep', 'model_type': 'FastaiModel',
                                'parameters': dict()}

conf_fastai_xresnet1d18_deeper = {'model_name': 'fastai_xresnet1d18_deeper', 'model_type': 'FastaiModel',
                                  'parameters': dict()}

conf_fastai_xresnet1d34_deeper = {'model_name': 'fastai_xresnet1d34_deeper', 'model_type': 'FastaiModel',
                                  'parameters': dict()}

conf_fastai_xresnet1d50_deeper = {'model_name': 'fastai_xresnet1d50_deeper', 'model_type': 'FastaiModel',
                                  'parameters': dict()}

conf_fastai_inception1d = {'model_name': 'fastai_inception1d', 'model_type': 'FastaiModel',
                           'parameters': dict()}

conf_fastai_inception1d_input256 = {'model_name': 'fastai_inception1d_input256', 'model_type': 'FastaiModel',
                                    'parameters': dict(input_size=256)}

conf_fastai_inception1d_input512 = {'model_name': 'fastai_inception1d_input512', 'model_type': 'FastaiModel',
                                    'parameters': dict(input_size=512)}

conf_fastai_inception1d_input1000 = {'model_name': 'fastai_inception1d_input1000', 'model_type': 'FastaiModel',
                                     'parameters': dict(input_size=1000)}

conf_fastai_inception1d_no_residual = {'model_name': 'fastai_inception1d_no_residual', 'model_type': 'FastaiModel',
                                       'parameters': dict()}

conf_fastai_fcn = {'model_name': 'fastai_fcn', 'model_type': 'FastaiModel',
                   'parameters': dict()}

conf_fastai_fcn_wang = {'model_name': 'fastai_fcn_wang', 'model_type': 'FastaiModel',
                        'parameters': dict()}

conf_fastai_schirrmeister = {'model_name': 'fastai_schirrmeister', 'model_type': 'FastaiModel',
                             'parameters': dict()}

conf_fastai_sen = {'model_name': 'fastai_sen', 'model_type': 'FastaiModel',
                   'parameters': dict()}

conf_fastai_basic1d = {'model_name': 'fastai_basic1d', 'model_type': 'FastaiModel',
                       'parameters': dict()}

conf_fastai_lstm = {'model_name': 'fastai_lstm', 'model_type': 'FastaiModel',
                    'parameters': dict(lr=1e-3)}

conf_fastai_gru = {'model_name': 'fastai_gru', 'model_type': 'FastaiModel',
                   'parameters': dict(lr=1e-3)}

conf_fastai_lstm_bidir = {'model_name': 'fastai_lstm_bidir', 'model_type': 'FastaiModel',
                          'parameters': dict(lr=1e-3)}

# fixed: model_name was 'fastai_gru', which duplicated the non-bidirectional config
conf_fastai_gru_bidir = {'model_name': 'fastai_gru_bidir', 'model_type': 'FastaiModel',
                         'parameters': dict(lr=1e-3)}

conf_fastai_lstm_input1000 = {'model_name': 'fastai_lstm_input1000', 'model_type': 'FastaiModel',
                              'parameters': dict(input_size=1000, lr=1e-3)}

conf_fastai_gru_input1000 = {'model_name': 'fastai_gru_input1000', 'model_type': 'FastaiModel',
                             'parameters': dict(input_size=1000, lr=1e-3)}

conf_fastai_schirrmeister_input500 = {'model_name': 'fastai_schirrmeister_input500', 'model_type': 'FastaiModel',
                                      'parameters': dict(input_size=500)}

conf_fastai_inception1d_input500 = {'model_name': 'fastai_inception1d_input500', 'model_type': 'FastaiModel',
                                    'parameters': dict(input_size=500)}

conf_fastai_fcn_wang_input500 = {'model_name': 'fastai_fcn_wang_input500', 'model_type': 'FastaiModel',
                                 'parameters': dict(input_size=500)}
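Each config is a plain dict that the experiment driver unpacks into a model constructor. A minimal sketch of that dispatch (the `build_model` function below is a hypothetical stand-in, for illustration only):

```python
# Hypothetical dispatch mirroring how scp_experiment.py consumes a config dict.
conf = {'model_name': 'fastai_inception1d_input256', 'model_type': 'FastaiModel',
        'parameters': dict(input_size=256)}

def build_model(model_name, model_type, parameters):
    # Stand-in for FastaiModel(...) construction; returns a description string.
    return f"{model_type}:{model_name}(input_size={parameters.get('input_size', 1000)})"

model = build_model(conf['model_name'], conf['model_type'], conf['parameters'])
print(model)  # FastaiModel:fastai_inception1d_input256(input_size=256)
```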
configurations/wavelet_configs.py
ADDED
|
@@ -0,0 +1,17 @@
conf_wavelet_standard_lr = {'model_name': 'Wavelet+LR', 'model_type': 'WAVELET',
                            'parameters': dict(
                                regularizer_C=.001,
                                classifier='LR'
                            )}

conf_wavelet_standard_rf = {'model_name': 'Wavelet+RF', 'model_type': 'WAVELET',
                            'parameters': dict(
                                regularizer_C=.001,
                                classifier='RF'
                            )}

conf_wavelet_standard_nn = {'model_name': 'Wavelet+NN', 'model_type': 'WAVELET',
                            'parameters': dict(
                                regularizer_C=.001,
                                classifier='NN'
                            )}
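The `classifier` string selects the downstream estimator inside `WaveletModel`. A hedged sketch of such a dispatch (the factory functions here are illustrative stubs, not the repository's actual constructors):

```python
# Illustrative stand-ins for the real estimators (e.g. a logistic regression,
# a random forest, and a small neural network in the actual WaveletModel).
def make_lr(C): return ('LogisticRegression', C)
def make_rf(C): return ('RandomForest', C)
def make_nn(C): return ('NeuralNet', C)

FACTORIES = {'LR': make_lr, 'RF': make_rf, 'NN': make_nn}

def build_classifier(classifier, regularizer_C):
    # Fail loudly on an unknown classifier string.
    try:
        return FACTORIES[classifier](regularizer_C)
    except KeyError:
        raise ValueError(f"unknown classifier: {classifier!r}")

print(build_classifier('RF', .001))  # ('RandomForest', 0.001)
```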
evaluation/Model_Evaluation.ipynb
ADDED
|
The diff for this file is too large to render.
|
|
|
experiments/__pycache__/scp_experiment.cpython-310.pyc
ADDED
|
Binary file (5.81 kB). View file
|
|
|
experiments/__pycache__/scp_experiment.cpython-39.pyc
ADDED
|
Binary file (5.85 kB). View file
|
|
|
experiments/scp_experiment.py
ADDED
|
@@ -0,0 +1,227 @@
| 1 |
+
import multiprocessing
|
| 2 |
+
from itertools import repeat
|
| 3 |
+
|
| 4 |
+
from models import fastaiModel
|
| 5 |
+
from models.wavelet import WaveletModel
|
| 6 |
+
from utilities.utils import *
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
class SCPExperiment:
|
| 10 |
+
"""
|
| 11 |
+
Experiment on SCP-ECG statements.
|
| 12 |
+
All experiments based on SCP are performed and evaluated the same way.
|
| 13 |
+
"""
|
| 14 |
+
|
| 15 |
+
def __init__(self, experiment_name, task, data_folder, output_folder, models,
|
| 16 |
+
sampling_frequency=100, min_samples=0, train_fold=8, val_fold=9,
|
| 17 |
+
test_fold=10, folds_type='strat'):
|
| 18 |
+
self.models = models
|
| 19 |
+
self.min_samples = min_samples
|
| 20 |
+
self.task = task
|
| 21 |
+
self.train_fold = train_fold
|
| 22 |
+
self.val_fold = val_fold
|
| 23 |
+
self.test_fold = test_fold
|
| 24 |
+
self.folds_type = folds_type
|
| 25 |
+
self.experiment_name = experiment_name
|
| 26 |
+
self.output_folder = output_folder
|
| 27 |
+
self.data_folder = data_folder
|
| 28 |
+
self.sampling_frequency = sampling_frequency
|
| 29 |
+
|
| 30 |
+
# create folder structure if needed
|
| 31 |
+
if not os.path.exists(self.output_folder + self.experiment_name):
|
| 32 |
+
os.makedirs(self.output_folder + self.experiment_name)
|
| 33 |
+
if not os.path.exists(self.output_folder + self.experiment_name + '/results/'):
|
| 34 |
+
os.makedirs(self.output_folder + self.experiment_name + '/results/')
|
| 35 |
+
if not os.path.exists(output_folder + self.experiment_name + '/models/'):
|
| 36 |
+
os.makedirs(self.output_folder + self.experiment_name + '/models/')
|
| 37 |
+
if not os.path.exists(output_folder + self.experiment_name + '/data/'):
|
| 38 |
+
os.makedirs(self.output_folder + self.experiment_name + '/data/')
|
| 39 |
+
|
| 40 |
+
def prepare(self):
|
| 41 |
+
# Load PTB-XL data
|
| 42 |
+
self.data, self.raw_labels = load_dataset(self.data_folder, self.sampling_frequency)
|
| 43 |
+
|
| 44 |
+
# Preprocess label data
|
| 45 |
+
self.labels = compute_label_aggregations(self.raw_labels, self.data_folder, self.task)
|
| 46 |
+
|
| 47 |
+
# Select relevant data and convert to one-hot
|
| 48 |
+
self.data, self.labels, self.Y, _ = select_data(self.data, self.labels, self.task, self.min_samples,
|
| 49 |
+
self.output_folder + self.experiment_name + '/data/')
|
| 50 |
+
self.input_shape = self.data[0].shape
|
| 51 |
+
|
| 52 |
+
# 10th fold for testing (9th for now)
|
| 53 |
+
self.X_test = self.data[self.labels.strat_fold == self.test_fold]
|
| 54 |
+
self.y_test = self.Y[self.labels.strat_fold == self.test_fold]
|
| 55 |
+
# 9th fold for validation (8th for now)
|
| 56 |
+
self.X_val = self.data[self.labels.strat_fold == self.val_fold]
|
| 57 |
+
self.y_val = self.Y[self.labels.strat_fold == self.val_fold]
|
| 58 |
+
# rest for training
|
| 59 |
+
self.X_train = self.data[self.labels.strat_fold <= self.train_fold]
|
| 60 |
+
self.y_train = self.Y[self.labels.strat_fold <= self.train_fold]
|
| 61 |
+
|
| 62 |
+
# Preprocess signal data
|
| 63 |
+
self.X_train, self.X_val, self.X_test = preprocess_signals(self.X_train, self.X_val, self.X_test,
|
| 64 |
+
self.output_folder + self.experiment_name + '/data/')
|
| 65 |
+
self.n_classes = self.y_train.shape[1]
|
| 66 |
+
|
| 67 |
+
# save train and test labels
|
| 68 |
+
self.y_train.dump(self.output_folder + self.experiment_name + '/data/y_train.npy')
|
| 69 |
+
self.y_val.dump(self.output_folder + self.experiment_name + '/data/y_val.npy')
|
| 70 |
+
self.y_test.dump(self.output_folder + self.experiment_name + '/data/y_test.npy')
|
| 71 |
+
|
| 72 |
+
model_name = 'naive'
|
| 73 |
+
# create most naive predictions via simple mean in training
|
| 74 |
+
mpath = self.output_folder + self.experiment_name + '/models/' + model_name + '/'
|
| 75 |
+
# create folder for model outputs
|
| 76 |
+
if not os.path.exists(mpath):
|
| 77 |
+
os.makedirs(mpath)
|
| 78 |
+
if not os.path.exists(mpath + 'results/'):
|
| 79 |
+
os.makedirs(mpath + 'results/')
|
| 80 |
+
|
| 81 |
+
mean_y = np.mean(self.y_train, axis=0)
|
| 82 |
+
np.array([mean_y] * len(self.y_train)).dump(mpath + 'y_train_pred.npy')
|
| 83 |
+
np.array([mean_y] * len(self.y_test)).dump(mpath + 'y_test_pred.npy')
|
| 84 |
+
np.array([mean_y] * len(self.y_val)).dump(mpath + 'y_val_pred.npy')
|
| 85 |
+
|
| 86 |
+
def perform(self):
|
| 87 |
+
for model_description in self.models:
|
| 88 |
+
model_name = model_description['model_name']
|
| 89 |
+
model_type = model_description['model_type']
|
| 90 |
+
model_params = model_description['parameters']
|
| 91 |
+
|
| 92 |
+
mpath = self.output_folder + self.experiment_name + '/models/' + model_name + '/'
|
| 93 |
+
# create folder for model outputs
|
| 94 |
+
if not os.path.exists(mpath):
|
| 95 |
+
os.makedirs(mpath)
|
| 96 |
+
if not os.path.exists(mpath + 'results/'):
|
| 97 |
+
os.makedirs(mpath + 'results/')
|
| 98 |
+
|
| 99 |
+
n_classes = self.Y.shape[1]
|
| 100 |
+
# load respective model
|
| 101 |
+
if model_type == 'WAVELET':
|
| 102 |
+
model = WaveletModel(model_name, n_classes, self.sampling_frequency, mpath, self.input_shape,
|
| 103 |
+
**model_params)
|
| 104 |
+
elif model_type == "FastaiModel":
|
| 105 |
+
model = fastaiModel.FastaiModel(model_name, n_classes, self.sampling_frequency, mpath, self.input_shape,
|
| 106 |
+
**model_params)
|
| 107 |
+
else:
|
| 108 |
+
assert True
|
| 109 |
+
break
|
| 110 |
+
# Print to check
|
| 111 |
+
print("Shape of input", self.X_train.shape)
|
| 112 |
+
# fit model
|
| 113 |
+
model.fit(self.X_train, self.y_train, self.X_val, self.y_val)
|
| 114 |
+
# predict and dump
|
| 115 |
+
model.predict(self.X_train).dump(mpath + 'y_train_pred.npy')
|
| 116 |
+
model.predict(self.X_val).dump(mpath + 'y_val_pred.npy')
|
| 117 |
+
model.predict(self.X_test).dump(mpath + 'y_test_pred.npy')
|
| 118 |
+
|
| 119 |
+
model_name = 'ensemble'
|
| 120 |
+
# create ensemble predictions via simple mean across model predictions (except naive predictions)
|
| 121 |
+
ensemblepath = self.output_folder + self.experiment_name + '/models/' + model_name + '/'
|
| 122 |
+
# create folder for model outputs
|
| 123 |
+
if not os.path.exists(ensemblepath):
|
| 124 |
+
os.makedirs(ensemblepath)
|
| 125 |
+
if not os.path.exists(ensemblepath + 'results/'):
|
| 126 |
+
os.makedirs(ensemblepath + 'results/')
|
| 127 |
+
# load all predictions
|
| 128 |
+
ensemble_train, ensemble_val, ensemble_test = [], [], []
|
| 129 |
+
for model_description in os.listdir(self.output_folder + self.experiment_name + '/models/'):
|
| 130 |
+
if not model_description in ['ensemble', 'naive']:
|
| 131 |
+
mpath = self.output_folder + self.experiment_name + '/models/' + model_description + '/'
|
| 132 |
+
ensemble_train.append(np.load(mpath + 'y_train_pred.npy', allow_pickle=True))
|
| 133 |
+
ensemble_val.append(np.load(mpath + 'y_val_pred.npy', allow_pickle=True))
|
| 134 |
+
ensemble_test.append(np.load(mpath + 'y_test_pred.npy', allow_pickle=True))
|
| 135 |
+
# dump mean predictions
|
| 136 |
+
np.array(ensemble_train).mean(axis=0).dump(ensemblepath + 'y_train_pred.npy')
|
| 137 |
+
np.array(ensemble_test).mean(axis=0).dump(ensemblepath + 'y_test_pred.npy')
|
| 138 |
+
np.array(ensemble_val).mean(axis=0).dump(ensemblepath + 'y_val_pred.npy')
|
| 139 |
+
|
| 140 |
+
def evaluate(self, n_bootstraping_samples=100, n_jobs=20, bootstrap_eval=False, dumped_bootstraps=True):
|
| 141 |
+
# get labels
|
| 142 |
+
global train_samples, val_samples
|
| 143 |
+
y_train = np.load(self.output_folder + self.experiment_name + '/data/y_train.npy', allow_pickle=True)
|
| 144 |
+
y_val = np.load(self.output_folder + self.experiment_name + '/data/y_val.npy', allow_pickle=True)
|
| 145 |
+
y_test = np.load(self.output_folder + self.experiment_name + '/data/y_test.npy', allow_pickle=True)
|
| 146 |
+
|
| 147 |
+
# if bootstrapping then generate appropriate samples for each
|
| 148 |
+
if bootstrap_eval:
|
| 149 |
+
if not dumped_bootstraps:
|
| 150 |
+
                train_samples = np.array(get_appropriate_bootstrap_samples(y_train, n_bootstraping_samples))
                test_samples = np.array(get_appropriate_bootstrap_samples(y_test, n_bootstraping_samples))
                val_samples = np.array(get_appropriate_bootstrap_samples(y_val, n_bootstraping_samples))
            else:
                test_samples = np.load(self.output_folder + self.experiment_name + '/test_bootstrap_ids.npy',
                                       allow_pickle=True)
        else:
            train_samples = np.array([range(len(y_train))])
            test_samples = np.array([range(len(y_test))])
            val_samples = np.array([range(len(y_val))])

        # store samples for future evaluations
        train_samples.dump(self.output_folder + self.experiment_name + '/train_bootstrap_ids.npy')
        test_samples.dump(self.output_folder + self.experiment_name + '/test_bootstrap_ids.npy')
        val_samples.dump(self.output_folder + self.experiment_name + '/val_bootstrap_ids.npy')

        # iterate over all models fitted so far
        for m in sorted(os.listdir(self.output_folder + self.experiment_name + '/models')):
            print(m)
            mpath = self.output_folder + self.experiment_name + '/models/' + m + '/'
            rpath = self.output_folder + self.experiment_name + '/models/' + m + '/results/'

            # load predictions
            y_train_pred = np.load(mpath + 'y_train_pred.npy', allow_pickle=True)
            y_val_pred = np.load(mpath + 'y_val_pred.npy', allow_pickle=True)
            y_test_pred = np.load(mpath + 'y_test_pred.npy', allow_pickle=True)

            if self.experiment_name == 'exp_ICBEB':
                # compute classwise thresholds such that recall-focused Gbeta is optimized
                thresholds = find_optimal_cutoff_thresholds_for_Gbeta(y_train, y_train_pred)
            else:
                thresholds = None

            pool = multiprocessing.Pool(n_jobs)

            tr_df = pd.concat(pool.starmap(generate_results,
                                           zip(train_samples, repeat(y_train), repeat(y_train_pred),
                                               repeat(thresholds))))
            tr_df_point = generate_results(range(len(y_train)), y_train, y_train_pred, thresholds)
            tr_df_result = pd.DataFrame(
                np.array([
                    tr_df_point.mean().values,
                    tr_df.mean().values,
                    tr_df.quantile(0.05).values,
                    tr_df.quantile(0.95).values]),
                columns=tr_df.columns,
                index=['point', 'mean', 'lower', 'upper'])

            te_df = pd.concat(pool.starmap(generate_results,
                                           zip(test_samples, repeat(y_test), repeat(y_test_pred), repeat(thresholds))))
            te_df_point = generate_results(range(len(y_test)), y_test, y_test_pred, thresholds)
            te_df_result = pd.DataFrame(
                np.array([
                    te_df_point.mean().values,
                    te_df.mean().values,
                    te_df.quantile(0.05).values,
                    te_df.quantile(0.95).values]),
                columns=te_df.columns,
                index=['point', 'mean', 'lower', 'upper'])

            val_df = pd.concat(pool.starmap(generate_results,
                                            zip(val_samples, repeat(y_val), repeat(y_val_pred), repeat(thresholds))))
            val_df_point = generate_results(range(len(y_val)), y_val, y_val_pred, thresholds)
            val_df_result = pd.DataFrame(
                np.array([
                    val_df_point.mean().values,
                    val_df.mean().values,
                    val_df.quantile(0.05).values,
                    val_df.quantile(0.95).values]),
                columns=val_df.columns,
                index=['point', 'mean', 'lower', 'upper'])

            pool.close()

            # dump results
            tr_df_result.to_csv(rpath + 'tr_results.csv')
            val_df_result.to_csv(rpath + 'val_results.csv')
            te_df_result.to_csv(rpath + 'te_results.csv')
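The evaluation above reports, for each metric, the point estimate alongside a 90% empirical bootstrap interval (the 5th/95th percentiles of the metric recomputed over resampled example indices). A minimal, dependency-free sketch of that percentile-bootstrap idea, with a hypothetical `accuracy` metric standing in for `generate_results`:

```python
import random

def bootstrap_interval(y_true, y_pred, metric, n_samples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample example indices with replacement,
    recompute the metric each time, and take empirical alpha/1-alpha quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int(alpha * n_samples)]
    upper = scores[int((1 - alpha) * n_samples) - 1]
    # point estimate on the full (unresampled) data, plus interval bounds
    return metric(y_true, y_pred), lower, upper

def accuracy(t, p):  # hypothetical stand-in for generate_results
    return sum(a == b for a, b in zip(t, p)) / len(t)

point, lo, hi = bootstrap_interval([1, 0, 1, 1, 0, 1, 0, 1],
                                   [1, 0, 0, 1, 0, 1, 1, 1], accuracy)
```

The repo's version precomputes and dumps the resampled index arrays (`*_bootstrap_ids.npy`) so that later runs evaluate on identical resamples.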
exploratory_data_analysis/AutoECG_EDA.ipynb
ADDED
The diff for this file is too large to render. See raw diff.
main.py
ADDED
|
@@ -0,0 +1,30 @@
# model configs
from configurations.fastai_configs import conf_fastai_inception1d
from experiments.scp_experiment import SCPExperiment
from utilities.utils import generate_ptbxl_summary_table


def main():
    data_folder = 'data/ptbxl/'
    output_folder = 'output/'

    models = [conf_fastai_inception1d]

    # STANDARD SCP EXPERIMENTS ON PTB-XL
    experiments = [
        ('exp1.1', 'subdiagnostic')
    ]

    for name, task in experiments:
        e = SCPExperiment(name, task, data_folder, output_folder, models)
        e.prepare()
        e.perform()
        e.evaluate()

    # generate summary table
    generate_ptbxl_summary_table()


if __name__ == "__main__":
    main()
models/__pycache__/base_model.cpython-39.pyc
ADDED
Binary file (760 Bytes)

models/__pycache__/basicconv1d.cpython-39.pyc
ADDED
Binary file (9.01 kB)

models/__pycache__/fastaiModel.cpython-310.pyc
ADDED
Binary file (13.6 kB)

models/__pycache__/fastaiModel.cpython-39.pyc
ADDED
Binary file (14.3 kB)

models/__pycache__/inception1d.cpython-39.pyc
ADDED
Binary file (5.52 kB)

models/__pycache__/resnet1d.cpython-39.pyc
ADDED
Binary file (9.51 kB)

models/__pycache__/rnn1d.cpython-39.pyc
ADDED
Binary file (2.99 kB)

models/__pycache__/wavelet.cpython-39.pyc
ADDED
Binary file (5.4 kB)

models/__pycache__/xresnet1d.cpython-39.pyc
ADDED
Binary file (11.2 kB)
models/base_model.py
ADDED
|
@@ -0,0 +1,10 @@
class ClassificationModel(object):

    def __init__(self):
        pass

    def fit(self, X_train, y_train, X_val, y_val):
        pass

    def predict(self, X, full_sequence=True):
        pass
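`ClassificationModel` is a bare interface: concrete models (such as the fastai wrapper in models/fastaiModel.py) override `fit` and `predict`. A toy subclass illustrating the contract; the majority-class baseline is purely illustrative and not part of the repo:

```python
class ClassificationModel(object):
    """Minimal interface mirrored from models/base_model.py."""
    def __init__(self): pass
    def fit(self, X_train, y_train, X_val, y_val): pass
    def predict(self, X, full_sequence=True): pass


class MajorityClassModel(ClassificationModel):
    """Hypothetical baseline: predicts the most frequent training label for every input."""

    def fit(self, X_train, y_train, X_val, y_val):
        # remember the most common label seen during training
        self.majority = max(set(y_train), key=y_train.count)

    def predict(self, X, full_sequence=True):
        return [self.majority for _ in X]


m = MajorityClassModel()
m.fit([[0.1], [0.2], [0.3]], [1, 0, 1], [], [])
preds = m.predict([[0.5], [0.6]])
```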
models/basicconv1d.py
ADDED
|
@@ -0,0 +1,240 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from fastai.layers import *
from fastai.data.core import *
from typing import Optional, Collection, Union
from collections.abc import Iterable

'''
This layer creates a convolution kernel that is convolved with the layer input
over a single spatial (or temporal) dimension to produce a tensor of outputs.
If use_bias is True, a bias vector is created and added to the outputs.
Finally, if activation is not None, it is applied to the outputs as well.
https://keras.io/api/layers/convolution_layers/convolution1d/
'''


def listify(o):
    if o is None: return []
    if isinstance(o, list): return o
    if isinstance(o, str): return [o]
    if isinstance(o, Iterable): return list(o)
    return [o]


def bn_drop_lin(ni, no, bn=True, p=0., actn=None):
    layers = []
    if bn: layers.append(nn.BatchNorm1d(ni))
    if p != 0.: layers.append(nn.Dropout(p))
    layers.append(nn.Linear(ni, no))
    if actn is not None: layers.append(actn)
    return layers


def _conv1d(in_planes, out_planes, kernel_size=3, stride=1, dilation=1, act="relu", bn=True, drop_p=0):
    lst = []
    if drop_p > 0:
        lst.append(nn.Dropout(drop_p))
    lst.append(nn.Conv1d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size - 1) // 2,
                         dilation=dilation, bias=not bn))
    if bn:
        lst.append(nn.BatchNorm1d(out_planes))
    if act == "relu":
        lst.append(nn.ReLU(True))
    if act == "elu":
        lst.append(nn.ELU(True))
    if act == "prelu":
        lst.append(nn.PReLU(True))
    return nn.Sequential(*lst)


def _fc(in_planes, out_planes, act="relu", bn=True):
    lst = [nn.Linear(in_planes, out_planes, bias=not bn)]
    if bn:
        lst.append(nn.BatchNorm1d(out_planes))
    if act == "relu":
        lst.append(nn.ReLU(True))
    if act == "elu":
        lst.append(nn.ELU(True))
    if act == "prelu":
        lst.append(nn.PReLU(True))
    return nn.Sequential(*lst)


def cd_adaptive_concat_pool(relevant, irrelevant, module):
    mpr, mpi = module.mp.attrib(relevant, irrelevant)
    apr, api = module.ap.attrib(relevant, irrelevant)
    return torch.cat([mpr, apr], 1), torch.cat([mpi, api], 1)


def attrib_adaptive_concat_pool(self, relevant, irrelevant):
    return cd_adaptive_concat_pool(relevant, irrelevant, self)


class AdaptiveConcatPool1d(nn.Module):
    """Layer that concats `AdaptiveAvgPool1d` and `AdaptiveMaxPool1d`."""

    def __init__(self, sz: Optional[int] = None):
        """Output will be 2*sz, or 2 if sz is None"""
        super().__init__()
        sz = sz or 1
        self.ap, self.mp = nn.AdaptiveAvgPool1d(sz), nn.AdaptiveMaxPool1d(sz)

    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)

    def attrib(self, relevant, irrelevant):
        return attrib_adaptive_concat_pool(self, relevant, irrelevant)


class SqueezeExcite1d(nn.Module):
    """Squeeze-and-excite block as used for example in LSTM FCN"""

    def __init__(self, channels, reduction=16):
        super().__init__()
        channels_reduced = channels // reduction
        self.w1 = torch.nn.Parameter(torch.randn(channels_reduced, channels).unsqueeze(0))
        self.w2 = torch.nn.Parameter(torch.randn(channels, channels_reduced).unsqueeze(0))

    def forward(self, x):
        # input is bs,ch,seq
        z = torch.mean(x, dim=2, keepdim=True)  # bs,ch,1
        intermed = F.relu(torch.matmul(self.w1, z))  # (1,ch_red,ch) * (bs,ch,1) = (bs,ch_red,1)
        s = torch.sigmoid(torch.matmul(self.w2, intermed))  # (1,ch,ch_red) * (bs,ch_red,1) = (bs,ch,1)
        return s * x  # (bs,ch,seq) * (bs,ch,1) = (bs,ch,seq)


def weight_init(m):
    """call weight initialization for model n via n.apply(weight_init)"""
    if isinstance(m, nn.Conv1d) or isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    if isinstance(m, nn.BatchNorm1d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
    if isinstance(m, SqueezeExcite1d):
        # fan-in scaled normal init; size(-1) is the fan-in of each weight matrix
        stdv1 = math.sqrt(2. / m.w1.size(-1))
        nn.init.normal_(m.w1, 0., stdv1)
        stdv2 = math.sqrt(1. / m.w2.size(-1))
        nn.init.normal_(m.w2, 0., stdv2)
def create_head1d(nf: int, nc: int, lin_ftrs: Optional[Collection[int]] = None, ps: Union[float, Collection[float]] = 0.5,
                  bn_final: bool = False, bn: bool = True, act="relu", concat_pooling=True):
    """Model head that takes `nf` features, runs through `lin_ftrs`, and ends with `nc` classes; bn and act added here"""
    lin_ftrs = [2 * nf if concat_pooling else nf, nc] if lin_ftrs is None else [
        2 * nf if concat_pooling else nf] + lin_ftrs + [nc]  # was [nf, 512, nc]
    ps = listify(ps)
    if len(ps) == 1: ps = [ps[0] / 2] * (len(lin_ftrs) - 2) + ps
    actns = [nn.ReLU(inplace=True) if act == "relu" else nn.ELU(inplace=True)] * (len(lin_ftrs) - 2) + [None]
    layers = [AdaptiveConcatPool1d() if concat_pooling else nn.MaxPool1d(2), Flatten()]
    for ni, no, p, actn in zip(lin_ftrs[:-1], lin_ftrs[1:], ps, actns):
        layers += bn_drop_lin(ni, no, bn, p, actn)
    if bn_final: layers.append(nn.BatchNorm1d(lin_ftrs[-1], momentum=0.01))
    return nn.Sequential(*layers)


# basic convolutional architecture

class BasicConv1d(nn.Sequential):
    """basic conv1d stack"""

    def __init__(self, filters=None, kernel_size=3, stride=2, dilation=1, pool=0, pool_stride=1,
                 squeeze_excite_reduction=0, num_classes=2, input_channels=8, act="relu", bn=True, headless=False,
                 split_first_layer=False, drop_p=0., lin_ftrs_head=None, ps_head=0.5, bn_final_head=False, bn_head=True,
                 act_head="relu", concat_pooling=True):
        if filters is None:
            filters = [128, 128, 128, 128]
        layers = []
        if isinstance(kernel_size, int):
            kernel_size = [kernel_size] * len(filters)
        for i in range(len(filters)):
            layers_tmp = [_conv1d(input_channels if i == 0 else filters[i - 1], filters[i], kernel_size=kernel_size[i],
                                  stride=(1 if (split_first_layer is True and i == 0) else stride), dilation=dilation,
                                  act="none" if ((headless is True and i == len(filters) - 1) or (
                                      split_first_layer is True and i == 0)) else act,
                                  bn=False if (headless is True and i == len(filters) - 1) else bn,
                                  drop_p=(0. if i == 0 else drop_p))]

            if split_first_layer is True and i == 0:
                layers_tmp.append(_conv1d(filters[0], filters[0], kernel_size=1, stride=1, act=act, bn=bn, drop_p=0.))
                # layers_tmp.append(nn.Linear(filters[0],filters[0],bias=not(bn)))
                # layers_tmp.append(_fc(filters[0],filters[0],act=act,bn=bn))
            if pool > 0 and i < len(filters) - 1:
                layers_tmp.append(nn.MaxPool1d(pool, stride=pool_stride, padding=(pool - 1) // 2))
            if squeeze_excite_reduction > 0:
                layers_tmp.append(SqueezeExcite1d(filters[i], squeeze_excite_reduction))
            layers.append(nn.Sequential(*layers_tmp))

        # head (previously: nn.AdaptiveAvgPool1d(1) followed by nn.Linear(filters[-1], num_classes))
        # note: inplace=True leads to a runtime error for ReLU + dropout, see
        # https://discuss.pytorch.org/t/relu-dropout-inplace/13467/5
        self.headless = headless
        if headless is True:
            head = nn.Sequential(nn.AdaptiveAvgPool1d(1), Flatten())
        else:
            head = create_head1d(filters[-1], nc=num_classes, lin_ftrs=lin_ftrs_head, ps=ps_head,
                                 bn_final=bn_final_head, bn=bn_head, act=act_head, concat_pooling=concat_pooling)
        layers.append(head)

        super().__init__(*layers)

    def get_layer_groups(self):
        return self[2], self[-1]

    def get_output_layer(self):
        if self.headless is False:
            return self[-1][-1]
        else:
            return None

    def set_output_layer(self, x):
        if self.headless is False:
            self[-1][-1] = x


# convenience functions for basic convolutional architectures
def fcn(filters=None, num_classes=2, input_channels=8):
    if filters is None:
        filters = [128] * 5
    filters_in = filters + [num_classes]
    return BasicConv1d(filters=filters_in, kernel_size=3, stride=1, pool=2, pool_stride=2,
                       input_channels=input_channels, act="relu", bn=True, headless=True)


def fcn_wang(num_classes=2, input_channels=8, lin_ftrs_head=None, ps_head=0.5, bn_final_head=False, bn_head=True,
             act_head="relu", concat_pooling=True):
    return BasicConv1d(filters=[128, 256, 128], kernel_size=[8, 5, 3], stride=1, pool=0, pool_stride=2,
                       num_classes=num_classes, input_channels=input_channels, act="relu", bn=True,
                       lin_ftrs_head=lin_ftrs_head, ps_head=ps_head, bn_final_head=bn_final_head, bn_head=bn_head,
                       act_head=act_head, concat_pooling=concat_pooling)


def schirrmeister(num_classes=2, input_channels=8, lin_ftrs_head=None, ps_head=0.5, bn_final_head=False, bn_head=True,
                  act_head="relu", concat_pooling=True):
    return BasicConv1d(filters=[25, 50, 100, 200], kernel_size=10, stride=3, pool=3, pool_stride=1,
                       num_classes=num_classes, input_channels=input_channels, act="relu", bn=True, headless=False,
                       split_first_layer=True, drop_p=0.5, lin_ftrs_head=lin_ftrs_head, ps_head=ps_head,
                       bn_final_head=bn_final_head, bn_head=bn_head, act_head=act_head, concat_pooling=concat_pooling)


def sen(filters=None, num_classes=2, input_channels=8, squeeze_excite_reduction=16, drop_p=0., lin_ftrs_head=None,
        ps_head=0.5, bn_final_head=False, bn_head=True, act_head="relu", concat_pooling=True):
    if filters is None:
        filters = [128] * 5
    return BasicConv1d(filters=filters, kernel_size=3, stride=2, pool=0, pool_stride=0, input_channels=input_channels,
                       act="relu", bn=True, num_classes=num_classes, squeeze_excite_reduction=squeeze_excite_reduction,
                       drop_p=drop_p, lin_ftrs_head=lin_ftrs_head, ps_head=ps_head, bn_final_head=bn_final_head,
                       bn_head=bn_head, act_head=act_head, concat_pooling=concat_pooling)


def basic1d(filters=None, kernel_size=3, stride=2, dilation=1, pool=0, pool_stride=1, squeeze_excite_reduction=0,
            num_classes=2, input_channels=8, act="relu", bn=True, headless=False, drop_p=0., lin_ftrs_head=None,
            ps_head=0.5, bn_final_head=False, bn_head=True, act_head="relu", concat_pooling=True):
    if filters is None:
        filters = [128] * 5
    return BasicConv1d(filters=filters, kernel_size=kernel_size, stride=stride, dilation=dilation, pool=pool,
                       pool_stride=pool_stride, squeeze_excite_reduction=squeeze_excite_reduction,
                       num_classes=num_classes, input_channels=input_channels, act=act, bn=bn, headless=headless,
                       drop_p=drop_p, lin_ftrs_head=lin_ftrs_head, ps_head=ps_head, bn_final_head=bn_final_head,
                       bn_head=bn_head, act_head=act_head, concat_pooling=concat_pooling)
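`create_head1d` expands `nf` into the head's layer widths (doubling it when `AdaptiveConcatPool1d` is used, since avg- and max-pooled features are concatenated) and spreads a scalar dropout `ps` as half-rate dropout on the hidden layers. A torch-free sketch of just that bookkeeping; `head_plan` is a hypothetical helper mirroring the logic above:

```python
def head_plan(nf, nc, lin_ftrs=None, ps=0.5, concat_pooling=True):
    """Layer widths and dropout rates as computed inside create_head1d."""
    first = 2 * nf if concat_pooling else nf  # concat pooling doubles the feature count
    sizes = [first, nc] if lin_ftrs is None else [first] + lin_ftrs + [nc]
    ps = ps if isinstance(ps, list) else [ps]
    if len(ps) == 1:
        # hidden layers get half the final dropout rate
        ps = [ps[0] / 2] * (len(sizes) - 2) + ps
    return sizes, ps

sizes, drops = head_plan(128, 71, lin_ftrs=[128], ps=0.5)
```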
models/fastaiModel.py
ADDED
|
@@ -0,0 +1,513 @@
from fastai.data.core import *
from fastai.learner import *
from fastai.callback.schedule import *
from fastai.torch_core import *
from fastai.callback.tracker import SaveModelCallback
# from fastai.callback.gradient import GradientClipping
from pathlib import Path
from functools import partial
import math
# from fastai.callback import GradientClipping
import torch
from fastai.tabular.core import range_of
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from fastai.callback.core import Callback
from fastai.data.core import DataLoaders
import torch.nn.functional as F
# from fastai.metrics import add_metrics
import torch.nn as nn
from fastcore.utils import ifnone
import pandas as pd
from models.base_model import ClassificationModel
from models.basicconv1d import weight_init, fcn_wang, fcn, schirrmeister, sen, basic1d
from models.inception1d import inception1d
from models.resnet1d import resnet1d18, resnet1d34, resnet1d50, resnet1d101, resnet1d152, resnet1d_wang, \
    wrn1d_22
from models.rnn1d import RNN1d
from utilities.timeseries_utils import TimeseriesDatasetCrops, ToTensor, aggregate_predictions
from models.xresnet1d import xresnet1d18_deeper, xresnet1d34_deeper, xresnet1d50_deeper, xresnet1d18_deep, \
    xresnet1d34_deep, xresnet1d50_deep, xresnet1d18, xresnet1d34, xresnet1d101, xresnet1d50, xresnet1d152
from utilities.utils import evaluate_experiment


def add_metrics(last_metrics, new_metric):
    """
    Adds a new metric to the list of last metrics.

    Args:
        last_metrics (list): List of previous metrics.
        new_metric (float or list): New metric(s) to add.

    Returns:
        list: Updated list of metrics.
    """
    if isinstance(new_metric, list):
        return last_metrics + new_metric
    else:
        return last_metrics + [new_metric]
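`add_metrics` replaces the helper of the same name that older fastai versions shipped: it appends a scalar metric to the running list, or extends the list when a metric reports multiple components. A standalone check of both branches (the function body is reproduced verbatim so the snippet runs on its own):

```python
def add_metrics(last_metrics, new_metric):
    """Append a scalar metric, or extend with a list of metric components."""
    if isinstance(new_metric, list):
        return last_metrics + new_metric
    return last_metrics + [new_metric]

a = add_metrics([0.1], 0.9)        # scalar branch
b = add_metrics([0.1], [0.9, 0.8])  # multi-component branch
```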
class MetricFunc(Callback):
    """Obtains score using user-supplied function func (potentially ignoring targets with ignore_idx)"""

    def __init__(self, func, name="MetricFunc", ignore_idx=None, one_hot_encode_target=True, argmax_pred=False,
                 softmax_pred=True, flatten_target=True, sigmoid_pred=False, metric_component=None):
        super().__init__()
        self.func = func
        self.ignore_idx = ignore_idx
        self.one_hot_encode_target = one_hot_encode_target
        self.argmax_pred = argmax_pred
        self.softmax_pred = softmax_pred
        self.flatten_target = flatten_target
        self.sigmoid_pred = sigmoid_pred
        self.metric_component = metric_component
        self.name = name
        # accumulated over the epoch; the full metric is computed in on_epoch_end
        self.y_true = None
        self.y_pred = None
        self.metric_complete = None

    def on_epoch_begin(self, **kwargs):
        # reset the per-epoch accumulators
        self.y_true = None
        self.y_pred = None

    def on_batch_end(self, last_output, last_target, **kwargs):
        # flatten everything (to make it also work for annotation tasks)
        y_pred_flat = last_output.view((-1, last_output.size()[-1]))

        if self.flatten_target:
            last_target = last_target.view(-1)
        y_true_flat = last_target

        # optionally take argmax of predictions
        if self.argmax_pred is True:
            y_pred_flat = y_pred_flat.argmax(dim=1)
        elif self.softmax_pred is True:
            y_pred_flat = F.softmax(y_pred_flat, dim=1)
        elif self.sigmoid_pred is True:
            y_pred_flat = torch.sigmoid(y_pred_flat)

        # potentially remove ignore_idx entries
        if self.ignore_idx is not None:
            selected_indices = (y_true_flat != self.ignore_idx).nonzero().squeeze()
            y_pred_flat = y_pred_flat[selected_indices]
            y_true_flat = y_true_flat[selected_indices]

        y_pred_flat = to_np(y_pred_flat)
        y_true_flat = to_np(y_true_flat)

        if self.one_hot_encode_target is True:
            # one-hot encode integer targets to match the prediction shape
            y_true_flat = np.eye(last_output.size()[-1])[y_true_flat.astype(int)]

        if self.y_pred is None:
            self.y_pred = y_pred_flat
            self.y_true = y_true_flat
        else:
            self.y_pred = np.concatenate([self.y_pred, y_pred_flat], axis=0)
            self.y_true = np.concatenate([self.y_true, y_true_flat], axis=0)

    def on_epoch_end(self, last_metrics, **kwargs):
        # compute the full metric (possibly multiple components) on the accumulated predictions
        self.metric_complete = self.func(self.y_true, self.y_pred)
        if self.metric_component is not None:
            return add_metrics(last_metrics, self.metric_complete[self.metric_component])
        else:
            return add_metrics(last_metrics, self.metric_complete)
def fmax_metric(targs, preds):
    return evaluate_experiment(targs, preds)["Fmax"]


def auc_metric(targs, preds):
    return evaluate_experiment(targs, preds)["macro_auc"]


def mse_flat(preds, targs):
    return torch.mean(torch.pow(preds.view(-1) - targs.view(-1), 2))


def nll_regression(preds, targs):
    # preds: bs, 2
    # targs: bs, 1
    preds_mean = preds[:, 0]
    # warning: output goes through exponential map to ensure positivity
    preds_var = torch.clamp(torch.exp(preds[:, 1]), 1e-4, 1e10)
    # print(to_np(preds_mean)[0],to_np(targs)[0,0],to_np(torch.sqrt(preds_var))[0])
    return torch.mean(torch.log(2 * math.pi * preds_var) / 2) + torch.mean(
        torch.pow(preds_mean - targs[:, 0], 2) / 2 / preds_var)


def nll_regression_init(m):
    assert isinstance(m, nn.Linear)
    nn.init.normal_(m.weight, 0., 0.001)
    nn.init.constant_(m.bias, 4)
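`nll_regression` implements the Gaussian negative log-likelihood, NLL = ½·log(2πσ²) + (μ − t)² / (2σ²), averaged over the batch, with the variance obtained from the second network output via a clamped exponential so it stays positive. A scalar, torch-free sketch of the same per-example formula:

```python
import math

def gaussian_nll(mean, log_var, target, var_min=1e-4, var_max=1e10):
    """Per-example Gaussian NLL with the same exp + clamp trick as nll_regression."""
    var = min(max(math.exp(log_var), var_min), var_max)  # positivity via exp, then clamp
    return 0.5 * math.log(2 * math.pi * var) + (mean - target) ** 2 / (2 * var)

# with log_var = 0 the variance is 1, so a perfect prediction costs 0.5 * log(2*pi)
nll = gaussian_nll(mean=0.0, log_var=0.0, target=0.0)
```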
def lr_find_plot(learner, path, filename="lr_find", n_skip=10, n_skip_end=2):
    """
    Saves the lr_find plot as a file (normally only shown as jupyter output).
    The x-axis is the learning rate (log scale).
    """
    learner.lr_find()

    backend_old = matplotlib.get_backend()
    plt.switch_backend('agg')
    plt.ylabel("loss")
    plt.xlabel("learning rate (log scale)")
    losses = [to_np(x) for x in learner.recorder.losses[n_skip:-(n_skip_end + 1)]]
    # print(learner.recorder.val_losses)
    # val_losses = [ to_np(x) for x in learner.recorder.val_losses[n_skip:-(n_skip_end+1)]]

    plt.plot(learner.recorder.lrs[n_skip:-(n_skip_end + 1)], losses)
    # plt.plot(learner.recorder.lrs[n_skip:-(n_skip_end+1)],val_losses )

    plt.xscale('log')
    plt.savefig(str(path / (filename + '.png')))
    plt.switch_backend(backend_old)


def losses_plot(learner, path, filename="losses", last: int = None):
    """
    Saves the train/validation loss curves as a file (normally only shown as jupyter output).
    The x-axis is the number of batches processed.
    """
    backend_old = matplotlib.get_backend()
    plt.switch_backend('agg')
    plt.ylabel("loss")
    plt.xlabel("Batches processed")

    last = ifnone(last, len(learner.recorder.nb_batches))
    l_b = np.sum(learner.recorder.nb_batches[-last:])
    iterations = range_of(learner.recorder.losses)[-l_b:]
    plt.plot(iterations, learner.recorder.losses[-l_b:], label='Train')
    val_iter = learner.recorder.nb_batches[-last:]
    val_iter = np.cumsum(val_iter) + np.sum(learner.recorder.nb_batches[:-last])
    plt.plot(val_iter, learner.recorder.val_losses[-last:], label='Validation')
    plt.legend()

    plt.savefig(str(path / (filename + '.png')))
    plt.switch_backend(backend_old)
class FastaiModel(ClassificationModel):
    def __init__(self, name, n_classes, freq, output_folder, input_shape, pretrained=False, input_size=2.5,
                 input_channels=12, chunkify_train=False, chunkify_valid=True, bs=128, ps_head=0.5, lin_ftrs_head=None,
                 wd=1e-2, epochs=50, lr=1e-2, kernel_size=5, loss="binary_cross_entropy", pretrained_folder=None,
                 n_classes_pretrained=None, gradual_unfreezing=True, discriminative_lrs=True, epochs_finetuning=30,
                 early_stopping=None, aggregate_fn="max", concat_train_val=False):
        super().__init__()

        if lin_ftrs_head is None:
            lin_ftrs_head = [128]
        self.name = name
        self.num_classes = n_classes if loss != "nll_regression" else 2
        self.target_fs = freq
        self.output_folder = Path(output_folder)

        self.input_size = int(input_size * self.target_fs)
        self.input_channels = input_channels

        self.chunkify_train = chunkify_train
        self.chunkify_valid = chunkify_valid

        self.chunk_length_train = 2 * self.input_size  # target_fs*6
        self.chunk_length_valid = self.input_size

        self.min_chunk_length = self.input_size  # chunk_length

        self.stride_length_train = self.input_size  # chunk_length_train//8
        self.stride_length_valid = self.input_size // 2  # chunk_length_valid

        self.copies_valid = 0  # >0 should only be used with chunkify_valid=False

        self.bs = bs
        self.ps_head = ps_head
        self.lin_ftrs_head = lin_ftrs_head
        self.wd = wd
        self.epochs = epochs
        self.lr = lr
        self.kernel_size = kernel_size
        self.loss = loss
        self.input_shape = input_shape

        if pretrained:
            if pretrained_folder is None:
                pretrained_folder = Path('../output/exp0/models/' + name.split("_pretrained")[0] + '/')
                # pretrained_folder = Path('/output/exp0/models/'+name.split("_pretrained")[0]+'/')

            if n_classes_pretrained is None:
                n_classes_pretrained = 71

        self.pretrained_folder = None if pretrained_folder is None else Path(pretrained_folder)
        self.n_classes_pretrained = n_classes_pretrained
        self.discriminative_lrs = discriminative_lrs
        self.gradual_unfreezing = gradual_unfreezing
        self.epochs_finetuning = epochs_finetuning

        self.early_stopping = early_stopping
        self.aggregate_fn = aggregate_fn
        self.concat_train_val = concat_train_val

    def fit(self, X_train, y_train, X_val, y_val):
        # convert everything to float32
        X_train = [l.astype(np.float32) for l in X_train]
        X_val = [l.astype(np.float32) for l in X_val]
        y_train = [l.astype(np.float32) for l in y_train]
        y_val = [l.astype(np.float32) for l in y_val]

        if self.concat_train_val:
            X_train += X_val
            y_train += y_val

        if self.pretrained_folder is None:  # from scratch
            print("Training from scratch...")
            learn = self._get_learner(X_train, y_train, X_val, y_val)

            # if(self.discriminative_lrs):
            #     layer_groups=learn.model.get_layer_groups()
            #     learn.split(layer_groups)
            learn.model.apply(weight_init)

            # initialization for regression output
            if self.loss == "nll_regression" or self.loss == "mse":
                output_layer_new = learn.model.get_output_layer()
                output_layer_new.apply(nll_regression_init)
                learn.model.set_output_layer(output_layer_new)
+
|
| 274 |
+
lr_find_plot(learn, self.output_folder)
|
| 275 |
+
learn.fit_one_cycle(self.epochs, self.lr) # slice(self.lr) if self.discriminative_lrs else self.lr)
|
| 276 |
+
losses_plot(learn, self.output_folder)
|
| 277 |
+
else: # finetuning
|
| 278 |
+
print("Finetuning...")
|
| 279 |
+
# create learner
|
| 280 |
+
learn = self._get_learner(X_train, y_train, X_val, y_val, self.n_classes_pretrained)
|
| 281 |
+
|
| 282 |
+
# load pretrained model
|
| 283 |
+
learn.path = self.pretrained_folder
|
| 284 |
+
learn.load(self.pretrained_folder.stem)
|
| 285 |
+
learn.path = self.output_folder
|
| 286 |
+
|
| 287 |
+
# exchange top layer
|
| 288 |
+
output_layer = learn.model.get_output_layer()
|
| 289 |
+
output_layer_new = nn.Linear(output_layer.in_features, self.num_classes).cuda()
|
| 290 |
+
apply_init(output_layer_new, nn.init.kaiming_normal_)
|
| 291 |
+
learn.model.set_output_layer(output_layer_new)
|
| 292 |
+
|
| 293 |
+
# layer groups
|
| 294 |
+
if self.discriminative_lrs:
|
| 295 |
+
layer_groups = learn.model.get_layer_groups()
|
| 296 |
+
learn.split(layer_groups)
|
| 297 |
+
|
| 298 |
+
learn.train_bn = True # make sure if bn mode is train
|
| 299 |
+
|
| 300 |
+
# train
|
| 301 |
+
lr = self.lr
|
| 302 |
+
if self.gradual_unfreezing:
|
| 303 |
+
assert (self.discriminative_lrs is True)
|
| 304 |
+
learn.freeze()
|
| 305 |
+
lr_find_plot(learn, self.output_folder, "lr_find0")
|
| 306 |
+
learn.fit_one_cycle(self.epochs_finetuning, lr)
|
| 307 |
+
losses_plot(learn, self.output_folder, "losses0")
|
| 308 |
+
# for n in [0]:#range(len(layer_groups)): learn.freeze_to(-n-1) lr_find_plot(learn,
|
| 309 |
+
# self.output_folder,"lr_find"+str(n)) learn.fit_one_cycle(self.epochs_gradual_unfreezing,slice(lr))
|
| 310 |
+
# losses_plot(learn, self.output_folder,"losses"+str(n)) if(n==0):#reduce lr after first step lr/=10.
|
| 311 |
+
# if(n>0 and (self.name.startswith("fastai_lstm") or self.name.startswith("fastai_gru"))):#reduce lr
|
| 312 |
+
# further for RNNs lr/=10
|
| 313 |
+
|
| 314 |
+
learn.unfreeze()
|
| 315 |
+
lr_find_plot(learn, self.output_folder, "lr_find" + str(len(layer_groups)))
|
| 316 |
+
learn.fit_one_cycle(self.epochs_finetuning, slice(lr / 1000, lr / 10))
|
| 317 |
+
losses_plot(learn, self.output_folder, "losses" + str(len(layer_groups)))
|
| 318 |
+
|
| 319 |
+
learn.save(self.name) # even for early stopping the best model will have been loaded again
|
| 320 |
+
|
| 321 |
+
def predict(self, X):
|
| 322 |
+
X = [l.astype(np.float32) for l in X]
|
| 323 |
+
y_dummy = [np.ones(self.num_classes, dtype=np.float32) for _ in range(len(X))]
|
| 324 |
+
|
| 325 |
+
learn = self._get_learner(X, y_dummy, X, y_dummy)
|
| 326 |
+
learn.load(self.name)
|
| 327 |
+
|
| 328 |
+
preds, targs = learn.get_preds()
|
| 329 |
+
preds = to_np(preds)
|
| 330 |
+
|
| 331 |
+
idmap = learn.data.valid_ds.get_id_mapping()
|
| 332 |
+
|
| 333 |
+
return aggregate_predictions(preds, idmap=idmap,
|
| 334 |
+
aggregate_fn=np.mean if self.aggregate_fn == "mean" else np.amax)
|
| 335 |
+
|
| 336 |
+
def _get_learner(self, X_train, y_train, X_val, y_val, num_classes=None):
|
| 337 |
+
df_train = pd.DataFrame({"data": range(len(X_train)), "label": y_train})
|
| 338 |
+
df_valid = pd.DataFrame({"data": range(len(X_val)), "label": y_val})
|
| 339 |
+
|
| 340 |
+
tfms_ptb_xl = [ToTensor()]
|
| 341 |
+
|
| 342 |
+
ds_train = TimeseriesDatasetCrops(df_train, self.input_size, num_classes=self.num_classes,
|
| 343 |
+
chunk_length=self.chunk_length_train if self.chunkify_train else 0,
|
| 344 |
+
min_chunk_length=self.min_chunk_length,
|
| 345 |
+
stride=self.stride_length_train, transforms=tfms_ptb_xl,
|
| 346 |
+
annotation=False, col_lbl="label", npy_data=X_train)
|
| 347 |
+
ds_valid = TimeseriesDatasetCrops(df_valid, self.input_size, num_classes=self.num_classes,
|
| 348 |
+
chunk_length=self.chunk_length_valid if self.chunkify_valid else 0,
|
| 349 |
+
min_chunk_length=self.min_chunk_length,
|
| 350 |
+
stride=self.stride_length_valid, transforms=tfms_ptb_xl,
|
| 351 |
+
annotation=False, col_lbl="label", npy_data=X_val)
|
| 352 |
+
|
| 353 |
+
db = DataLoaders(ds_train, ds_valid)
|
| 354 |
+
|
| 355 |
+
if self.loss == "binary_cross_entropy":
|
| 356 |
+
loss = F.binary_cross_entropy_with_logits
|
| 357 |
+
elif self.loss == "cross_entropy":
|
| 358 |
+
loss = F.cross_entropy
|
| 359 |
+
elif self.loss == "mse":
|
| 360 |
+
loss = mse_flat
|
| 361 |
+
elif self.loss == "nll_regression":
|
| 362 |
+
loss = nll_regression
|
| 363 |
+
else:
|
| 364 |
+
print("loss not found")
|
| 365 |
+
assert (True)
|
| 366 |
+
|
| 367 |
+
self.input_channels = self.input_shape[-1]
|
| 368 |
+
metrics = []
|
| 369 |
+
|
| 370 |
+
print("model:", self.name)
|
| 371 |
+
# note: all models of a particular kind share the same prefix but potentially a different
|
| 372 |
+
# postfix such as _input256
|
| 373 |
+
num_classes = self.num_classes if num_classes is None else num_classes
|
| 374 |
+
# resnet resnet1d18,resnet1d34,resnet1d50,resnet1d101,resnet1d152,resnet1d_wang,resnet1d,wrn1d_22
|
| 375 |
+
if self.name.startswith("fastai_resnet1d18"):
|
| 376 |
+
model = resnet1d18(num_classes=num_classes, input_channels=self.input_channels, inplanes=128,
|
| 377 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 378 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 379 |
+
elif self.name.startswith("fastai_resnet1d34"):
|
| 380 |
+
model = resnet1d34(num_classes=num_classes, input_channels=self.input_channels, inplanes=128,
|
| 381 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 382 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 383 |
+
elif self.name.startswith("fastai_resnet1d50"):
|
| 384 |
+
model = resnet1d50(num_classes=num_classes, input_channels=self.input_channels, inplanes=128,
|
| 385 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 386 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 387 |
+
elif self.name.startswith("fastai_resnet1d101"):
|
| 388 |
+
model = resnet1d101(num_classes=num_classes, input_channels=self.input_channels, inplanes=128,
|
| 389 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 390 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 391 |
+
elif self.name.startswith("fastai_resnet1d152"):
|
| 392 |
+
model = resnet1d152(num_classes=num_classes, input_channels=self.input_channels, inplanes=128,
|
| 393 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 394 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 395 |
+
elif self.name.startswith("fastai_resnet1d_wang"):
|
| 396 |
+
model = resnet1d_wang(num_classes=num_classes, input_channels=self.input_channels,
|
| 397 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 398 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 399 |
+
elif self.name.startswith("fastai_wrn1d_22"):
|
| 400 |
+
model = wrn1d_22(num_classes=num_classes, input_channels=self.input_channels,
|
| 401 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 402 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 403 |
+
|
| 404 |
+
# xresnet ... (order important for string capture)
|
| 405 |
+
elif self.name.startswith("fastai_xresnet1d18_deeper"):
|
| 406 |
+
model = xresnet1d18_deeper(num_classes=num_classes, input_channels=self.input_channels,
|
| 407 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 408 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 409 |
+
elif self.name.startswith("fastai_xresnet1d34_deeper"):
|
| 410 |
+
model = xresnet1d34_deeper(num_classes=num_classes, input_channels=self.input_channels,
|
| 411 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 412 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 413 |
+
elif self.name.startswith("fastai_xresnet1d50_deeper"):
|
| 414 |
+
model = xresnet1d50_deeper(num_classes=num_classes, input_channels=self.input_channels,
|
| 415 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 416 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 417 |
+
elif self.name.startswith("fastai_xresnet1d18_deep"):
|
| 418 |
+
model = xresnet1d18_deep(num_classes=num_classes, input_channels=self.input_channels,
|
| 419 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 420 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 421 |
+
elif self.name.startswith("fastai_xresnet1d34_deep"):
|
| 422 |
+
model = xresnet1d34_deep(num_classes=num_classes, input_channels=self.input_channels,
|
| 423 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 424 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 425 |
+
elif self.name.startswith("fastai_xresnet1d50_deep"):
|
| 426 |
+
model = xresnet1d50_deep(num_classes=num_classes, input_channels=self.input_channels,
|
| 427 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 428 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 429 |
+
elif self.name.startswith("fastai_xresnet1d18"):
|
| 430 |
+
model = xresnet1d18(num_classes=num_classes, input_channels=self.input_channels,
|
| 431 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 432 |
+
elif self.name.startswith("fastai_xresnet1d34"):
|
| 433 |
+
model = xresnet1d34(num_classes=num_classes, input_channels=self.input_channels,
|
| 434 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 435 |
+
elif self.name.startswith("fastai_xresnet1d50"):
|
| 436 |
+
model = xresnet1d50(num_classes=num_classes, input_channels=self.input_channels,
|
| 437 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 438 |
+
elif self.name.startswith("fastai_xresnet1d101"):
|
| 439 |
+
model = xresnet1d101(num_classes=num_classes, input_channels=self.input_channels,
|
| 440 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 441 |
+
elif self.name.startswith("fastai_xresnet1d152"):
|
| 442 |
+
model = xresnet1d152(num_classes=num_classes, input_channels=self.input_channels,
|
| 443 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 444 |
+
|
| 445 |
+
# inception passing the default kernel size of 5 leads to a max kernel size of 40-1 in the inception model as
|
| 446 |
+
# proposed in the original paper
|
| 447 |
+
elif self.name == "fastai_inception1d_no_residual": # note: order important for string capture
|
| 448 |
+
model = inception1d(num_classes=num_classes, input_channels=self.input_channels,
|
| 449 |
+
use_residual=False, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head,
|
| 450 |
+
kernel_size=8 * self.kernel_size)
|
| 451 |
+
elif self.name.startswith("fastai_inception1d"):
|
| 452 |
+
model = inception1d(num_classes=num_classes, input_channels=self.input_channels,
|
| 453 |
+
use_residual=True, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head,
|
| 454 |
+
kernel_size=8 * self.kernel_size)
|
| 455 |
+
|
| 456 |
+
|
| 457 |
+
# BasicConv1d fcn,fcn_wang,schirrmeister,sen,basic1d
|
| 458 |
+
elif self.name.startswith("fastai_fcn_wang"): # note: order important for string capture
|
| 459 |
+
model = fcn_wang(num_classes=num_classes, input_channels=self.input_channels,
|
| 460 |
+
ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 461 |
+
elif self.name.startswith("fastai_fcn"):
|
| 462 |
+
model = fcn(num_classes=num_classes, input_channels=self.input_channels)
|
| 463 |
+
elif self.name.startswith("fastai_schirrmeister"):
|
| 464 |
+
model = schirrmeister(num_classes=num_classes, input_channels=self.input_channels,
|
| 465 |
+
ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 466 |
+
elif self.name.startswith("fastai_sen"):
|
| 467 |
+
model = sen(num_classes=num_classes, input_channels=self.input_channels, ps_head=self.ps_head,
|
| 468 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 469 |
+
elif self.name.startswith("fastai_basic1d"):
|
| 470 |
+
model = basic1d(num_classes=num_classes, input_channels=self.input_channels,
|
| 471 |
+
kernel_size=self.kernel_size, ps_head=self.ps_head,
|
| 472 |
+
lin_ftrs_head=self.lin_ftrs_head)
|
| 473 |
+
# RNN
|
| 474 |
+
elif self.name.startswith("fastai_lstm_bidir"):
|
| 475 |
+
model = RNN1d(input_channels=self.input_channels, num_classes=num_classes, lstm=True,
|
| 476 |
+
bidirectional=True, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 477 |
+
elif self.name.startswith("fastai_gru_bidir"):
|
| 478 |
+
model = RNN1d(input_channels=self.input_channels, num_classes=num_classes, lstm=False,
|
| 479 |
+
bidirectional=True, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 480 |
+
elif self.name.startswith("fastai_lstm"):
|
| 481 |
+
model = RNN1d(input_channels=self.input_channels, num_classes=num_classes, lstm=True,
|
| 482 |
+
bidirectional=False, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 483 |
+
elif self.name.startswith("fastai_gru"):
|
| 484 |
+
model = RNN1d(input_channels=self.input_channels, num_classes=num_classes, lstm=False,
|
| 485 |
+
bidirectional=False, ps_head=self.ps_head, lin_ftrs_head=self.lin_ftrs_head)
|
| 486 |
+
else:
|
| 487 |
+
print("Model not found.")
|
| 488 |
+
assert True
|
| 489 |
+
|
| 490 |
+
learn = Learner(db, model, loss_func=loss, metrics=metrics, wd=self.wd, path=self.output_folder)
|
| 491 |
+
|
| 492 |
+
if self.name.startswith("fastai_lstm") or self.name.startswith("fastai_gru"):
|
| 493 |
+
learn.callback_fns.append(partial(GradientClipping, clip=0.25))
|
| 494 |
+
|
| 495 |
+
if self.early_stopping is not None:
|
| 496 |
+
# supported options: valid_loss, macro_auc, fmax
|
| 497 |
+
if self.early_stopping == "macro_auc" and self.loss != "mse" and self.loss != "nll_regression":
|
| 498 |
+
metric = MetricFunc(auc_metric, self.early_stopping,
|
| 499 |
+
one_hot_encode_target=False, argmax_pred=False, softmax_pred=False,
|
| 500 |
+
sigmoid_pred=True, flatten_target=False)
|
| 501 |
+
learn.metrics.append(metric)
|
| 502 |
+
learn.callback_fns.append(
|
| 503 |
+
partial(SaveModelCallback, monitor=self.early_stopping, every='improvement', name=self.name))
|
| 504 |
+
elif self.early_stopping == "fmax" and self.loss != "mse" and self.loss != "nll_regression":
|
| 505 |
+
metric = MetricFunc(fmax_metric, self.early_stopping,
|
| 506 |
+
one_hot_encode_target=False, argmax_pred=False, softmax_pred=False,
|
| 507 |
+
sigmoid_pred=True, flatten_target=False)
|
| 508 |
+
learn.metrics.append(metric)
|
| 509 |
+
learn.callback_fns.append(partial(SaveModelCallback, monitor=self.early_stopping, every='improvement', name=self.name))
|
| 510 |
+
elif self.early_stopping == "valid_loss":
|
| 511 |
+
learn.callback_fns.append(partial(SaveModelCallback, monitor=self.early_stopping, every='improvement', name=self.name))
|
| 512 |
+
|
| 513 |
+
return learn
|
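`FastaiModel.predict` evaluates every crop of every recording and then reduces the per-crop predictions back to one vector per recording via an id mapping (`aggregate_predictions` with `np.mean` or `np.amax`). The following is a minimal pure-Python sketch of that crop-to-recording aggregation; the function name `aggregate_by_id` is illustrative, not the repo's API:

```python
# Sketch of crop-to-recording aggregation as used in FastaiModel.predict:
# crops of the same recording share an id in `idmap`; their prediction
# vectors are reduced element-wise with mean or max.

def aggregate_by_id(preds, idmap, reduce="max"):
    """Group per-crop prediction vectors by recording id and reduce each group."""
    groups = {}
    for p, rid in zip(preds, idmap):
        groups.setdefault(rid, []).append(p)

    def _reduce(vectors):
        # element-wise mean or max over the crops of one recording
        cols = zip(*vectors)
        if reduce == "mean":
            return [sum(c) / len(c) for c in cols]
        return [max(c) for c in cols]

    return [_reduce(groups[rid]) for rid in sorted(groups)]


# three crops belonging to two recordings (ids 0, 0, 1)
preds = [[0.25, 1.0], [0.75, 0.5], [1.0, 0.25]]
print(aggregate_by_id(preds, [0, 0, 1], reduce="max"))   # [[0.75, 1.0], [1.0, 0.25]]
print(aggregate_by_id(preds, [0, 0, 1], reduce="mean"))  # [[0.5, 0.75], [1.0, 0.25]]
```

Max aggregation (the class default) makes a recording positive if any of its crops is confidently positive, which suits localized ECG abnormalities.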
models/inception1d.py
ADDED
@@ -0,0 +1,137 @@

import torch
import torch.nn as nn
from fastai.data.core import *

# InceptionTime, inspired by https://github.com/hfawaz/InceptionTime/blob/master/classifiers/inception.py
# and https://github.com/tcapelle/TimeSeries_fastai/blob/master/inception.py
from models.basicconv1d import create_head1d


def conv(in_planes, out_planes, kernel_size=3, stride=1):
    """convolution with padding"""
    return nn.Conv1d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                     padding=(kernel_size - 1) // 2, bias=False)


def noop(x):
    return x


class InceptionBlock1d(nn.Module):
    def __init__(self, ni, nb_filters, kss, stride=1, act='linear', bottleneck_size=32):
        super().__init__()
        self.bottleneck = conv(ni, bottleneck_size, 1, stride) if (bottleneck_size > 0) else noop

        self.convs = nn.ModuleList(
            [conv(bottleneck_size if (bottleneck_size > 0) else ni, nb_filters, ks) for ks in kss])
        self.conv_bottle = nn.Sequential(nn.MaxPool1d(3, stride, padding=1), conv(ni, nb_filters, 1))
        self.bn_relu = nn.Sequential(nn.BatchNorm1d((len(kss) + 1) * nb_filters), nn.ReLU())

    def forward(self, x):
        bottled = self.bottleneck(x)
        conv_outputs = [c(bottled) for c in self.convs]
        bottle_output = self.conv_bottle(x)
        out = self.bn_relu(torch.cat(conv_outputs + [bottle_output], dim=1))
        return out


class Shortcut1d(nn.Module):
    def __init__(self, ni, nf):
        super().__init__()
        self.act_fn = nn.ReLU(True)
        self.conv = conv(ni, nf, 1)
        self.bn = nn.BatchNorm1d(nf)

    def forward(self, inp, out):
        return self.act_fn(out + self.bn(self.conv(inp)))


class InceptionBackbone(nn.Module):
    def __init__(self, input_channels, kss, depth, bottleneck_size, nb_filters, use_residual):
        super().__init__()

        self.depth = depth
        assert depth % 3 == 0
        self.use_residual = use_residual

        n_ks = len(kss) + 1
        self.im = nn.ModuleList([InceptionBlock1d(input_channels if d == 0 else n_ks * nb_filters,
                                                  nb_filters=nb_filters, kss=kss,
                                                  bottleneck_size=bottleneck_size) for d in range(depth)])
        self.sk = nn.ModuleList(
            [Shortcut1d(input_channels if d == 0 else n_ks * nb_filters, n_ks * nb_filters) for d in
             range(depth // 3)])

    def forward(self, x):
        input_res = x
        for d in range(self.depth):
            x = self.im[d](x)
            if self.use_residual and d % 3 == 2:
                x = (self.sk[d // 3])(input_res, x)
                input_res = x.clone()
        return x


class Inception1d(nn.Module):
    """InceptionTime architecture"""

    def __init__(self, num_classes=2, input_channels=8, kernel_size=40, depth=6, bottleneck_size=32, nb_filters=32,
                 use_residual=True, lin_ftrs_head=None, ps_head=0.5, bn_final_head=False, bn_head=True, act_head="relu",
                 concat_pooling=True):
        super().__init__()
        assert kernel_size >= 40
        kernel_size = [k - 1 if k % 2 == 0 else k for k in
                       [kernel_size, kernel_size // 2, kernel_size // 4]]  # was 39,19,9

        layers = [InceptionBackbone(input_channels=input_channels, kss=kernel_size, depth=depth,
                                    bottleneck_size=bottleneck_size, nb_filters=nb_filters,
                                    use_residual=use_residual)]

        n_ks = len(kernel_size) + 1
        # head
        head = create_head1d(n_ks * nb_filters, nc=num_classes, lin_ftrs=lin_ftrs_head, ps=ps_head,
                             bn_final=bn_final_head, bn=bn_head, act=act_head,
                             concat_pooling=concat_pooling)
        layers.append(head)
        # layers.append(AdaptiveConcatPool1d())
        # layers.append(Flatten())
        # layers.append(nn.Linear(2*n_ks*nb_filters, num_classes))
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

    def get_layer_groups(self):
        depth = self.layers[0].depth
        if depth > 3:
            return (self.layers[0].im[3:], self.layers[0].sk[1:]), self.layers[-1]
        else:
            return self.layers[-1]

    def get_output_layer(self):
        return self.layers[-1][-1]

    def set_output_layer(self, x):
        self.layers[-1][-1] = x


def inception1d(**kwargs):
    """Constructs an InceptionTime model."""
    return Inception1d(**kwargs)
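`Inception1d.__init__` derives the three branch kernel sizes from the single `kernel_size` argument (full, half, quarter), decrementing even values to the nearest odd size so that `(k - 1) // 2` padding preserves the sequence length. A small standalone sketch of just that derivation (the helper name `inception_kernel_sizes` is illustrative, not part of the module):

```python
# Kernel-size derivation from Inception1d.__init__: one base size yields three
# branch kernels, each forced odd so "same" padding keeps the sequence length.

def inception_kernel_sizes(kernel_size=40):
    assert kernel_size >= 40  # mirrors the assert in Inception1d
    return [k - 1 if k % 2 == 0 else k for k in
            [kernel_size, kernel_size // 2, kernel_size // 4]]


print(inception_kernel_sizes(40))  # [39, 19, 9] -- the sizes used in the original paper
```

This is why the fastai wrapper passes `kernel_size=8 * self.kernel_size`: the default `kernel_size=5` becomes a base size of 40 and thus the paper's 39/19/9 branches.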
models/resnet1d.py
ADDED
@@ -0,0 +1,299 @@

import torch.nn as nn
import torch.nn.functional as F

# standard resnet
from models.basicconv1d import create_head1d


def conv(in_planes, out_planes, stride=1, kernel_size=3):
    """convolution with padding"""
    return nn.Conv1d(in_planes, out_planes, kernel_size=kernel_size, stride=stride,
                     padding=(kernel_size - 1) // 2, bias=False)


class BasicBlock1d(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, kernel_size=None, down_sample=None):
        super().__init__()
        if kernel_size is None:
            kernel_size = [3, 3]
        if isinstance(kernel_size, int):
            kernel_size = [kernel_size, kernel_size // 2 + 1]

        self.conv1 = conv(inplanes, planes, stride=stride, kernel_size=kernel_size[0])
        self.bn1 = nn.BatchNorm1d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv(planes, planes, kernel_size=kernel_size[1])
        self.bn2 = nn.BatchNorm1d(planes)
        self.down_sample = down_sample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.down_sample is not None:
            residual = self.down_sample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck1d(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, kernel_size=3, down_sample=None):
        super().__init__()

        self.conv1 = nn.Conv1d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm1d(planes)
        self.conv2 = nn.Conv1d(planes, planes, kernel_size=kernel_size, stride=stride,
                               padding=(kernel_size - 1) // 2, bias=False)
        self.bn2 = nn.BatchNorm1d(planes)
        self.conv3 = nn.Conv1d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm1d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.down_sample = down_sample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.down_sample is not None:
            residual = self.down_sample(x)

        out += residual
        out = self.relu(out)

        return out


class ResNet1d(nn.Sequential):
    """1d adaptation of the torchvision resnet"""

    def __init__(self, block, layers, kernel_size=3, num_classes=2, input_channels=3, inplanes=64,
                 fix_feature_dim=True, kernel_size_stem=None, stride_stem=2, pooling_stem=True, stride=2,
                 lin_ftrs_head=None, ps_head=0.5, bn_final_head=False, bn_head=True, act_head="relu",
                 concat_pooling=True):
        self.inplanes = inplanes

        layers_tmp = []

        if kernel_size_stem is None:
            kernel_size_stem = kernel_size[0] if isinstance(kernel_size, list) else kernel_size
        # stem
        layers_tmp.append(nn.Conv1d(input_channels, inplanes, kernel_size=kernel_size_stem, stride=stride_stem,
                                    padding=(kernel_size_stem - 1) // 2, bias=False))
        layers_tmp.append(nn.BatchNorm1d(inplanes))
        layers_tmp.append(nn.ReLU(inplace=True))
        if pooling_stem is True:
            layers_tmp.append(nn.MaxPool1d(kernel_size=3, stride=2, padding=1))
        # backbone
        for i, l in enumerate(layers):
            if i == 0:
                layers_tmp.append(self._make_layer(block, inplanes, layers[0], kernel_size=kernel_size))
            else:
                layers_tmp.append(
                    self._make_layer(block, inplanes if fix_feature_dim else (2 ** i) * inplanes, layers[i],
                                     stride=stride, kernel_size=kernel_size))

        # head
        # layers_tmp.append(nn.AdaptiveAvgPool1d(1))
        # layers_tmp.append(Flatten())
        # layers_tmp.append(nn.Linear((inplanes if fix_feature_dim else (2**len(layers)*inplanes)) * block.expansion, num_classes))

        head = create_head1d(
            (inplanes if fix_feature_dim else (2 ** len(layers) * inplanes)) * block.expansion, nc=num_classes,
            lin_ftrs=lin_ftrs_head, ps=ps_head, bn_final=bn_final_head, bn=bn_head, act=act_head,
            concat_pooling=concat_pooling)
        layers_tmp.append(head)

        # hand the assembled stem/backbone/head over to nn.Sequential
        super().__init__(*layers_tmp)

    def _make_layer(self, block, planes, blocks, stride=1, kernel_size=3):
        down_sample = None

        if stride != 1 or self.inplanes != planes * block.expansion:
            down_sample = nn.Sequential(
                nn.Conv1d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm1d(planes * block.expansion),
            )

        layers = [block(self.inplanes, planes, stride, kernel_size, down_sample)]
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def get_layer_groups(self):
        return self[6], self[-1]

    def get_output_layer(self):
        return self[-1][-1]

    def set_output_layer(self, x):
        self[-1][-1] = x


def resnet1d18(**kwargs):
    """Constructs a ResNet-18 model."""
    return ResNet1d(BasicBlock1d, [2, 2, 2, 2], **kwargs)


def resnet1d34(**kwargs):
    """Constructs a ResNet-34 model."""
    return ResNet1d(BasicBlock1d, [3, 4, 6, 3], **kwargs)


def resnet1d50(**kwargs):
    """Constructs a ResNet-50 model."""
    return ResNet1d(Bottleneck1d, [3, 4, 6, 3], **kwargs)


def resnet1d101(**kwargs):
    """Constructs a ResNet-101 model."""
    return ResNet1d(Bottleneck1d, [3, 4, 23, 3], **kwargs)


def resnet1d152(**kwargs):
    """Constructs a ResNet-152 model."""
    return ResNet1d(Bottleneck1d, [3, 8, 36, 3], **kwargs)


# the original used kernel_size_stem = 8
def resnet1d_wang(**kwargs):
    kwargs.setdefault("kernel_size", [5, 3])
    kwargs.setdefault("kernel_size_stem", 7)
    kwargs.setdefault("stride_stem", 1)
    kwargs.setdefault("pooling_stem", False)
    kwargs.setdefault("inplanes", 128)

    return ResNet1d(BasicBlock1d, [1, 1, 1], **kwargs)


def resnet1d(**kwargs):
    """Constructs a custom ResNet model."""
    return ResNet1d(BasicBlock1d, **kwargs)


# wide resnet adapted from the fastai wrn implementation

def noop(x):
    return x


def conv1d(ni: int, nf: int, ks: int = 3, stride: int = 1, padding: int = None, bias=False) -> nn.Conv1d:
    """Create an `nn.Conv1d` layer: `ni` inputs, `nf` outputs, `ks` kernel size. `padding` defaults to `ks // 2`."""
    if padding is None:
        padding = ks // 2
    return nn.Conv1d(ni, nf, kernel_size=ks, stride=stride, padding=padding, bias=bias)


def _bn1d(ni, init_zero=False):
|
| 229 |
+
"Batchnorm layer with 0 initialization"
|
| 230 |
+
m = nn.BatchNorm1d(ni)
|
| 231 |
+
m.weight.data.fill_(0 if init_zero else 1)
|
| 232 |
+
m.bias.data.zero_()
|
| 233 |
+
return m
|
| 234 |
+
|
| 235 |
+
|
| 236 |
+
def bn_relu_conv1d(ni, nf, ks, stride, init_zero=False):
|
| 237 |
+
bn_initzero = _bn1d(ni, init_zero=init_zero)
|
| 238 |
+
return nn.Sequential(bn_initzero, nn.ReLU(inplace=True), conv1d(ni, nf, ks, stride))
|
| 239 |
+
|
| 240 |
+
|
| 241 |
+
class BasicBlock1dwrn(nn.Module):
|
| 242 |
+
def __init__(self, ni, nf, stride, drop_p=0.0, ks=3):
|
| 243 |
+
super().__init__()
|
| 244 |
+
if isinstance(ks, int):
|
| 245 |
+
ks = [ks, ks // 2 + 1]
|
| 246 |
+
self.bn = nn.BatchNorm1d(ni)
|
| 247 |
+
self.conv1 = conv1d(ni, nf, ks[0], stride)
|
| 248 |
+
self.conv2 = bn_relu_conv1d(nf, nf, ks[0], 1)
|
| 249 |
+
self.drop = nn.Dropout(drop_p, inplace=True) if drop_p else None
|
| 250 |
+
self.shortcut = conv1d(ni, nf, ks[1], stride) if (
|
| 251 |
+
ni != nf or stride > 1) else noop # adapted to make it work for fix_feature_dim=True
|
| 252 |
+
|
| 253 |
+
def forward(self, x):
|
| 254 |
+
x2 = F.relu(self.bn(x), inplace=True)
|
| 255 |
+
r = self.shortcut(x2)
|
| 256 |
+
x = self.conv1(x2)
|
| 257 |
+
if self.drop: x = self.drop(x)
|
| 258 |
+
x = self.conv2(x) * 0.2
|
| 259 |
+
return x.add_(r)
|
| 260 |
+
|
| 261 |
+
|
| 262 |
+
def _make_group(N, ni, nf, block, stride, drop_p, ks=3):
|
| 263 |
+
return [block(ni if i == 0 else nf, nf, stride if i == 0 else 1, drop_p, ks=ks) for i in range(N)]
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
class WideResNet1d(nn.Sequential):
|
| 267 |
+
def __init__(self, input_channels: int, num_groups: int, N: int, num_classes: int, k: int = 1, drop_p: float = 0.0,
|
| 268 |
+
start_nf: int = 16, fix_feature_dim=True, kernel_size=5, lin_ftrs_head=None, ps_head=0.5,
|
| 269 |
+
bn_final_head=False, bn_head=True, act_head="relu", concat_pooling=True):
|
| 270 |
+
super().__init__()
|
| 271 |
+
n_channels = [start_nf]
|
| 272 |
+
|
| 273 |
+
for i in range(num_groups): n_channels.append(start_nf if fix_feature_dim else start_nf * (2 ** i) * k)
|
| 274 |
+
|
| 275 |
+
layers = [conv1d(input_channels, n_channels[0], 3, 1)] # conv1 stem
|
| 276 |
+
for i in range(num_groups):
|
| 277 |
+
layers += _make_group(N, n_channels[i], n_channels[i + 1], BasicBlock1dwrn,
|
| 278 |
+
(1 if i == 0 else 2), drop_p, ks=kernel_size)
|
| 279 |
+
|
| 280 |
+
# layers += [nn.BatchNorm1d(n_channels[-1]), nn.ReLU(inplace=True), nn.AdaptiveAvgPool1d(1),
|
| 281 |
+
# Flatten(), nn.Linear(n_channels[-1], num_classes)]
|
| 282 |
+
head = create_head1d(n_channels[-1], nc=num_classes, lin_ftrs=lin_ftrs_head, ps=ps_head,
|
| 283 |
+
bn_final=bn_final_head, bn=bn_head, act=act_head,
|
| 284 |
+
concat_pooling=concat_pooling)
|
| 285 |
+
layers.append(head)
|
| 286 |
+
|
| 287 |
+
super().__init__()
|
| 288 |
+
|
| 289 |
+
def get_layer_groups(self):
|
| 290 |
+
return self[6], self[-1]
|
| 291 |
+
|
| 292 |
+
def get_output_layer(self):
|
| 293 |
+
return self[-1][-1]
|
| 294 |
+
|
| 295 |
+
def set_output_layer(self, x):
|
| 296 |
+
self[-1][-1] = x
|
| 297 |
+
|
| 298 |
+
|
| 299 |
+
def wrn1d_22(**kwargs): return WideResNet1d(num_groups=3, N=3, k=6, drop_p=0., **kwargs)
|
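For intuition on the `conv1d` helper above: its `padding` default of `ks // 2` preserves the sequence length for odd kernels at stride 1, which is why these blocks can be stacked without tracking lengths. A quick illustrative check (hypothetical helper name `conv1d_out_len`; plain Python, no torch required) of the standard Conv1d length formula:

```python
def conv1d_out_len(L, ks, stride=1, padding=None):
    # same formula torch.nn.Conv1d uses (dilation=1):
    # L_out = floor((L + 2*padding - ks) / stride) + 1
    if padding is None:
        padding = ks // 2  # the default used by the conv1d helper above
    return (L + 2 * padding - ks) // stride + 1

print(conv1d_out_len(1000, 5))            # odd kernel, stride 1: length preserved
print(conv1d_out_len(1000, 5, stride=2))  # stride 2: roughly halved
```

Note that even kernels with this padding grow the output by one sample, which is one reason odd kernel sizes (3, 5, 7) are used throughout these models.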
models/rnn1d.py
ADDED
@@ -0,0 +1,67 @@
import torch
import torch.nn as nn
from fastai.layers import *
from fastai.data.core import *


class AdaptiveConcatPoolRNN(nn.Module):
    def __init__(self, bidirectional):
        super().__init__()
        self.bidirectional = bidirectional

    def forward(self, x):
        # input shape bs, ch, ts
        t1 = nn.AdaptiveAvgPool1d(1)(x)
        t2 = nn.AdaptiveMaxPool1d(1)(x)

        if self.bidirectional is False:
            t3 = x[:, :, -1]
        else:
            channels = x.size()[1]
            # forward half: last time step; backward half: first time step
            t3 = torch.cat([x[:, :channels // 2, -1], x[:, channels // 2:, 0]], 1)
        out = torch.cat([t1.squeeze(-1), t2.squeeze(-1), t3], 1)  # output shape bs, 3*ch
        return out


class RNN1d(nn.Sequential):
    def __init__(self, input_channels, num_classes, lstm=True, hidden_dim=256, num_layers=2, bidirectional=False,
                 ps_head=0.5, act_head="relu", lin_ftrs_head=None, bn=True):
        # bs, ch, ts -> ts, bs, ch
        layers_tmp = [Lambda(lambda x: x.transpose(1, 2)), Lambda(lambda x: x.transpose(0, 1))]
        # LSTM
        if lstm:
            layers_tmp.append(nn.LSTM(input_size=input_channels, hidden_size=hidden_dim, num_layers=num_layers,
                                      bidirectional=bidirectional))
        else:
            layers_tmp.append(nn.GRU(input_size=input_channels, hidden_size=hidden_dim, num_layers=num_layers,
                                     bidirectional=bidirectional))
        # pooling
        layers_tmp.append(Lambda(lambda x: x[0].transpose(0, 1)))
        layers_tmp.append(Lambda(lambda x: x.transpose(1, 2)))

        layers_head = [AdaptiveConcatPoolRNN(bidirectional)]

        # classifier
        nf = 3 * hidden_dim if bidirectional is False else 6 * hidden_dim
        lin_ftrs_head = [nf, num_classes] if lin_ftrs_head is None else [nf] + lin_ftrs_head + [num_classes]
        ps_head = listify(ps_head)
        if len(ps_head) == 1:
            ps_head = [ps_head[0] / 2] * (len(lin_ftrs_head) - 2) + ps_head
        actns = [nn.ReLU(inplace=True) if act_head == "relu" else nn.ELU(inplace=True)] * (
                len(lin_ftrs_head) - 2) + [None]

        for ni, no, p, actn in zip(lin_ftrs_head[:-1], lin_ftrs_head[1:], ps_head, actns):
            layers_head += bn_drop_lin(ni, no, bn, p, actn)
        layers_head = nn.Sequential(*layers_head)
        layers_tmp.append(layers_head)

        super().__init__(*layers_tmp)  # register the assembled layers; a bare super().__init__() discards them

    def get_layer_groups(self):
        return self[-1],

    def get_output_layer(self):
        return self[-1][-1]

    def set_output_layer(self, x):
        self[-1][-1] = x
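For intuition, a small numpy sketch (hypothetical `concat_pool` helper; unidirectional case only) of what `AdaptiveConcatPoolRNN` concatenates — the time-average, the time-max, and the last time step — yielding `3*ch` features per sample, which matches the `nf = 3 * hidden_dim` used by the classifier head:

```python
import numpy as np

def concat_pool(x):
    # x: (bs, ch, ts) -> (bs, 3*ch): [mean over ts | max over ts | last step]
    t1 = x.mean(axis=2)
    t2 = x.max(axis=2)
    t3 = x[:, :, -1]
    return np.concatenate([t1, t2, t3], axis=1)

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
out = concat_pool(x)
print(out.shape)  # (2, 9)
```

Concatenating complementary pooled views is the same idea as fastai's `AdaptiveConcatPool2d`, adapted here to recurrent outputs over time.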
models/wavelet.py
ADDED
@@ -0,0 +1,158 @@
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from models.base_model import ClassificationModel
import pickle
from tqdm import tqdm
import numpy as np
from sklearn.ensemble import RandomForestClassifier
import pywt
import scipy.stats
import multiprocessing
from collections import Counter
from keras.layers import Dropout, Dense, Input
from keras.models import Model
from keras.models import load_model
from keras.callbacks import ModelCheckpoint
from sklearn.preprocessing import StandardScaler


def calculate_entropy(list_values):
    counter_values = Counter(list_values).most_common()
    probabilities = [elem[1] / len(list_values) for elem in counter_values]
    entropy = scipy.stats.entropy(probabilities)
    return entropy


def calculate_statistics(list_values):
    n5 = np.nanpercentile(list_values, 5)
    n25 = np.nanpercentile(list_values, 25)
    n75 = np.nanpercentile(list_values, 75)
    n95 = np.nanpercentile(list_values, 95)
    median = np.nanpercentile(list_values, 50)
    mean = np.nanmean(list_values)
    std = np.nanstd(list_values)
    var = np.nanvar(list_values)
    rms = np.sqrt(np.nanmean(np.square(list_values)))  # root-mean-square: sqrt of the mean of squares
    return [n5, n25, n75, n95, median, mean, std, var, rms]


def calculate_crossings(list_values):
    zero_crossing_indices = np.nonzero(np.diff(np.array(list_values) > 0))[0]
    no_zero_crossings = len(zero_crossing_indices)
    mean_crossing_indices = np.nonzero(np.diff(np.array(list_values) > np.nanmean(list_values)))[0]
    no_mean_crossings = len(mean_crossing_indices)
    return [no_zero_crossings, no_mean_crossings]


def get_features(list_values):
    entropy = calculate_entropy(list_values)
    crossings = calculate_crossings(list_values)
    statistics = calculate_statistics(list_values)
    return [entropy] + crossings + statistics


def get_single_ecg_features(signal, waveletname='db6'):
    features = []
    for channel in signal.T:
        list_coeff = pywt.wavedec(channel, wavelet=waveletname, level=5)
        channel_features = []
        for coeff in list_coeff:
            channel_features += get_features(coeff)
        features.append(channel_features)
    return np.array(features).flatten()


def get_ecg_features(ecg_data, parallel=True):
    if parallel:
        pool = multiprocessing.Pool(18)  # hard-coded worker count
        return np.array(pool.map(get_single_ecg_features, ecg_data))
    else:
        list_features = []
        for signal in tqdm(ecg_data):
            features = get_single_ecg_features(signal)
            list_features.append(features)
        return np.array(list_features)


# for keras models
# def keras_macro_auroc(y_true, y_pred):
#     return tf.py_func(macro_auroc, (y_true, y_pred), tf.double)


class WaveletModel(ClassificationModel):
    def __init__(self, name, n_classes, freq, outputfolder, input_shape, regularizer_C=.001, classifier='RF'):
        # Disclaimer: This model assumes equal shapes across all samples!
        # standard parameters
        super().__init__()
        self.name = name
        self.outputfolder = outputfolder
        self.n_classes = n_classes
        self.freq = freq
        self.regularizer_C = regularizer_C
        self.classifier = classifier
        self.dropout = .25
        self.activation = 'relu'
        self.final_activation = 'sigmoid'
        self.n_dense_dim = 128
        self.epochs = 30

    def fit(self, X_train, y_train, X_val, y_val):
        XF_train = get_ecg_features(X_train)
        XF_val = get_ecg_features(X_val)

        if self.classifier == 'LR':
            if self.n_classes > 1:
                clf = OneVsRestClassifier(
                    LogisticRegression(C=self.regularizer_C, solver='lbfgs', max_iter=1000, n_jobs=-1))
            else:
                clf = LogisticRegression(C=self.regularizer_C, solver='lbfgs', max_iter=1000, n_jobs=-1)
            clf.fit(XF_train, y_train)
            pickle.dump(clf, open(self.outputfolder + 'clf.pkl', 'wb'))
        elif self.classifier == 'RF':
            clf = RandomForestClassifier(n_estimators=1000, n_jobs=16)
            clf.fit(XF_train, y_train)
            pickle.dump(clf, open(self.outputfolder + 'clf.pkl', 'wb'))
        elif self.classifier == 'NN':
            # standardize input data
            ss = StandardScaler()
            XFT_train = ss.fit_transform(XF_train)
            XFT_val = ss.transform(XF_val)
            pickle.dump(ss, open(self.outputfolder + 'ss.pkl', 'wb'))
            # classification stage
            input_x = Input(shape=(XFT_train.shape[1],))
            x = Dense(self.n_dense_dim, activation=self.activation)(input_x)
            x = Dropout(self.dropout)(x)
            y = Dense(self.n_classes, activation=self.final_activation)(x)
            self.model = Model(input_x, y)

            self.model.compile(optimizer='adamax', loss='binary_crossentropy')  # , metrics=[keras_macro_auroc])
            # monitor validation error
            mc_loss = ModelCheckpoint(self.outputfolder + 'best_loss_model.h5', monitor='val_loss', mode='min',
                                      verbose=1, save_best_only=True)
            # mc_score = ModelCheckpoint(self.outputfolder + 'best_score_model.h5', monitor='val_keras_macro_auroc', mode='max', verbose=1, save_best_only=True)
            self.model.fit(XFT_train, y_train, validation_data=(XFT_val, y_val), epochs=self.epochs, batch_size=128,
                           callbacks=[mc_loss])  # , mc_score])
            self.model.save(self.outputfolder + 'last_model.h5')

    def predict(self, X):
        XF = get_ecg_features(X)
        if self.classifier == 'LR':
            clf = pickle.load(open(self.outputfolder + 'clf.pkl', 'rb'))
            if self.n_classes > 1:
                return clf.predict_proba(XF)
            else:
                return clf.predict_proba(XF)[:, 1][:, np.newaxis]
        elif self.classifier == 'RF':
            clf = pickle.load(open(self.outputfolder + 'clf.pkl', 'rb'))
            y_pred = clf.predict_proba(XF)
            if self.n_classes > 1:
                return np.array([yi[:, 1] for yi in y_pred]).T
            else:
                return y_pred[:, 1][:, np.newaxis]
        elif self.classifier == 'NN':
            ss = pickle.load(open(self.outputfolder + 'ss.pkl', 'rb'))
            XFT = ss.transform(XF)
            model = load_model(self.outputfolder + 'best_loss_model.h5')
            # alternatively: 'best_score_model.h5' with custom_objects={'keras_macro_auroc': keras_macro_auroc}
            return model.predict(XFT)
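For intuition, a small numpy sketch (hypothetical `crossings` helper) of the two crossing counts that `calculate_crossings` contributes to each wavelet-coefficient feature vector — the number of sign changes around zero and around the mean:

```python
import numpy as np

def crossings(values):
    # count sign changes around 0 and around the mean,
    # mirroring calculate_crossings above
    v = np.asarray(values, dtype=float)
    zero = len(np.nonzero(np.diff(v > 0))[0])
    mean = len(np.nonzero(np.diff(v > np.nanmean(v)))[0])
    return zero, mean

t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
sig = np.sin(t)  # one full sine period
print(crossings(sig))
```

Crossing counts are crude frequency descriptors: a signal that oscillates faster crosses its baseline more often, which complements the amplitude statistics from `calculate_statistics`.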
models/xresnet1d.py
ADDED
@@ -0,0 +1,239 @@
import torch
import torch.nn as nn
from enum import Enum
import re
# delegates
import inspect

from torch.nn.utils import weight_norm, spectral_norm

from models.basicconv1d import create_head1d


def delegates(to=None, keep=False):
    """Decorator: replace `**kwargs` in signature with params from `to`"""

    def _f(f):
        if to is None:
            to_f, from_f = f.__base__.__init__, f.__init__
        else:
            to_f, from_f = to, f
        sig = inspect.signature(from_f)
        sigd = dict(sig.parameters)
        k = sigd.pop('kwargs')
        s2 = {k: v for k, v in inspect.signature(to_f).parameters.items()
              if v.default != inspect.Parameter.empty and k not in sigd}
        sigd.update(s2)
        if keep: sigd['kwargs'] = k
        from_f.__signature__ = sig.replace(parameters=sigd.values())
        return f

    return _f


def store_attr(self, nms):
    """Store params named in comma-separated `nms` from calling context into attrs in `self`"""
    mod = inspect.currentframe().f_back.f_locals
    for n in re.split(', *', nms): setattr(self, n, mod[n])


NormType = Enum('NormType', 'Batch BatchZero Weight Spectral Instance InstanceZero')


def _conv_func(ndim=2, transpose=False):
    """Return the proper conv `ndim` function, potentially `transposed`."""
    assert 1 <= ndim <= 3
    return getattr(nn, f'Conv{"Transpose" if transpose else ""}{ndim}d')


def init_default(m, func=nn.init.kaiming_normal_):
    """Initialize `m` weights with `func` and set `bias` to 0."""
    if func and hasattr(m, 'weight'): func(m.weight)
    with torch.no_grad():
        if getattr(m, 'bias', None) is not None: m.bias.fill_(0.)
    return m


def _get_norm(prefix, nf, ndim=2, zero=False, **kwargs):
    """Norm layer with `nf` features and `ndim` initialized depending on `norm_type`."""
    assert 1 <= ndim <= 3
    bn = getattr(nn, f"{prefix}{ndim}d")(nf, **kwargs)
    if bn.affine:
        bn.bias.data.fill_(1e-3)
        bn.weight.data.fill_(0. if zero else 1.)
    return bn


def BatchNorm(nf, ndim=2, norm_type=NormType.Batch, **kwargs):
    """BatchNorm layer with `nf` features and `ndim` initialized depending on `norm_type`."""
    return _get_norm('BatchNorm', nf, ndim, zero=norm_type == NormType.BatchZero, **kwargs)


class ConvLayer(nn.Sequential):
    """Create a sequence of convolutional (`ni` to `nf`), ReLU (if `use_activ`) and `norm_type` layers."""

    def __init__(self, ni, nf, ks=3, stride=1, padding=None, bias=None, ndim=2, norm_type=NormType.Batch, bn_1st=True,
                 act_cls=nn.ReLU, transpose=False, init=nn.init.kaiming_normal_, xtra=None, **kwargs):
        if padding is None: padding = ((ks - 1) // 2 if not transpose else 0)
        bn = norm_type in (NormType.Batch, NormType.BatchZero)
        inn = norm_type in (NormType.Instance, NormType.InstanceZero)
        if bias is None: bias = not (bn or inn)
        conv_func = _conv_func(ndim, transpose=transpose)
        conv = init_default(conv_func(ni, nf, kernel_size=ks, bias=bias, stride=stride, padding=padding, **kwargs),
                            init)
        if norm_type == NormType.Weight:
            conv = weight_norm(conv)
        elif norm_type == NormType.Spectral:
            conv = spectral_norm(conv)
        layers = [conv]
        act_bn = []
        if act_cls is not None: act_bn.append(act_cls())
        if bn: act_bn.append(BatchNorm(nf, norm_type=norm_type, ndim=ndim))
        # instance norm built via _get_norm; nn.InstanceNorm2d takes no norm_type/ndim kwargs
        if inn: act_bn.append(_get_norm('InstanceNorm', nf, ndim, zero=norm_type == NormType.InstanceZero, affine=True))
        if bn_1st: act_bn.reverse()
        layers += act_bn
        if xtra: layers.append(xtra)
        super().__init__(*layers)  # register the assembled layers; a bare super().__init__() discards them


def AdaptiveAvgPool(sz=1, ndim=2):
    """nn.AdaptiveAvgPool layer for `ndim`"""
    assert 1 <= ndim <= 3
    return getattr(nn, f"AdaptiveAvgPool{ndim}d")(sz)


def MaxPool(ks=2, stride=None, padding=0, ndim=2, ceil_mode=False):
    """nn.MaxPool layer for `ndim`"""
    assert 1 <= ndim <= 3
    return getattr(nn, f"MaxPool{ndim}d")(ks, stride=stride, padding=padding, ceil_mode=ceil_mode)


def AvgPool(ks=2, stride=None, padding=0, ndim=2, ceil_mode=False):
    """nn.AvgPool layer for `ndim`"""
    assert 1 <= ndim <= 3
    return getattr(nn, f"AvgPool{ndim}d")(ks, stride=stride, padding=padding, ceil_mode=ceil_mode)


class ResBlock(nn.Module):
    """Resnet block from `ni` to `nh` with `stride`"""

    @delegates(ConvLayer.__init__)
    def __init__(self, expansion, ni, nf, stride=1, kernel_size=3, groups=1, reduction=None, nh1=None, nh2=None,
                 dw=False, g2=1,
                 sa=False, sym=False, norm_type=NormType.Batch, act_cls=nn.ReLU, ndim=2,
                 pool=AvgPool, pool_first=True, **kwargs):
        super().__init__()
        norm2 = (NormType.BatchZero if norm_type == NormType.Batch else
                 NormType.InstanceZero if norm_type == NormType.Instance else norm_type)
        if nh2 is None: nh2 = nf
        if nh1 is None: nh1 = nh2
        nf, ni = nf * expansion, ni * expansion
        k0 = dict(norm_type=norm_type, act_cls=act_cls, ndim=ndim, **kwargs)
        k1 = dict(norm_type=norm2, act_cls=None, ndim=ndim, **kwargs)
        layers = [ConvLayer(ni, nh2, kernel_size, stride=stride, groups=ni if dw else groups, **k0),
                  ConvLayer(nh2, nf, kernel_size, groups=g2, **k1)
                  ] if expansion == 1 else [
            ConvLayer(ni, nh1, 1, **k0),
            ConvLayer(nh1, nh2, kernel_size, stride=stride, groups=nh1 if dw else groups, **k0),
            ConvLayer(nh2, nf, 1, groups=g2, **k1)]
        self.convs = nn.Sequential(*layers)
        convpath = [self.convs]
        # SEModule / SimpleSelfAttention are fastai.layers classes (not in torch.nn);
        # they must be imported for reduction/sa to be usable
        if reduction: convpath.append(SEModule(nf, reduction=reduction, act_cls=act_cls))
        if sa: convpath.append(SimpleSelfAttention(nf, ks=1, sym=sym))
        self.convpath = nn.Sequential(*convpath)
        idpath = []
        if ni != nf: idpath.append(ConvLayer(ni, nf, 1, act_cls=None, ndim=ndim, **kwargs))
        if stride != 1: idpath.insert((1, 0)[pool_first], pool(2, ndim=ndim, ceil_mode=True))
        self.idpath = nn.Sequential(*idpath)
        self.act = nn.ReLU(inplace=True) if act_cls is nn.ReLU else act_cls()

    def forward(self, x):
        return self.act(self.convpath(x) + self.idpath(x))


######################### adapted from vision.models.xresnet
def init_cnn(m):
    if getattr(m, 'bias', None) is not None: nn.init.constant_(m.bias, 0)
    if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Linear)): nn.init.kaiming_normal_(m.weight)
    for l in m.children(): init_cnn(l)


class XResNet1d(nn.Sequential):
    @delegates(ResBlock)
    def __init__(self, block, expansion, layers, p=0.0, input_channels=3, num_classes=1000, stem_szs=(32, 32, 64),
                 kernel_size=5, kernel_size_stem=5,
                 widen=1.0, sa=False, act_cls=nn.ReLU, lin_ftrs_head=None, ps_head=0.5, bn_final_head=False,
                 bn_head=True, act_head="relu", concat_pooling=True, **kwargs):
        store_attr(self, 'block,expansion,act_cls')
        stem_szs = [input_channels, *stem_szs]
        stem = [ConvLayer(stem_szs[i], stem_szs[i + 1], ks=kernel_size_stem, stride=2 if i == 0 else 1, act_cls=act_cls,
                          ndim=1)
                for i in range(3)]

        # block_szs = [int(o*widen) for o in [64,128,256,512] +[256]*(len(layers)-4)]
        block_szs = [int(o * widen) for o in [64, 64, 64, 64] + [32] * (len(layers) - 4)]
        block_szs = [64 // expansion] + block_szs
        blocks = [self._make_layer(ni=block_szs[i], nf=block_szs[i + 1], blocks=l,
                                   stride=1 if i == 0 else 2, kernel_size=kernel_size, sa=sa and i == len(layers) - 4,
                                   ndim=1, **kwargs)
                  for i, l in enumerate(layers)]

        head = create_head1d(block_szs[-1] * expansion, nc=num_classes, lin_ftrs=lin_ftrs_head, ps=ps_head,
                             bn_final=bn_final_head, bn=bn_head, act=act_head,
                             concat_pooling=concat_pooling)

        # stem and blocks belong in the Sequential as well, not just the pool and head
        super().__init__(*stem, nn.MaxPool1d(kernel_size=3, stride=2, padding=1), *blocks, head)
        init_cnn(self)

    def _make_layer(self, ni, nf, blocks, stride, kernel_size, sa, **kwargs):
        return nn.Sequential(
            *[self.block(self.expansion, ni if i == 0 else nf, nf, stride=stride if i == 0 else 1,
                         kernel_size=kernel_size, sa=sa and i == (blocks - 1), act_cls=self.act_cls, **kwargs)
              for i in range(blocks)])

    def get_layer_groups(self):
        return self[3], self[-1]

    def get_output_layer(self):
        return self[-1][-1]

    def set_output_layer(self, x):
        self[-1][-1] = x


# xresnets
def _xresnet1d(expansion, layers, **kwargs):
    return XResNet1d(ResBlock, expansion, layers, **kwargs)


def xresnet1d18(**kwargs): return _xresnet1d(1, [2, 2, 2, 2], **kwargs)


def xresnet1d34(**kwargs): return _xresnet1d(1, [3, 4, 6, 3], **kwargs)


def xresnet1d50(**kwargs): return _xresnet1d(4, [3, 4, 6, 3], **kwargs)


def xresnet1d101(**kwargs): return _xresnet1d(4, [3, 4, 23, 3], **kwargs)


def xresnet1d152(**kwargs): return _xresnet1d(4, [3, 8, 36, 3], **kwargs)


def xresnet1d18_deep(**kwargs): return _xresnet1d(1, [2, 2, 2, 2, 1, 1], **kwargs)


def xresnet1d34_deep(**kwargs): return _xresnet1d(1, [3, 4, 6, 3, 1, 1], **kwargs)


def xresnet1d50_deep(**kwargs): return _xresnet1d(4, [3, 4, 6, 3, 1, 1], **kwargs)


def xresnet1d18_deeper(**kwargs): return _xresnet1d(1, [2, 2, 1, 1, 1, 1, 1, 1], **kwargs)


def xresnet1d34_deeper(**kwargs): return _xresnet1d(1, [3, 4, 6, 3, 1, 1, 1, 1], **kwargs)


def xresnet1d50_deeper(**kwargs): return _xresnet1d(4, [3, 4, 6, 3, 1, 1, 1, 1], **kwargs)
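The `delegates` decorator at the top of xresnet1d.py merges the keyword defaults of a target callable into the decorated signature, so `**kwargs` becomes introspectable. A standalone, simplified sketch (hypothetical `base`/`wrapper` functions; the repo's version also handles classes and the `keep` flag):

```python
import inspect

def delegates(to):
    """Replace `**kwargs` in a function's signature with the keyword
    parameters of `to` (simplified sketch of the decorator above)."""
    def _f(f):
        sig = inspect.signature(f)
        sigd = dict(sig.parameters)
        sigd.pop('kwargs')  # drop the VAR_KEYWORD placeholder
        for k, v in inspect.signature(to).parameters.items():
            if v.default is not inspect.Parameter.empty and k not in sigd:
                sigd[k] = v  # inherit keyword defaults from `to`
        f.__signature__ = sig.replace(parameters=list(sigd.values()))
        return f
    return _f

def base(x, alpha=0.5, beta=2):
    return x * alpha + beta

@delegates(base)
def wrapper(x, gamma=1, **kwargs):
    return gamma * base(x, **kwargs)

print(list(inspect.signature(wrapper).parameters))
```

Runtime behavior is unchanged — `wrapper` still forwards `**kwargs` — but tooling that inspects the signature (help, autocompletion) now sees `alpha` and `beta` explicitly, which is how fastai documents `ResBlock` options on `XResNet1d`.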
requirements.txt
ADDED
@@ -0,0 +1,23 @@
torch
fastai
blis==0.7.5
cycler==0.11.0
cymem==2.0.6
fastcore==1.3.27
jinja2==3.0.3
kiwisolver==1.3.2
markupsafe==2.0.1
murmurhash==1.0.6
pathy==0.6.1
pillow==8.4.0
preshed==3.0.6
pyparsing==3.0.5
smart-open==5.2.1
srsly==2.4.2
thinc==8.0.13
torchvision==0.11.1
wfdb==3.4.1
wget==3.2
scikit-image
PyWavelets
keras
utilities/__pycache__/timeseries_utils.cpython-39.pyc
ADDED
Binary file (21 kB)

utilities/__pycache__/utils.cpython-39.pyc
ADDED
Binary file (16.7 kB)
utilities/stratify.py
ADDED
@@ -0,0 +1,173 @@
import numpy as np
from tqdm import tqdm


def stratify_df(df, new_col_name, n_folds=10, nr_clean_folds=0):
    # compute qualities as described in the PTB-XL report
    qualities = []
    for i, row in df.iterrows():
        q = 0
        if 'validated_by_human' in df.columns:
            if row.validated_by_human:
                q = 1
        qualities.append(q)
    df['quality'] = qualities

    # create stratified folds according to patients
    pat_ids = np.array(sorted(list(set(df.patient_id.values))))
    p_labels = []
    p_qualities = []
    ecgs_per_patient = []

    for pid in tqdm(pat_ids):
        sel = df[df.patient_id == pid]
        l = np.concatenate([list(d.keys()) for d in sel.scp_codes.values])
        if sel.sex.values[0] == 0:
            gender = 'male'
        else:
            gender = 'female'
        l = np.concatenate((l, [gender] * len(sel)))
        for age in sel.age.values:
            if age < 20:
                l = np.concatenate((l, ['<20']))
            elif 20 <= age < 40:
                l = np.concatenate((l, ['20-40']))
            elif 40 <= age < 60:
                l = np.concatenate((l, ['40-60']))
            elif 60 <= age < 80:
                l = np.concatenate((l, ['60-80']))
            elif age >= 80:
                l = np.concatenate((l, ['>=80']))
        p_labels.append(l)
        ecgs_per_patient.append(len(sel))
        p_qualities.append(sel.quality.min())
    classes = sorted(list(set([item for sublist in p_labels for item in sublist])))

    stratified_data_ids, stratified_data = stratify(p_labels, classes, [1 / n_folds] * n_folds, p_qualities,
                                                    ecgs_per_patient, nr_clean_folds)

    df[new_col_name] = np.zeros(len(df)).astype(int)
    for fold_i, fold_ids in tqdm(enumerate(stratified_data_ids)):
        ipat_ids = [pat_ids[pid] for pid in fold_ids]
        df[new_col_name][df.patient_id.isin(ipat_ids)] = fold_i + 1

    return df


def stratify(data, classes, ratios, qualities, ecgs_per_patient, nr_clean_folds=1):
    """Stratifying procedure. Modified from https://vict0rs.ch/2018/05/24/sample-multilabel-dataset/ (based on Sechidis 2011)

    data is a list of lists: a list of labels for each sample.

    classes is the list of classes each label can take

    ratios is a list, summing to 1, of how the dataset should be split

    qualities: quality per entry (only entries with quality > 0 can be assigned to clean folds)

    ecgs_per_patient: list with the number of ecgs per sample

    nr_clean_folds: the last nr_clean_folds folds can only take clean entries
    """
    np.random.seed(0)  # fix the random seed

    # data is always a list of lists; len(data) is the number of patients; data[i] is the list of all labels for
    # patient i (possibly with multiple identical entries)

    # size is the number of ecgs
    size = np.sum(ecgs_per_patient)

    # Organize data per label: for each label l, per_label_data[l] contains the list of patients
    # in data which have this label (potentially multiple identical entries)
    per_label_data = {c: [] for c in classes}
    for i, d in enumerate(data):
        for l in d:
            per_label_data[l].append(i)

    # In order not to compute lengths each time, they are tracked here.
    subset_sizes = [r * size for r in ratios]  # list of subset sizes in terms of ecgs
    per_label_subset_sizes = {c: [r * len(per_label_data[c]) for r in ratios] for c in
                              classes}  # dictionary label: list of subset sizes in terms of patients

    # For each subset we want, the set of sample ids which should end up in it
    stratified_data_ids = [set() for _ in range(len(ratios))]  # initialize empty

    # For each sample in the data set
    print("Assigning patients to folds...")
    size_prev = size + 1  # just for output
    while size > 0:
        if int(size_prev / 1000) > int(size / 1000):
            print("Remaining patients/ecgs to distribute:", size, "non-empty labels:",
                  np.sum([1 for l, label_data in per_label_data.items() if len(label_data) > 0]))
        size_prev = size
        # Compute |Di|
        lengths = {
            l: len(label_data)
            for l, label_data in per_label_data.items()
        }  # dictionary label: number of ecgs with this label that have not been assigned to a fold yet
        try:
            # Find the label with the smallest |Di|
            label = min({k: v for k, v in lengths.items() if v > 0}, key=lengths.get)
        except ValueError:
            # If the dictionary in `min` is empty we get a ValueError.
            # This can happen if there are unlabeled samples.
            # In this case, `size` would be > 0 but only samples without a label would remain.
            # "No label" could be a class in itself: it's up to you to format your data accordingly.
            break
        # For each patient with label `label` get the patient ids and corresponding counts
        unique_samples, unique_counts = np.unique(per_label_data[label], return_counts=True)
        idxs_sorted = np.argsort(unique_counts, kind='stable')[::-1]
        unique_samples = unique_samples[
            idxs_sorted]  # list of all patient ids with this label, sorted by count descending
        unique_counts = unique_counts[idxs_sorted]  # the corresponding counts

        # loop through all patient ids with this label
        for current_id, current_count in zip(unique_samples, unique_counts):

            subset_sizes_for_label = per_label_subset_sizes[label]  # current subset sizes for the chosen label

            # if quality is bad, remove clean folds (i.e. the sample cannot be assigned to clean folds)
            if qualities[current_id] < 1:
                subset_sizes_for_label = subset_sizes_for_label[:len(ratios) - nr_clean_folds]

            # Find argmax clj, i.e. the subset in greatest need of the current label
            largest_subsets = np.argwhere(subset_sizes_for_label == np.amax(subset_sizes_for_label)).flatten()

            # if there is a single best choice: assign it
            if len(largest_subsets) == 1:
                subset = largest_subsets[0]
            # If there is more than one such subset, find the one in greatest need of any label
            else:
                largest_subsets2 = np.argwhere(np.array(subset_sizes)[largest_subsets] == np.amax(
                    np.array(subset_sizes)[largest_subsets])).flatten()
                subset = largest_subsets[np.random.choice(largest_subsets2)]

            # Store the sample's id in the selected subset
            stratified_data_ids[subset].add(current_id)

            # There are now fewer samples to distribute
            size -= ecgs_per_patient[current_id]
            # The selected subset needs fewer samples
            subset_sizes[subset] -= ecgs_per_patient[current_id]

            # In the selected subset, there is one more example for each label
            # the current sample has
            for l in data[current_id]:
                per_label_subset_sizes[l][subset] -= 1

            # Remove the sample from the dataset, i.e. from all per_label datasets created
            for x in per_label_data.keys():
                per_label_data[x] = [y for y in per_label_data[x] if y != current_id]

    # Create the stratified dataset as a list of subsets, each containing the original labels
    stratified_data_ids = [sorted(strat) for strat in stratified_data_ids]
    stratified_data = [
        [data[i] for i in strat] for strat in stratified_data_ids
    ]

    # Return both the stratified indexes, to be used to sample the `features` associated with your labels,
    # and the stratified labels dataset
    return stratified_data_ids, stratified_data
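The core bookkeeping step of `stratify` — indexing patients per label and then distributing the rarest remaining label first — can be sketched on toy data (the label strings below are hypothetical):

```python
# Toy patient label lists (hypothetical labels): each inner list holds all
# labels attached to one patient (diagnosis codes, gender, age bin, ...)
data = [['NORM', 'male'], ['MI', 'male'], ['NORM', 'female']]
classes = sorted({l for d in data for l in d})

# per_label_data[c] lists the indices of patients carrying label c, as in `stratify`
per_label_data = {c: [] for c in classes}
for i, d in enumerate(data):
    for l in d:
        per_label_data[l].append(i)

# the algorithm always distributes the rarest remaining label first
lengths = {l: len(v) for l, v in per_label_data.items()}
rarest = min({k: v for k, v in lengths.items() if v > 0}, key=lengths.get)
print(rarest, per_label_data[rarest])  # MI [1]
```

Starting from the rarest label keeps hard-to-balance classes from ending up concentrated in a single fold.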
utilities/timeseries_utils.py
ADDED
@@ -0,0 +1,649 @@
import numpy as np
import torch
import torch.utils.data
from torch import nn
from pathlib import Path
from scipy.stats import iqr
import os
# Note: due to issues with the numpy rng for multiprocessing
# (https://github.com/pytorch/pytorch/issues/5059) that could be
# fixed by a custom worker_init_fn, we use random throughout for convenience
import random
from skimage import transform
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
from scipy.signal import butter, sosfilt, sosfiltfilt, sosfreqz
# https://stackoverflow.com/questions/12093594/how-to-implement-band-pass-butterworth-filter-with-scipy-signal-butter


def butter_filter(lowcut=10, highcut=20, fs=50, order=5, btype='band'):
    """returns a Butterworth filter with the given specifications"""
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq

    sos = butter(order, [low, high] if btype == "band" else (low if btype == "low" else high), analog=False,
                 btype=btype, output='sos')
    return sos


def butter_filter_frequency_response(filter):
    """returns the frequency response of a given filter (result of a call to butter_filter)"""
    w, h = sosfreqz(filter)
    # gain vs. freq(Hz)
    # plt.plot((fs * 0.5 / np.pi) * w, abs(h))
    return w, h


def apply_butter_filter(data, filter, forwardbackward=True):
    """applies a filter from a call to butter_filter to data (assuming the time axis is at dimension 0);
    forwardbackward selects zero-phase forward-backward filtering and controls how the edges of the signal are handled"""
    if forwardbackward:
        return sosfiltfilt(filter, data, axis=0)
    else:
        return sosfilt(filter, data, axis=0)

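As a usage sketch of the two filtering helpers above (the sampling rate, cutoff, and signal are toy values, not from the repository): a low-pass filter built the same way as `butter_filter(btype='low')` and applied forward-backward as in `apply_butter_filter` removes a high-frequency component without introducing a phase shift.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 100.0                                    # toy sampling rate in Hz
# low-pass at 20 Hz, same construction as butter_filter(..., btype='low')
sos = butter(5, 20.0 / (0.5 * fs), btype='low', output='sos')

t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t)             # 5 Hz component (below cutoff, kept)
x = clean + 0.5 * np.sin(2 * np.pi * 40 * t)  # plus 40 Hz interference (removed)
y = sosfiltfilt(sos, x, axis=0)               # zero-phase forward-backward filtering

# the filtered signal is much closer to the clean component than the input
print(np.abs(y - clean).mean() < np.abs(x - clean).mean())  # True
```

Forward-backward filtering doubles the effective filter order but cancels the phase delay, which is why it is the default here.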
def dataset_add_chunk_col(df, col="data"):
    """add a chunk column to the dataset df"""
    df["chunk"] = df.groupby(col).cumcount()


def dataset_add_length_col(df, col="data", data_folder=None):
    """add a length column to the dataset df"""
    df[col + "_length"] = df[col].apply(lambda x: len(np.load(x if data_folder is None else data_folder / x)))


def dataset_add_labels_col(df, col="label", data_folder=None):
    """add a column with the unique labels in column col"""
    df[col + "_labels"] = df[col].apply(
        lambda x: list(np.unique(np.load(x if data_folder is None else data_folder / x))))


def dataset_add_mean_col(df, col="data", axis=(0), data_folder=None):
    """adds a column with the mean"""
    df[col + "_mean"] = df[col].apply(
        lambda x: np.mean(np.load(x if data_folder is None else data_folder / x), axis=axis))


def dataset_add_median_col(df, col="data", axis=(0), data_folder=None):
    """adds a column with the median"""
    df[col + "_median"] = df[col].apply(
        lambda x: np.median(np.load(x if data_folder is None else data_folder / x), axis=axis))


def dataset_add_std_col(df, col="data", axis=(0), data_folder=None):
    """adds a column with the standard deviation"""
    df[col + "_std"] = df[col].apply(
        lambda x: np.std(np.load(x if data_folder is None else data_folder / x), axis=axis))


def dataset_add_iqr_col(df, col="data", axis=(0), data_folder=None):
    """adds a column with the interquartile range"""
    df[col + "_iqr"] = df[col].apply(lambda x: iqr(np.load(x if data_folder is None else data_folder / x), axis=axis))


def dataset_get_stats(df, col="data", median=False):
    """creates weighted means and stds from the mean, std and length cols of the df"""
    mean = np.average(np.stack(df[col + ("_median" if median is True else "_mean")], axis=0), axis=0,
                      weights=np.array(df[col + "_length"]))
    std = np.average(np.stack(df[col + ("_iqr" if median is True else "_std")], axis=0), axis=0,
                     weights=np.array(df[col + "_length"]))
    return mean, std


def npys_to_memmap(npys, target_filename, delete_npys=False):
    memmap = None
    start = []
    length = []
    files = []
    ids = []

    for idx, npy in enumerate(npys):
        data = np.load(npy)
        if memmap is None:
            memmap = np.memmap(target_filename, dtype=data.dtype, mode='w+', shape=data.shape)
            start.append(0)
            length.append(data.shape[0])
        else:
            start.append(start[-1] + length[-1])
            length.append(data.shape[0])
            memmap = np.memmap(target_filename, dtype=data.dtype, mode='r+',
                               shape=tuple([start[-1] + length[-1]] + [l for l in data.shape[1:]]))

        ids.append(idx)
        memmap[start[-1]:start[-1] + length[-1]] = data[:]
        memmap.flush()
        if delete_npys is True:
            npy.unlink()
    del memmap

    np.savez(target_filename.parent / (target_filename.stem + "_meta.npz"), start=start, length=length,
             shape=[start[-1] + length[-1]] + [l for l in data.shape[1:]], dtype=data.dtype)


def reformat_as_memmap(df, target_filename, data_folder=None, annotation=False, delete_npys=False):
    npys_data = []
    npys_label = []

    for id, row in df.iterrows():
        npys_data.append(data_folder / row["data"] if data_folder is not None else row["data"])
        if annotation:
            npys_label.append(data_folder / row["label"] if data_folder is not None else row["label"])

    npys_to_memmap(npys_data, target_filename, delete_npys=delete_npys)
    if annotation:
        npys_to_memmap(npys_label, target_filename.parent / (target_filename.stem + "_label.npy"),
                       delete_npys=delete_npys)

    # replace data (filename) by an integer index
    df_mapped = df.copy()
    df_mapped["data_original"] = df_mapped.data
    df_mapped["data"] = np.arange(len(df_mapped))
    df_mapped.to_pickle(target_filename.parent / ("df_" + target_filename.stem + ".pkl"))
    return df_mapped

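The offset bookkeeping that `npys_to_memmap` performs — concatenating per-recording arrays along the time axis while recording each recording's start and length — can be sketched standalone on toy `.npy` files (file names, shapes, and values below are made up for illustration):

```python
import tempfile
from pathlib import Path
import numpy as np

tmp = Path(tempfile.mkdtemp())
# two toy "recordings" of different lengths, 2 channels each (made-up data)
for i, n in enumerate([3, 5]):
    np.save(tmp / f"rec{i}.npy", np.full((n, 2), float(i)))

start, length = [], []
mm = None
for f in sorted(tmp.glob("rec*.npy")):
    arr = np.load(f)
    start.append(0 if mm is None else start[-1] + length[-1])
    length.append(arr.shape[0])
    # grow the memmap along the time axis and append the new array,
    # mirroring the 'w+' then 'r+' pattern in npys_to_memmap
    mm = np.memmap(tmp / "all.npy", dtype=arr.dtype,
                   mode='w+' if start[-1] == 0 else 'r+',
                   shape=(start[-1] + length[-1], arr.shape[1]))
    mm[start[-1]:] = arr
    mm.flush()

print(start, length)  # [0, 3] [3, 5]
```

The `start`/`length` lists are exactly what the `_meta.npz` file stores, and what `TimeseriesDatasetCrops` later uses to slice an individual recording out of the concatenated memmap.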
| 145 |
+
# TimeseriesDatasetCrops
|
| 146 |
+
|
| 147 |
+
class TimeseriesDatasetCrops(torch.utils.data.Dataset):
|
| 148 |
+
"""timeseries dataset with partial crops."""
|
| 149 |
+
|
| 150 |
+
def __init__(self, df, output_size, chunk_length, min_chunk_length, memmap_filename=None, npy_data=None,
|
| 151 |
+
random_crop=True, data_folder=None, num_classes=2, copies=0, col_lbl="label", stride=None, start_idx=0,
|
| 152 |
+
annotation=False, transforms=None):
|
| 153 |
+
"""
|
| 154 |
+
accepts three kinds of input:
|
| 155 |
+
1) filenames pointing to aligned numpy arrays [timesteps,channels,...] for data and either integer labels or filename pointing to numpy arrays[timesteps,...] e.g. for annotations
|
| 156 |
+
2) memmap_filename to memmap for data [concatenated,...] and labels- label column in df corresponds to index in this memmap
|
| 157 |
+
3) npy_data [samples,ts,...] (either path or np.array directly- also supporting variable length input) - label column in df corresponds to sampleid
|
| 158 |
+
|
| 159 |
+
transforms: list of callables (transformations) (applied in the specified order i.e. leftmost element first)
|
| 160 |
+
"""
|
| 161 |
+
if transforms is None:
|
| 162 |
+
transforms = []
|
| 163 |
+
assert not ((memmap_filename is not None) and (npy_data is not None))
|
| 164 |
+
# require integer entries if using memmap or npy
|
| 165 |
+
assert (memmap_filename is None and npy_data is None) or df.data.dtype == np.int64
|
| 166 |
+
|
| 167 |
+
self.timeseries_df = df
|
| 168 |
+
self.output_size = output_size
|
| 169 |
+
self.data_folder = data_folder
|
| 170 |
+
self.transforms = transforms
|
| 171 |
+
self.annotation = annotation
|
| 172 |
+
self.col_lbl = col_lbl
|
| 173 |
+
|
| 174 |
+
self.c = num_classes
|
| 175 |
+
|
| 176 |
+
self.mode = "files"
|
| 177 |
+
self.memmap_filename = memmap_filename
|
| 178 |
+
if memmap_filename is not None:
|
| 179 |
+
self.mode = "memmap"
|
| 180 |
+
memmap_meta = np.load(memmap_filename.parent / (memmap_filename.stem + "_meta.npz"))
|
| 181 |
+
self.memmap_start = memmap_meta["start"]
|
| 182 |
+
self.memmap_shape = tuple(memmap_meta["shape"])
|
| 183 |
+
self.memmap_length = memmap_meta["length"]
|
| 184 |
+
self.memmap_dtype = np.dtype(str(memmap_meta["dtype"]))
|
| 185 |
+
self.memmap_file_process_dict = {}
|
| 186 |
+
if annotation:
|
| 187 |
+
memmap_meta_label = np.load(memmap_filename.parent / (memmap_filename.stem + "_label_meta.npz"))
|
| 188 |
+
self.memmap_filename_label = memmap_filename.parent / (memmap_filename.stem + "_label.npy")
|
| 189 |
+
self.memmap_shape_label = tuple(memmap_meta_label["shape"])
|
| 190 |
+
self.memmap_file_process_dict_label = {}
|
| 191 |
+
self.memmap_dtype_label = np.dtype(str(memmap_meta_label["dtype"]))
|
| 192 |
+
elif npy_data is not None:
|
| 193 |
+
self.mode = "npy"
|
| 194 |
+
if isinstance(npy_data, np.ndarray) or isinstance(npy_data, list):
|
| 195 |
+
self.npy_data = np.array(npy_data)
|
| 196 |
+
assert (annotation is False)
|
| 197 |
+
else:
|
| 198 |
+
self.npy_data = np.load(npy_data)
|
| 199 |
+
if annotation:
|
| 200 |
+
self.npy_data_label = np.load(npy_data.parent / (npy_data.stem + "_label.npy"))
|
| 201 |
+
|
| 202 |
+
self.random_crop = random_crop
|
| 203 |
+
|
| 204 |
+
self.df_idx_mapping = []
|
| 205 |
+
self.start_idx_mapping = []
|
| 206 |
+
self.end_idx_mapping = []
|
| 207 |
+
|
| 208 |
+
for df_idx, (id, row) in enumerate(df.iterrows()):
|
| 209 |
+
if self.mode == "files":
|
| 210 |
+
data_length = row["data_length"]
|
| 211 |
+
elif self.mode == "memmap":
|
| 212 |
+
data_length = self.memmap_length[row["data"]]
|
| 213 |
+
else: # npy
|
| 214 |
+
data_length = len(self.npy_data[row["data"]])
|
| 215 |
+
|
| 216 |
+
if chunk_length == 0: # do not split
|
| 217 |
+
idx_start = [start_idx]
|
| 218 |
+
idx_end = [data_length]
|
| 219 |
+
else:
|
| 220 |
+
idx_start = list(range(start_idx, data_length, chunk_length if stride is None else stride))
|
| 221 |
+
idx_end = [min(l + chunk_length, data_length) for l in idx_start]
|
| 222 |
+
|
| 223 |
+
# remove final chunk(s) if too short
|
| 224 |
+
for i in range(len(idx_start)):
|
| 225 |
+
if idx_end[i] - idx_start[i] < min_chunk_length:
|
| 226 |
+
del idx_start[i:]
|
| 227 |
+
del idx_end[i:]
|
| 228 |
+
break
|
| 229 |
+
# append to lists
|
| 230 |
+
for _ in range(copies + 1):
|
| 231 |
+
for i_s, i_e in zip(idx_start, idx_end):
|
| 232 |
+
self.df_idx_mapping.append(df_idx)
|
| 233 |
+
self.start_idx_mapping.append(i_s)
|
| 234 |
+
self.end_idx_mapping.append(i_e)
|
| 235 |
+
|
| 236 |
+
def __len__(self):
|
| 237 |
+
return len(self.df_idx_mapping)
|
| 238 |
+
|
| 239 |
+
def __getitem__(self, idx):
|
| 240 |
+
df_idx = self.df_idx_mapping[idx]
|
| 241 |
+
start_idx = self.start_idx_mapping[idx]
|
| 242 |
+
end_idx = self.end_idx_mapping[idx]
|
| 243 |
+
# determine crop idxs
|
| 244 |
+
timesteps = end_idx - start_idx
|
| 245 |
+
assert (timesteps >= self.output_size)
|
| 246 |
+
if self.random_crop: # random crop
|
| 247 |
+
if timesteps == self.output_size:
|
| 248 |
+
start_idx_crop = start_idx
|
| 249 |
+
else:
|
| 250 |
+
start_idx_crop = start_idx + random.randint(0, timesteps - self.output_size - 1) # np.random.randint(0, timesteps - self.output_size)
|
| 251 |
+
else:
|
| 252 |
+
start_idx_crop = start_idx + (timesteps - self.output_size) // 2
|
| 253 |
+
end_idx_crop = start_idx_crop + self.output_size
|
| 254 |
+
|
| 255 |
+
# print(idx,start_idx,end_idx,start_idx_crop,end_idx_crop)
|
| 256 |
+
# load the actual data
|
| 257 |
+
if self.mode == "files": # from separate files
|
| 258 |
+
data_filename = self.timeseries_df.iloc[df_idx]["data"]
|
| 259 |
+
if self.data_folder is not None:
|
| 260 |
+
data_filename = self.data_folder / data_filename
|
| 261 |
+
data = np.load(data_filename)[
|
| 262 |
+
start_idx_crop:end_idx_crop] # data type has to be adjusted when saving to npy
|
| 263 |
+
|
| 264 |
+
ID = data_filename.stem
|
| 265 |
+
|
| 266 |
+
if self.annotation is True:
|
| 267 |
+
label_filename = self.timeseries_df.iloc[df_idx][self.col_lbl]
|
| 268 |
+
if self.data_folder is not None:
|
| 269 |
+
label_filename = self.data_folder / label_filename
|
| 270 |
+
label = np.load(label_filename)[
|
| 271 |
+
start_idx_crop:end_idx_crop] # data type has to be adjusted when saving to npy
|
| 272 |
+
else:
|
| 273 |
+
label = self.timeseries_df.iloc[df_idx][self.col_lbl] # input type has to be adjusted in the dataframe
|
| 274 |
+
elif self.mode == "memmap": # from one memmap file
|
| 275 |
+
ID = self.timeseries_df.iloc[df_idx]["data_original"].stem
|
| 276 |
+
memmap_idx = self.timeseries_df.iloc[df_idx][
|
| 277 |
+
"data"] # grab the actual index (Note the df to create the ds might be a subset of the original df used to create the memmap)
|
| 278 |
+
idx_offset = self.memmap_start[memmap_idx]
|
| 279 |
+
|
| 280 |
+
pid = os.getpid()
|
| 281 |
+
# print("idx",idx,"ID",ID,"idx_offset",idx_offset,"start_idx_crop",start_idx_crop,"df_idx", self.df_idx_mapping[idx],"pid",pid)
|
| 282 |
+
mem_file = self.memmap_file_process_dict.get(pid, None) # each process owns its handler.
|
| 283 |
+
if mem_file is None:
|
| 284 |
+
# print("memmap_shape", self.memmap_shape)
|
| 285 |
+
mem_file = np.memmap(self.memmap_filename, self.memmap_dtype, mode='r', shape=self.memmap_shape)
|
| 286 |
+
self.memmap_file_process_dict[pid] = mem_file
|
| 287 |
+
data = np.copy(mem_file[idx_offset + start_idx_crop: idx_offset + end_idx_crop])
|
| 288 |
+
# print(mem_file[idx_offset + start_idx_crop: idx_offset + end_idx_crop])
|
| 289 |
+
if self.annotation:
|
| 290 |
+
mem_file_label = self.memmap_file_process_dict_label.get(pid, None) # each process owns its handler.
|
| 291 |
+
if mem_file_label is None:
|
| 292 |
+
mem_file_label = np.memmap(self.memmap_filename_label, self.memmap_dtype, mode='r',
|
| 293 |
+
shape=self.memmap_shape_label)
|
| 294 |
+
self.memmap_file_process_dict_label[pid] = mem_file_label
|
| 295 |
+
label = np.copy(mem_file_label[idx_offset + start_idx_crop: idx_offset + end_idx_crop])
|
| 296 |
+
else:
|
| 297 |
+
label = self.timeseries_df.iloc[df_idx][self.col_lbl]
|
| 298 |
+
else: # single npy array
|
| 299 |
+
ID = self.timeseries_df.iloc[df_idx]["data"]
|
| 300 |
+
|
| 301 |
+
data = self.npy_data[ID][start_idx_crop:end_idx_crop]
|
| 302 |
+
|
| 303 |
+
if self.annotation:
|
| 304 |
+
label = self.npy_data_label[ID][start_idx_crop:end_idx_crop]
|
| 305 |
+
else:
|
| 306 |
+
label = self.timeseries_df.iloc[df_idx][self.col_lbl]
|
| 307 |
+
sample = {'data': data, 'label': label, 'ID': ID}
|
| 308 |
+
|
| 309 |
+
for t in self.transforms:
|
| 310 |
+
sample = t(sample)
|
| 311 |
+
|
| 312 |
+
return sample
|
| 313 |
+
|
| 314 |
+
def get_sampling_weights(self, class_weight_dict, length_weighting=False, group_by_col=None):
|
| 315 |
+
assert (self.annotation is False)
|
| 316 |
+
assert (length_weighting is False or group_by_col is None)
|
| 317 |
+
weights = np.zeros(len(self.df_idx_mapping), dtype=np.float32)
|
| 318 |
+
length_per_class = {}
|
| 319 |
+
length_per_group = {}
|
| 320 |
+
for iw, (i, s, e) in enumerate(zip(self.df_idx_mapping, self.start_idx_mapping, self.end_idx_mapping)):
|
| 321 |
+
label = self.timeseries_df.iloc[i][self.col_lbl]
|
| 322 |
+
weight = class_weight_dict[label]
|
| 323 |
+
if length_weighting:
|
| 324 |
+
if label in length_per_class.keys():
|
| 325 |
+
length_per_class[label] += e - s
|
| 326 |
+
else:
|
| 327 |
+
length_per_class[label] = e - s
|
| 328 |
+
if group_by_col is not None:
|
| 329 |
+
group = self.timeseries_df.iloc[i][group_by_col]
|
| 330 |
+
if group in length_per_group.keys():
|
| 331 |
+
length_per_group[group] += e - s
|
| 332 |
+
else:
|
| 333 |
+
length_per_group[group] = e - s
|
| 334 |
+
weights[iw] = weight
|
| 335 |
+
|
| 336 |
+
if length_weighting: # need second pass to properly take into account the total length per class
|
| 337 |
+
for iw, (i, s, e) in enumerate(zip(self.df_idx_mapping, self.start_idx_mapping, self.end_idx_mapping)):
|
| 338 |
+
label = self.timeseries_df.iloc[i][self.col_lbl]
|
| 339 |
+
weights[iw] = (e - s) / length_per_class[label] * weights[iw]
|
| 340 |
+
if group_by_col is not None:
|
| 341 |
+
for iw, (i, s, e) in enumerate(zip(self.df_idx_mapping, self.start_idx_mapping, self.end_idx_mapping)):
|
| 342 |
+
group = self.timeseries_df.iloc[i][group_by_col]
|
| 343 |
+
weights[iw] = (e - s) / length_per_group[group] * weights[iw]
|
| 344 |
+
|
| 345 |
+
weights = weights / np.min(weights) # normalize smallest weight to 1
|
| 346 |
+
return weights
|
| 347 |
+
|
| 348 |
+
def get_id_mapping(self):
|
| 349 |
+
return self.df_idx_mapping
|
| 350 |
+
|
| 351 |
+
|
| 352 |
+
class RandomCrop(object):
|
| 353 |
+
"""
|
| 354 |
+
Crop randomly the image in a sample (deprecated).
|
| 355 |
+
"""
|
| 356 |
+
|
| 357 |
+
def __init__(self, output_size, annotation=False):
|
| 358 |
+
self.output_size = output_size
|
| 359 |
+
self.annotation = annotation
|
| 360 |
+
|
| 361 |
+
def __call__(self, sample):
|
| 362 |
+
data, label, ID = sample['data'], sample['label'], sample['ID']
|
| 363 |
+
|
| 364 |
+
timesteps = len(data)
|
| 365 |
+
assert (timesteps >= self.output_size)
|
| 366 |
+
if timesteps == self.output_size:
|
| 367 |
+
start = 0
|
| 368 |
+
else:
|
| 369 |
+
start = random.randint(0, timesteps - self.output_size - 1) # np.random.randint(0, timesteps - self.output_size)
|
| 370 |
+
|
| 371 |
+
data = data[start: start + self.output_size]
|
| 372 |
+
if self.annotation:
|
| 373 |
+
label = label[start: start + self.output_size]
|
| 374 |
+
|
| 375 |
+
return {'data': data, 'label': label, "ID": ID}
|
| 376 |
+
|
| 377 |
+
|
| 378 |
+
class CenterCrop(object):
|
| 379 |
+
"""
|
| 380 |
+
Center crop the image in a sample (deprecated).
|
| 381 |
+
"""
|
| 382 |
+
|
| 383 |
+
def __init__(self, output_size, annotation=False):
|
| 384 |
+
self.output_size = output_size
|
| 385 |
+
self.annotation = annotation
|
| 386 |
+
|
| 387 |
+
def __call__(self, sample):
|
| 388 |
+
data, label, ID = sample['data'], sample['label'], sample['ID']
|
| 389 |
+
|
| 390 |
+
timesteps = len(data)
|
| 391 |
+
|
| 392 |
+
start = (timesteps - self.output_size) // 2
|
| 393 |
+
|
| 394 |
+
data = data[start: start + self.output_size]
|
| 395 |
+
if self.annotation:
|
| 396 |
+
label = label[start: start + self.output_size]
|
| 397 |
+
|
| 398 |
+
return {'data': data, 'label': label, "ID": ID}
|
| 399 |
+
|
| 400 |
+
|
| 401 |
+
class GaussianNoise(object):
|
| 402 |
+
"""
|
| 403 |
+
Add gaussian noise to sample.
|
| 404 |
+
"""
|
| 405 |
+
|
| 406 |
+
def __init__(self, scale=0.1):
|
| 407 |
+
self.scale = scale
|
| 408 |
+
|
| 409 |
+
def __call__(self, sample):
|
| 410 |
+
if self.scale == 0:
|
| 411 |
+
return sample
|
| 412 |
+
else:
|
| 413 |
+
data, label, ID = sample['data'], sample['label'], sample['ID']
|
| 414 |
+
data = data + np.reshape(np.array([random.gauss(0, self.scale) for _ in range(np.prod(data.shape))]),
|
| 415 |
+
data.shape) # np.random.normal(scale=self.scale,size=data.shape).astype(np.float32)
|
| 416 |
+
return {'data': data, 'label': label, "ID": ID}


class Rescale(object):
    """
    Rescale the time axis by a given factor.
    """

    def __init__(self, scale=0.5, interpolation_order=3):
        self.scale = scale
        self.interpolation_order = interpolation_order

    def __call__(self, sample):
        if self.scale == 1:
            return sample
        else:
            data, label, ID = sample['data'], sample['label'], sample['ID']
            timesteps_new = int(self.scale * len(data))
            data = transform.resize(data, (timesteps_new, data.shape[1]),
                                    order=self.interpolation_order).astype(np.float32)
            return {'data': data, 'label': label, "ID": ID}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __init__(self, transpose_data1d=True):
        self.transpose_data1d = transpose_data1d

    def __call__(self, sample):
        def _to_tensor(data, transpose_data1d=False):
            if len(data.shape) == 2 and transpose_data1d is True:
                # swap channel and time axis for direct application of pytorch's 1d convs
                data = data.transpose((1, 0))
            if isinstance(data, np.ndarray):
                return torch.from_numpy(data)
            else:  # default_collate will take care of it
                return data

        data, label, ID = sample['data'], sample['label'], sample['ID']

        if not isinstance(data, tuple):
            data = _to_tensor(data, self.transpose_data1d)
        else:
            data = tuple(_to_tensor(x, self.transpose_data1d) for x in data)

        if not isinstance(label, tuple):
            label = _to_tensor(label)
        else:
            label = tuple(_to_tensor(x) for x in label)

        return data, label  # returning as a tuple (potentially of lists)
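The axis swap performed by `_to_tensor` can be illustrated with NumPy alone (the 1000×12 shape is an assumed example, not fixed by the code):

```python
import numpy as np

# (time, channel) layout as stored on disk ...
x = np.zeros((1000, 12), dtype=np.float32)  # 1000 timesteps, 12 leads
# ... becomes (channel, time), the layout expected by PyTorch's Conv1d
x_t = x.transpose((1, 0))
```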


class Normalize(object):
    """
    Normalize using given stats.
    """

    def __init__(self, stats_mean, stats_std, input=True, channels=None):
        if channels is None:
            channels = []
        self.stats_mean = np.expand_dims(stats_mean.astype(np.float32), axis=0) if stats_mean is not None else None
        self.stats_std = np.expand_dims(stats_std.astype(np.float32), axis=0) + 1e-8 if stats_std is not None else None
        self.input = input
        if len(channels) > 0:
            for i in range(len(stats_mean)):
                if i not in channels:
                    self.stats_mean[:, i] = 0
                    self.stats_std[:, i] = 1

    def __call__(self, sample):
        if self.input:
            data = sample['data']
        else:
            data = sample['label']

        if self.stats_mean is not None:
            data = data - self.stats_mean
        if self.stats_std is not None:
            data = data / self.stats_std

        if self.input:
            return {'data': data, 'label': sample['label'], "ID": sample['ID']}
        else:
            return {'data': sample['data'], 'label': data, "ID": sample['ID']}
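A toy run of the per-channel normalization, showing how the `expand_dims` stats broadcast over the time axis (all numbers are made up for illustration):

```python
import numpy as np

stats_mean = np.array([1.0, 2.0], dtype=np.float32)
stats_std = np.array([2.0, 4.0], dtype=np.float32)
# shape (1, channels) so stats broadcast over all timesteps
mean = np.expand_dims(stats_mean, axis=0)
std = np.expand_dims(stats_std, axis=0) + 1e-8  # epsilon guards against zero std

data = np.array([[3.0, 10.0], [1.0, 2.0]], dtype=np.float32)  # (time, channel)
normalized = (data - mean) / std
```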


class ButterFilter(object):
    """
    Apply a Butterworth filter (by default forward-backward via sosfiltfilt).
    """

    def __init__(self, lowcut=50, highcut=50, fs=100, order=5, btype='band', forwardbackward=True, input=True):
        self.filter = butter_filter(lowcut, highcut, fs, order, btype)
        self.input = input
        self.forwardbackward = forwardbackward

    def __call__(self, sample):
        if self.input:
            data = sample['data']
        else:
            data = sample['label']

        # check multiple axis
        if self.forwardbackward:
            data = sosfiltfilt(self.filter, data, axis=0)
        else:
            data = sosfilt(self.filter, data, axis=0)

        if self.input:
            return {'data': data, 'label': sample['label'], "ID": sample['ID']}
        else:
            return {'data': sample['data'], 'label': data, "ID": sample['ID']}
class ChannelFilter(object):
    """
    Select certain channels.
    """

    def __init__(self, channels=None, input=True):
        if channels is None:
            channels = [0]
        self.channels = channels
        self.input = input

    def __call__(self, sample):
        if self.input:
            return {'data': sample['data'][:, self.channels], 'label': sample['label'], "ID": sample['ID']}
        else:
            return {'data': sample['data'], 'label': sample['label'][:, self.channels], "ID": sample['ID']}


class Transform(object):
    """
    Transforms data using a given function, i.e. data_new = func(data) if input is True, else label_new = func(label).
    """

    def __init__(self, func, input=False):
        self.func = func
        self.input = input

    def __call__(self, sample):
        if self.input:
            return {'data': self.func(sample['data']), 'label': sample['label'], "ID": sample['ID']}
        else:
            return {'data': sample['data'], 'label': self.func(sample['label']), "ID": sample['ID']}


class TupleTransform(object):
    """
    Transforms data using a given function operating on both data and label and returning a tuple,
    i.e. data_new, label_new = func(data_old, label_old).
    """

    def __init__(self, func, input=False):
        self.func = func

    def __call__(self, sample):
        data_new, label_new = self.func(sample['data'], sample['label'])
        return {'data': data_new, 'label': label_new, "ID": sample['ID']}
# MIL and ensemble models

def aggregate_predictions(preds, targs=None, idmap=None, aggregate_fn=np.mean, verbose=True):
    """
    Aggregates potentially multiple predictions per sample (targs can also be passed for convenience).
    idmap: idmap as returned by TimeSeriesCropsDataset's get_id_mapping
    preds: ordered predictions as returned by learn.get_preds()
    aggregate_fn: function used to aggregate multiple predictions per sample (most commonly np.amax or np.mean)
    """
    if idmap is not None and len(idmap) != len(np.unique(idmap)):
        if verbose:
            print("aggregating predictions...")
        preds_aggregated = []
        targs_aggregated = []
        for i in np.unique(idmap):
            preds_local = preds[np.where(idmap == i)[0]]
            preds_aggregated.append(aggregate_fn(preds_local, axis=0))
            if targs is not None:
                targs_local = targs[np.where(idmap == i)[0]]
                assert np.all(targs_local == targs_local[0])  # all labels have to agree
                targs_aggregated.append(targs_local[0])
        if targs is None:
            return np.array(preds_aggregated)
        else:
            return np.array(preds_aggregated), np.array(targs_aggregated)
    else:
        if targs is None:
            return preds
        else:
            return preds, targs
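The idmap-based grouping at the heart of `aggregate_predictions` in a toy setting: three crops belong to sample 0 and one crop to sample 1, and each sample's predictions are mean-aggregated (the numbers are illustrative only):

```python
import numpy as np

preds = np.array([[0.2], [0.4], [0.6], [0.9]])  # one score per crop
idmap = np.array([0, 0, 0, 1])                   # crop -> sample id
aggregated = np.array(
    [np.mean(preds[np.where(idmap == i)[0]], axis=0) for i in np.unique(idmap)])
```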


class milwrapper(nn.Module):
    def __init__(self, model, input_size, n, stride=None, softmax=True):
        super().__init__()
        self.n = n
        self.input_size = input_size
        self.model = model
        self.softmax = softmax
        self.stride = input_size if stride is None else stride

    def forward(self, x):
        # x: (bs, ch, seq)
        for i in range(self.n):
            pred_single = self.model(x[:, :, i * self.stride:i * self.stride + self.input_size])
            pred_single = nn.functional.softmax(pred_single, dim=1)
            if i == 0:
                pred = pred_single
            else:
                pred += pred_single
        return pred / self.n
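The window layout that `milwrapper.forward` slides over the sequence axis, sketched with assumed values (`input_size`, `stride`, and `n` below are hypothetical, not defaults of the class):

```python
# n windows of length input_size, each shifted by stride along the sequence
input_size, stride, n = 250, 125, 3
windows = [(i * stride, i * stride + input_size) for i in range(n)]
```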


class ensemblewrapper(nn.Module):
    def __init__(self, model, checkpts):
        super().__init__()
        self.model = model
        self.checkpts = checkpts

    def forward(self, x):
        # x: (bs, ch, seq)
        for i, c in enumerate(self.checkpts):
            state = torch.load(Path("./models/") / f'{c}.pth', map_location=x.device)
            self.model.load_state_dict(state['model'], strict=True)

            pred_single = self.model(x)
            pred_single = nn.functional.softmax(pred_single, dim=1)
            if i == 0:
                pred = pred_single
            else:
                pred += pred_single
        return pred / len(self.checkpts)

utilities/utils.py ADDED
@@ -0,0 +1,509 @@
import os
import glob
import pickle
import pandas as pd
import numpy as np
from tqdm import tqdm
import wfdb
import ast
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.preprocessing import StandardScaler, MultiLabelBinarizer


# EVALUATION STUFF
def generate_results(idxs, y_true, y_pred, thresholds):
    return evaluate_experiment(y_true[idxs], y_pred[idxs], thresholds)


def evaluate_experiment(y_true, y_pred, thresholds=None):
    results = {}

    if thresholds is not None:
        # binary predictions
        y_pred_binary = apply_thresholds(y_pred, thresholds)
        # PhysioNet/CinC Challenges metrics
        challenge_scores = challenge_metrics(y_true, y_pred_binary, beta1=2, beta2=2)
        results['F_beta_macro'] = challenge_scores['F_beta_macro']
        results['G_beta_macro'] = challenge_scores['G_beta_macro']
        results['TP'] = challenge_scores['TP']
        results['TN'] = challenge_scores['TN']
        results['FP'] = challenge_scores['FP']
        results['FN'] = challenge_scores['FN']
        results['Accuracy'] = challenge_scores['Accuracy']
        results['F1'] = challenge_scores['F1']
        results['Precision'] = challenge_scores['Precision']
        results['Recall'] = challenge_scores['Recall']

    # label-based metric
    results['macro_auc'] = roc_auc_score(y_true, y_pred, average='macro')

    df_result = pd.DataFrame(results, index=[0])
    return df_result

def challenge_metrics(y_true, y_pred, beta1=2, beta2=2, single=False):
    f_beta = 0
    g_beta = 0
    TP, FP, TN, FN = 0., 0., 0., 0.

    if single:  # if evaluating a single class in case of threshold optimization
        sample_weights = np.ones(y_true.sum(axis=1).shape)
    else:
        sample_weights = y_true.sum(axis=1)
    for classi in range(y_true.shape[1]):
        y_truei, y_predi = y_true[:, classi], y_pred[:, classi]
        TP, FP, TN, FN = 0., 0., 0., 0.
        for i in range(len(y_predi)):
            sample_weight = sample_weights[i]
            if y_truei[i] == y_predi[i] == 1:
                TP += 1. / sample_weight
            if (y_predi[i] == 1) and (y_truei[i] != y_predi[i]):
                FP += 1. / sample_weight
            if y_truei[i] == y_predi[i] == 0:
                TN += 1. / sample_weight
            if (y_predi[i] == 0) and (y_truei[i] != y_predi[i]):
                FN += 1. / sample_weight
        f_beta_i = ((1 + beta1 ** 2) * TP) / ((1 + beta1 ** 2) * TP + FP + (beta1 ** 2) * FN)
        g_beta_i = TP / (TP + FP + beta2 * FN)

        f_beta += f_beta_i
        g_beta += g_beta_i

    # derived metrics from the counts of the last class iteration
    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    Precision = TP / (TP + FP)
    Recall = TP / (TP + FN)
    F1 = 2 * TP / (2 * TP + FP + FN)  # parentheses around the denominator were missing

    return {'F_beta_macro': f_beta / y_true.shape[1], 'G_beta_macro': g_beta / y_true.shape[1], 'TP': TP, 'FP': FP,
            'TN': TN, 'FN': FN, 'Accuracy': Accuracy, 'F1': F1, 'Precision': Precision, 'Recall': Recall}
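A quick sanity check of the F1 formula used above on toy confusion counts; the original line `2 * TP / 2 * TP + FP + FN` evaluates as `TP**2 + FP + FN` due to operator precedence, so the parentheses matter:

```python
TP, FP, FN = 8.0, 2.0, 2.0
F1 = 2 * TP / (2 * TP + FP + FN)          # correct: 16 / 20
F1_buggy = 2 * TP / 2 * TP + FP + FN      # what the unparenthesized form computes
```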


def get_appropriate_bootstrap_samples(y_true, n_bootstraping_samples):
    samples = []
    while True:
        ridxs = np.random.randint(0, len(y_true), len(y_true))
        if y_true[ridxs].sum(axis=0).min() != 0:
            samples.append(ridxs)
            if len(samples) == n_bootstraping_samples:
                break
    return samples
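The rejection-sampling idea above (keep only bootstrap index sets in which every label column has at least one positive) on a tiny deterministic example; the 4×2 label matrix is made up:

```python
import numpy as np

np.random.seed(0)  # deterministic toy run
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 1]])
samples = []
while len(samples) < 2:
    ridxs = np.random.randint(0, len(y_true), len(y_true))
    if y_true[ridxs].sum(axis=0).min() != 0:  # every class represented
        samples.append(ridxs)
```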


def find_optimal_cutoff_threshold(target, predicted):
    """
    Find the optimal probability cutoff point for a classification model (maximizing Youden's J statistic, tpr - fpr).
    """
    fpr, tpr, threshold = roc_curve(target, predicted)
    optimal_idx = np.argmax(tpr - fpr)
    optimal_threshold = threshold[optimal_idx]
    return optimal_threshold
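The threshold selection reduces to picking the candidate that maximizes `tpr - fpr`; a toy illustration with hand-written ROC values (not produced by `roc_curve`):

```python
import numpy as np

tpr = np.array([0.0, 0.6, 0.9, 1.0])
fpr = np.array([0.0, 0.1, 0.5, 1.0])
thresholds = np.array([1.0, 0.7, 0.4, 0.0])
# Youden's J statistic is largest at the second candidate (0.6 - 0.1 = 0.5)
optimal = thresholds[np.argmax(tpr - fpr)]
```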


def find_optimal_cutoff_thresholds(y_true, y_pred):
    return [find_optimal_cutoff_threshold(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]


def find_optimal_cutoff_threshold_for_Gbeta(target, predicted, n_thresholds=100):
    thresholds = np.linspace(0.00, 1, n_thresholds)
    scores = [challenge_metrics(target, predicted > t, single=True)['G_beta_macro'] for t in thresholds]
    optimal_idx = np.argmax(scores)
    return thresholds[optimal_idx]


def find_optimal_cutoff_thresholds_for_Gbeta(y_true, y_pred):
    print("optimize thresholds with respect to G_beta")
    return [
        find_optimal_cutoff_threshold_for_Gbeta(y_true[:, k][:, np.newaxis], y_pred[:, k][:, np.newaxis])
        for k in tqdm(range(y_true.shape[1]))]


def apply_thresholds(preds, thresholds):
    """
    Apply class-wise thresholds to prediction scores in order to get a binary format.
    BUT: if no score is above its threshold, pick the maximum. This is needed due to metric issues.
    """
    tmp = []
    for p in preds:
        tmp_p = (p > thresholds).astype(int)
        if np.sum(tmp_p) == 0:
            tmp_p[np.argmax(p)] = 1
        tmp.append(tmp_p)
    tmp = np.array(tmp)
    return tmp
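A toy run of the thresholding rule: when no score clears its threshold, the argmax class is still set to 1 (all scores and thresholds below are invented):

```python
import numpy as np

preds = np.array([[0.1, 0.3], [0.7, 0.2]])
thresholds = np.array([0.5, 0.5])
binary = []
for p in preds:
    b = (p > thresholds).astype(int)
    if b.sum() == 0:            # nothing above threshold ...
        b[np.argmax(p)] = 1     # ... so force the top-scoring class
    binary.append(b)
binary = np.array(binary)
```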


# DATA PROCESSING STUFF
def load_dataset(path, sampling_rate, release=False):
    if path.split('/')[-2] == 'ptbxl':
        # load and convert annotation data
        Y = pd.read_csv(path + 'ptbxl_database.csv', index_col='ecg_id')
        Y.scp_codes = Y.scp_codes.apply(lambda x: ast.literal_eval(x))

        # load raw signal data
        X = load_raw_data_ptbxl(Y, sampling_rate, path)

    elif path.split('/')[-2] == 'ICBEB':
        # load and convert annotation data
        Y = pd.read_csv(path + 'icbeb_database.csv', index_col='ecg_id')
        Y.scp_codes = Y.scp_codes.apply(lambda x: ast.literal_eval(x))

        # load raw signal data
        X = load_raw_data_icbeb(Y, sampling_rate, path)

    return X, Y


def load_raw_data_icbeb(df, sampling_rate, path):
    if sampling_rate == 100:
        if os.path.exists(path + 'raw100.npy'):
            data = np.load(path + 'raw100.npy', allow_pickle=True)
        else:
            data = [wfdb.rdsamp(path + 'records100/' + str(f)) for f in tqdm(df.index)]
            data = np.array([signal for signal, meta in data])
            with open(path + 'raw100.npy', 'wb') as fh:
                pickle.dump(data, fh, protocol=4)
    elif sampling_rate == 500:
        if os.path.exists(path + 'raw500.npy'):
            data = np.load(path + 'raw500.npy', allow_pickle=True)
        else:
            data = [wfdb.rdsamp(path + 'records500/' + str(f)) for f in tqdm(df.index)]
            data = np.array([signal for signal, meta in data])
            with open(path + 'raw500.npy', 'wb') as fh:
                pickle.dump(data, fh, protocol=4)
    return data


def load_raw_data_ptbxl(df, sampling_rate, path):
    if sampling_rate == 100:
        if os.path.exists(path + 'raw100.npy'):
            data = np.load(path + 'raw100.npy', allow_pickle=True)
        else:
            data = [wfdb.rdsamp(path + f) for f in tqdm(df.filename_lr)]
            data = np.array([signal for signal, meta in data])
            with open(path + 'raw100.npy', 'wb') as fh:
                pickle.dump(data, fh, protocol=4)
    elif sampling_rate == 500:
        if os.path.exists(path + 'raw500.npy'):
            data = np.load(path + 'raw500.npy', allow_pickle=True)
        else:
            data = [wfdb.rdsamp(path + f) for f in tqdm(df.filename_hr)]
            data = np.array([signal for signal, meta in data])
            with open(path + 'raw500.npy', 'wb') as fh:
                pickle.dump(data, fh, protocol=4)
    return data


def compute_label_aggregations(df, folder, ctype):
    df['scp_codes_len'] = df.scp_codes.apply(lambda x: len(x))

    aggregation_df = pd.read_csv(folder + 'scp_statements.csv', index_col=0)

    if ctype in ['diagnostic', 'subdiagnostic', 'superdiagnostic']:

        def aggregate_all_diagnostic(y_dic):
            tmp = []
            for key in y_dic.keys():
                if key in diag_agg_df.index:
                    tmp.append(key)
            return list(set(tmp))

        def aggregate_subdiagnostic(y_dic):
            tmp = []
            for key in y_dic.keys():
                if key in diag_agg_df.index:
                    c = diag_agg_df.loc[key].diagnostic_subclass
                    if str(c) != 'nan':
                        tmp.append(c)
            return list(set(tmp))

        def aggregate_diagnostic(y_dic):
            tmp = []
            for key in y_dic.keys():
                if key in diag_agg_df.index:
                    c = diag_agg_df.loc[key].diagnostic_class
                    if str(c) != 'nan':
                        tmp.append(c)
            return list(set(tmp))

        diag_agg_df = aggregation_df[aggregation_df.diagnostic == 1.0]
        if ctype == 'diagnostic':
            df['diagnostic'] = df.scp_codes.apply(aggregate_all_diagnostic)
            df['diagnostic_len'] = df.diagnostic.apply(lambda x: len(x))
        elif ctype == 'subdiagnostic':
            df['subdiagnostic'] = df.scp_codes.apply(aggregate_subdiagnostic)
            df['subdiagnostic_len'] = df.subdiagnostic.apply(lambda x: len(x))
        elif ctype == 'superdiagnostic':
            df['superdiagnostic'] = df.scp_codes.apply(aggregate_diagnostic)
            df['superdiagnostic_len'] = df.superdiagnostic.apply(lambda x: len(x))
    elif ctype == 'form':
        form_agg_df = aggregation_df[aggregation_df.form == 1.0]

        def aggregate_form(y_dic):
            tmp = []
            for key in y_dic.keys():
                if key in form_agg_df.index:
                    c = key
                    if str(c) != 'nan':
                        tmp.append(c)
            return list(set(tmp))

        df['form'] = df.scp_codes.apply(aggregate_form)
        df['form_len'] = df.form.apply(lambda x: len(x))
    elif ctype == 'rhythm':
        rhythm_agg_df = aggregation_df[aggregation_df.rhythm == 1.0]

        def aggregate_rhythm(y_dic):
            tmp = []
            for key in y_dic.keys():
                if key in rhythm_agg_df.index:
                    c = key
                    if str(c) != 'nan':
                        tmp.append(c)
            return list(set(tmp))

        df['rhythm'] = df.scp_codes.apply(aggregate_rhythm)
        df['rhythm_len'] = df.rhythm.apply(lambda x: len(x))
    elif ctype == 'all':
        df['all_scp'] = df.scp_codes.apply(lambda x: list(set(x.keys())))

    return df
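The superclass aggregation boils down to mapping each scp code through the statements table and keeping the unique classes; a toy version with a hypothetical two-row mapping table (the codes `IMI` and `SR` are only examples):

```python
import pandas as pd

# hypothetical miniature of scp_statements.csv, already filtered to diagnostic rows
agg_df = pd.DataFrame({'diagnostic_class': ['MI', 'NORM']}, index=['IMI', 'NORM'])

scp_codes = {'IMI': 80.0, 'SR': 0.0}  # one diagnostic code, one rhythm code
labels = sorted({agg_df.loc[k].diagnostic_class
                 for k in scp_codes if k in agg_df.index})
```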


def select_data(XX, YY, ctype, min_samples, output_folder):
    # convert multi-label annotations to multi-hot
    mlb = MultiLabelBinarizer()

    if ctype == 'diagnostic':
        X = XX[YY.diagnostic_len > 0]
        Y = YY[YY.diagnostic_len > 0]
        mlb.fit(Y.diagnostic.values)
        y = mlb.transform(Y.diagnostic.values)
    elif ctype == 'subdiagnostic':
        counts = pd.Series(np.concatenate(YY.subdiagnostic.values)).value_counts()
        counts = counts[counts > min_samples]
        YY.subdiagnostic = YY.subdiagnostic.apply(lambda x: list(set(x).intersection(set(counts.index.values))))
        YY['subdiagnostic_len'] = YY.subdiagnostic.apply(lambda x: len(x))
        X = XX[YY.subdiagnostic_len > 0]
        Y = YY[YY.subdiagnostic_len > 0]
        mlb.fit(Y.subdiagnostic.values)
        y = mlb.transform(Y.subdiagnostic.values)
    elif ctype == 'superdiagnostic':
        counts = pd.Series(np.concatenate(YY.superdiagnostic.values)).value_counts()
        counts = counts[counts > min_samples]
        YY.superdiagnostic = YY.superdiagnostic.apply(lambda x: list(set(x).intersection(set(counts.index.values))))
        YY['superdiagnostic_len'] = YY.superdiagnostic.apply(lambda x: len(x))
        X = XX[YY.superdiagnostic_len > 0]
        Y = YY[YY.superdiagnostic_len > 0]
        mlb.fit(Y.superdiagnostic.values)
        y = mlb.transform(Y.superdiagnostic.values)
    elif ctype == 'form':
        # filter
        counts = pd.Series(np.concatenate(YY.form.values)).value_counts()
        counts = counts[counts > min_samples]
        YY.form = YY.form.apply(lambda x: list(set(x).intersection(set(counts.index.values))))
        YY['form_len'] = YY.form.apply(lambda x: len(x))
        # select
        X = XX[YY.form_len > 0]
        Y = YY[YY.form_len > 0]
        mlb.fit(Y.form.values)
        y = mlb.transform(Y.form.values)
    elif ctype == 'rhythm':
        # filter
        counts = pd.Series(np.concatenate(YY.rhythm.values)).value_counts()
        counts = counts[counts > min_samples]
        YY.rhythm = YY.rhythm.apply(lambda x: list(set(x).intersection(set(counts.index.values))))
        YY['rhythm_len'] = YY.rhythm.apply(lambda x: len(x))
        # select
        X = XX[YY.rhythm_len > 0]
        Y = YY[YY.rhythm_len > 0]
        mlb.fit(Y.rhythm.values)
        y = mlb.transform(Y.rhythm.values)
    elif ctype == 'all':
        # filter
        counts = pd.Series(np.concatenate(YY.all_scp.values)).value_counts()
        counts = counts[counts > min_samples]
        YY.all_scp = YY.all_scp.apply(lambda x: list(set(x).intersection(set(counts.index.values))))
        YY['all_scp_len'] = YY.all_scp.apply(lambda x: len(x))
        # select
        X = XX[YY.all_scp_len > 0]
        Y = YY[YY.all_scp_len > 0]
        mlb.fit(Y.all_scp.values)
        y = mlb.transform(Y.all_scp.values)
    else:
        pass

    # save the fitted MultiLabelBinarizer
    with open(output_folder + 'mlb.pkl', 'wb') as tokenizer:
        pickle.dump(mlb, tokenizer)

    return X, Y, y, mlb
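The multi-hot step performed by `MultiLabelBinarizer` in `select_data`, on toy label lists (the class names are hypothetical examples in the style of PTB-XL superclasses):

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# each record carries a list of labels; the output has one column per class,
# ordered alphabetically in mlb.classes_
y = mlb.fit_transform([['NORM'], ['MI', 'STTC'], ['MI']])
```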


def preprocess_signals(X_train, X_validation, X_test, outputfolder):
    # standardize data to mean 0 and variance 1
    ss = StandardScaler()
    ss.fit(np.vstack(X_train).flatten()[:, np.newaxis].astype(float))

    # save the fitted StandardScaler
    with open(outputfolder + 'standard_scaler.pkl', 'wb') as ss_file:
        pickle.dump(ss, ss_file)

    return (apply_standardizer(X_train, ss),
            apply_standardizer(X_validation, ss),
            apply_standardizer(X_test, ss))


def apply_standardizer(X, ss):
    X_tmp = []
    for x in X:
        x_shape = x.shape
        X_tmp.append(ss.transform(x.flatten()[:, np.newaxis]).reshape(x_shape))
    X_tmp = np.array(X_tmp)
    return X_tmp
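The flatten → transform → reshape round trip above on a single toy signal: the scaler is fit on all values pooled together (one global mean/std, not per channel), and each record is standardized and restored to its original shape:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = [np.array([[0.0, 2.0], [4.0, 2.0]])]  # one record: 2 timesteps x 2 channels
ss = StandardScaler()
ss.fit(np.vstack(X_train).flatten()[:, np.newaxis])  # pooled values: 0, 2, 4, 2

x = X_train[0]
x_std = ss.transform(x.flatten()[:, np.newaxis]).reshape(x.shape)
```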


# DOCUMENTATION STUFF

def generate_ptbxl_summary_table(selection=None, folder='/output/'):
    exps = ['exp0', 'exp1', 'exp1.1', 'exp1.1.1', 'exp2', 'exp3']
    metrics = ['macro_auc', 'Accuracy', 'F1', 'Precision', 'Recall', 'TP', 'TN', 'FP', 'FN']

    # get models
    models = {}
    for i, exp in enumerate(exps):
        if selection is None:
            exp_models = [m.split('/')[-1] for m in glob.glob(folder + str(exp) + '/models/*')]
        else:
            exp_models = selection
        if i == 0:
            models = set(exp_models)
        else:
            models = models.union(set(exp_models))

    # one result column per metric/experiment combination, e.g. 'exp0_macro_auc'
    results_dic = {'Method': []}
    for metric in metrics:
        for e in exps:
            results_dic[e + '_' + metric] = []

    for m in models:
        results_dic['Method'].append(m)

        for e in exps:

            try:
                me_res = pd.read_csv(folder + str(e) + '/models/' + str(m) + '/results/te_results.csv', index_col=0)

                mean1 = me_res.loc['point']['macro_auc']
                unc1 = max(me_res.loc['upper']['macro_auc'] - me_res.loc['point']['macro_auc'],
                           me_res.loc['point']['macro_auc'] - me_res.loc['lower']['macro_auc'])

                results_dic[e + '_macro_auc'].append("%.3f(%.2d)" % (np.round(mean1, 3), int(unc1 * 1000)))
                for metric in metrics[1:]:
                    results_dic[e + '_' + metric].append("%.3f" % me_res.loc['point'][metric])

            except FileNotFoundError:
                for metric in metrics:
                    results_dic[e + '_' + metric].append("--")

    df = pd.DataFrame(results_dic)
    df_index = df[df.Method.isin(['naive', 'ensemble'])]
    df_rest = df[~df.Method.isin(['naive', 'ensemble'])]
    df = pd.concat([df_rest, df_index])
    df.to_csv(folder + 'results_ptbxl.csv')

    titles = [
        '### 1. PTB-XL: all statements',
        '### 2. PTB-XL: diagnostic statements',
        '### 3. PTB-XL: Diagnostic subclasses',
        '### 4. PTB-XL: Diagnostic superclasses',
        '### 5. PTB-XL: Form statements',
        '### 6. PTB-XL: Rhythm statements'
    ]

    # helper output for markdown tables
    our_work = 'https://arxiv.org/abs/2004.13701'
    our_repo = 'https://github.com/helme/ecg_ptbxl_benchmarking/'
    md_source = ''
    for i, e in enumerate(exps):
        md_source += '\n ' + titles[i] + ' \n \n'
        md_source += '| Model | AUC |\n'

        # the AUC column is named '<exp>_macro_auc' (there is no '<exp>_AUC' key)
        for row in df_rest[['Method', e + '_macro_auc']].sort_values(e + '_macro_auc', ascending=False).values:
            md_source += '| ' + row[0].replace('fastai_', '') + ' | ' + row[1] + ' |\n'
    print(md_source)