GitHub Actions
Deploy v0.0.1 from GitHub Actions
314b374

ML Pipeline

Cardiac risk prediction using an ensemble of ECGFounder (deep learning) and XGBoost (gradient boosting).

Model Architecture

Raw ECG (1000 samples @ 100Hz)
        │
        ├──> ECGFounder (1D CNN)──> 128-dim embedding
        │                                │
        ├──> Feature Engineering ────────>├──> 14 clinical features
        │    (NeuroKit2 + SciPy)         │
        │                                v
        │                          XGBoost Classifier
        │                                │
        └── Ensemble ────────────────────>│──> Risk Score (0.0-1.0)
             (weighted average)               Risk Label

ECGFounder

  • Architecture: 1D convolutional neural network (net1d.py)
  • Input: 1000 ECG samples (10s window at 100Hz), bandpass filtered 0.5-40Hz
  • Output: 128-dimensional feature embedding
  • Model file: ml_models/ecgfounder_best.pt (117 MB)

XGBoost Classifier

  • Input: 14 features (ECG embedding + clinical features)
  • Output: Binary risk probability
  • Model file: ml_models/xgboost_cardiac.joblib (752 KB)

Feature Engineering

The feature_extractor.py module computes clinical features from raw ECG:

Feature Source Description
Heart Rate Input BPM from MAX30100
SpO2 Input Blood oxygen percentage
HRV (SDNN) NeuroKit2 Heart rate variability
HRV (RMSSD) NeuroKit2 Root mean square of successive RR differences
QRS Duration NeuroKit2 Ventricular depolarization time
PR Interval NeuroKit2 Atrial-ventricular conduction
Signal Quality SciPy SNR estimate of ECG signal
Mean RR Computed Average R-R interval
RR Irregularity Computed Coefficient of variation of RR
Age Profile User age (if available)
Sex Profile Binary encoded
BMI Profile Computed from height/weight
Comorbidity Score Profile Count of risk factors
HR Deviation History Z-score vs 24h baseline

Inference Flow

  1. Vitals uploaded to POST /api/v1/vitals
  2. If ECG leads are connected and >= 100 samples: a. Load user health profile from database b. Compute 24h and 7d historical baselines c. Extract features from ECG waveform d. Run ECGFounder for deep features e. Combine and run XGBoost f. Store prediction in predictions collection
  3. Return vitals + prediction in API response

Risk Labels

Score Range Label Description
0.00 - 0.20 normal No concerning patterns
0.20 - 0.40 low Minor irregularities
0.40 - 0.60 moderate Some risk indicators
0.60 - 0.80 elevated Multiple risk factors
0.80 - 1.00 high Significant cardiac risk

Files

ml_src/
├── __init__.py
├── net1d.py               # ECGFounder 1D CNN model definition
└── feature_extractor.py   # Clinical feature extraction

ml_models/
├── ecgfounder_best.pt     # Trained ECGFounder weights (Git LFS)
└── xgboost_cardiac.joblib # Trained XGBoost model (Git LFS)

Dependencies

  • torch (CPU-only, installed from pytorch.org/whl/cpu)
  • xgboost >= 2.0.0
  • neurokit2 >= 0.2.7
  • scipy >= 1.14.1
  • numpy >= 1.26.4
  • joblib >= 1.3.0