Spaces:
Runtime error
Runtime error
| # ML Pipeline | |
| Cardiac risk prediction using an ensemble of ECGFounder (deep learning) and XGBoost (gradient boosting). | |
| ## Model Architecture | |
| ``` | |
| Raw ECG (1000 samples @ 100Hz) | |
| │ | |
| ├──> ECGFounder (1D CNN)──> 128-dim embedding | |
| │ │ | |
| ├──> Feature Engineering ────────>├──> 14 clinical features | |
| │ (NeuroKit2 + SciPy) │ | |
| │ v | |
| │ XGBoost Classifier | |
| │ │ | |
| └── Ensemble ────────────────────>│──> Risk Score (0.0-1.0) | |
| (weighted average) Risk Label | |
| ``` | |
| ### ECGFounder | |
| - **Architecture**: 1D convolutional neural network (`net1d.py`) | |
| - **Input**: 1000 ECG samples (10s window at 100Hz), bandpass filtered 0.5-40Hz | |
| - **Output**: 128-dimensional feature embedding | |
| - **Model file**: `ml_models/ecgfounder_best.pt` (117 MB) | |
| ### XGBoost Classifier | |
| - **Input**: 14 features (ECG embedding + clinical features) | |
| - **Output**: Binary risk probability | |
| - **Model file**: `ml_models/xgboost_cardiac.joblib` (752 KB) | |
| ## Feature Engineering | |
| The `feature_extractor.py` module computes clinical features from raw ECG: | |
| | Feature | Source | Description | | |
| |---------|--------|-------------| | |
| | Heart Rate | Input | BPM from MAX30100 | | |
| | SpO2 | Input | Blood oxygen percentage | | |
| | HRV (SDNN) | NeuroKit2 | Heart rate variability | | |
| | HRV (RMSSD) | NeuroKit2 | Root mean square of successive RR differences | | |
| | QRS Duration | NeuroKit2 | Ventricular depolarization time | | |
| | PR Interval | NeuroKit2 | Atrial-ventricular conduction | | |
| | Signal Quality | SciPy | SNR estimate of ECG signal | | |
| | Mean RR | Computed | Average R-R interval | | |
| | RR Irregularity | Computed | Coefficient of variation of RR | | |
| | Age | Profile | User age (if available) | | |
| | Sex | Profile | Binary encoded | | |
| | BMI | Profile | Computed from height/weight | | |
| | Comorbidity Score | Profile | Count of risk factors | | |
| | HR Deviation | History | Z-score vs 24h baseline | | |
| ## Inference Flow | |
| 1. Vitals uploaded to `POST /api/v1/vitals` | |
| 2. If ECG leads are connected and >= 100 samples: | |
| a. Load user health profile from database | |
| b. Compute 24h and 7d historical baselines | |
| c. Extract features from ECG waveform | |
| d. Run ECGFounder for deep features | |
| e. Combine and run XGBoost | |
| f. Store prediction in `predictions` collection | |
| 3. Return vitals + prediction in API response | |
| ## Risk Labels | |
| | Score Range | Label | Description | | |
| |-------------|-------|-------------| | |
| | 0.00 - 0.20 | normal | No concerning patterns | | |
| | 0.20 - 0.40 | low | Minor irregularities | | |
| | 0.40 - 0.60 | moderate | Some risk indicators | | |
| | 0.60 - 0.80 | elevated | Multiple risk factors | | |
| | 0.80 - 1.00 | high | Significant cardiac risk | | |
| ## Files | |
| ``` | |
| ml_src/ | |
| ├── __init__.py | |
| ├── net1d.py # ECGFounder 1D CNN model definition | |
| └── feature_extractor.py # Clinical feature extraction | |
| ml_models/ | |
| ├── ecgfounder_best.pt # Trained ECGFounder weights (Git LFS) | |
| └── xgboost_cardiac.joblib # Trained XGBoost model (Git LFS) | |
| ``` | |
| ## Dependencies | |
| - `torch` (CPU-only, installed from pytorch.org/whl/cpu) | |
| - `xgboost >= 2.0.0` | |
| - `neurokit2 >= 0.2.7` | |
| - `scipy >= 1.14.1` | |
| - `numpy >= 1.26.4` | |
| - `joblib >= 1.3.0` | |