COPD Open Models β€” Model H (90-Day Exacerbation Prediction)

Model Details

Model H predicts the risk of a COPD exacerbation within 90 days using features derived from NHS EHR datasets and patient-reported outcomes (PROs). It serves as both a production-grade prediction pipeline and a template project for building new COPD prediction models, featuring a reusable 2,000+ line core library (model_h.py) with end-to-end training, calibration, evaluation, and SHAP explainability.

Key Characteristics

  • Comprehensive PRO integration β€” the most detailed PRO feature engineering in the portfolio, covering four instruments: EQ-5D (monthly), MRC dyspnoea (weekly), CAT (daily), and Symptom Diary (daily), each with engagement metrics, score differences, and multi-window aggregations.
  • 13 algorithms screened in the first phase, then narrowed to the top 3 with Bayesian hyperparameter tuning.
  • Forward validation on 9 months of prospective data (May 2023 – February 2024).
  • Reusable core library (model_h.py) β€” 40+ functions for label setup, feature engineering, model evaluation, calibration, and SHAP explainability.
  • Training code is fully decoupled from cloud infrastructure β€” runs locally with no Azure dependencies.

Note: This repository contains no real patient-level data. All included data files are synthetic or example data for pipeline validation.

Model Type

Traditional tabular ML classifiers (multiple candidate estimators; see "Training Procedure").

Release Notes

  • Phase 1 (current): Models C, E, H published as the initial "COPD Open Models" collection.
  • Phase 2 (planned): Additional models may follow after codebase sanitisation.

Intended Use

This model and code are published as reference implementations for research, education, and benchmarking on COPD prediction tasks.

Intended Users

  • ML practitioners exploring tabular healthcare ML pipelines
  • Researchers comparing feature engineering and evaluation approaches
  • Developers building internal prototypes (non-clinical)

Out-of-Scope Uses

  • Not for clinical decision-making, triage, diagnosis, or treatment planning.
  • Not a substitute for clinical judgement or validated clinical tools.
  • Do not deploy in healthcare settings without an appropriate regulatory, clinical safety, and information governance framework.

Regulatory Considerations (SaMD)

Regulatory status for software depends on the intended purpose expressed in documentation, labelling, and promotional materials. Downstream users integrating or deploying this model should determine whether their implementation qualifies as Software as a Medical Device (SaMD) and identify the legal "manufacturer" responsible for compliance and post-market obligations.


Training Data

  • Source: NHS EHR-derived datasets and Lenus COPD Service PRO data (training performed on controlled datasets; not distributed here).
  • Data available in this repo: Synthetic/example datasets only.
  • Cohort: COPD patients with RECEIVER and Scale-Up cohort membership.
  • Target: Binary β€” ExacWithin3Months (hospital + community exacerbations) or HospExacWithin3Months (hospital only).
  • Configuration: 90-day prediction window, 180-day lookback, 5-fold cross-validation.
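
The target construction above can be sketched as follows. This is an illustrative, hedged example: column names (patient_id, index_date, exac_date) and the helper name are hypothetical, not taken from the repo.

```python
# Sketch of the binary label described above: for each patient observation
# date, flag whether any exacerbation falls within the next 90 days.
import pandas as pd

PREDICTION_WINDOW_DAYS = 90  # matches the 90-day prediction window

def label_exac_within_window(obs: pd.DataFrame, exacs: pd.DataFrame) -> pd.Series:
    """obs: one row per (patient_id, index_date); exacs: (patient_id, exac_date)."""
    merged = obs.merge(exacs, on="patient_id", how="left")
    delta = (merged["exac_date"] - merged["index_date"]).dt.days
    hit = (delta > 0) & (delta <= PREDICTION_WINDOW_DAYS)
    return (hit.groupby([merged["patient_id"], merged["index_date"]])
               .any()
               .astype(int)
               .rename("ExacWithin3Months"))
```

Patients with no exacerbation records fall through the left join as missing dates and are labelled 0.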

Features

  • Demographics: Age (binned: <50 / 50-59 / 60-69 / 70-79 / 80+), Sex_F
  • Comorbidities: AsthmaOverlap (binary), comorbidity count (binned: None / 1-2 / 3+)
  • Exacerbation history: hospital and community exacerbation counts in the lookback window, days since last exacerbation, recency-weighted counts
  • Spirometry: FEV1, FVC, FEV1/FVC ratio (max, min, and latest values)
  • Laboratory (20+ tests): MaxLifetime, MinLifetime, Max1Year, Min1Year, and Latest values with recency weighting (decay_rate=0.001) for WBC, RBC, haemoglobin, haematocrit, platelets, sodium, potassium, creatinine, albumin, glucose, ALT, AST, GGT, bilirubin, ALP, cholesterol, triglycerides, TSH, and more
  • EQ-5D (monthly): Q1–Q5, total score, latest values, engagement rates, score changes
  • MRC Dyspnoea (weekly): MRC score (1–5), latest value, engagement, variations
  • CAT (daily): Q1–Q8, total score (0–40), latest values, engagement, score differences
  • Symptom Diary (daily): Q5 rescue medication (binary), weekly aggregates, engagement rates
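
The recency weighting mentioned for laboratory and exacerbation features can be illustrated with a minimal sketch. The exact decay formula in model_h.py is not published here; exponential decay with the stated decay_rate=0.001 is one common choice and is assumed below.

```python
# Illustrative recency weighting: older observations contribute less.
# Assumes exponential decay with the decay_rate stated above (0.001/day);
# the repo's actual formula may differ.
import math

def recency_weight(days_ago: float, decay_rate: float = 0.001) -> float:
    """Weight decays smoothly toward 0 as the observation ages."""
    return math.exp(-decay_rate * days_ago)

def recency_weighted_count(event_days_ago) -> float:
    """Sum of decayed weights over past events, e.g. exacerbations in lookback."""
    return sum(recency_weight(d) for d in event_days_ago)
```

With this decay rate, an observation from 1,000 days ago still carries weight exp(-1) ≈ 0.37, so lifetime aggregates remain informative while recent values dominate.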

Data Preprocessing

  1. Target encoding β€” K-fold encoding with smoothing for categorical features (Age, Comorbidities, FEV1 severity, smoking status, etc.).
  2. Imputation β€” median/mean/mode imputation strategies, applied per-fold.
  3. Scaling β€” MinMaxScaler to [0, 1], fit on training fold only.
  4. PRO LOGIC filtering β€” 14-day minimum between exacerbation episodes, 2 consecutive negative Q5 responses required for borderline events (14–35 days apart).

Training Procedure

Training Framework

  • pandas, scikit-learn, imbalanced-learn, xgboost, lightgbm, catboost
  • Hyperparameter tuning: scikit-optimize (BayesSearchCV)
  • Explainability: SHAP (TreeExplainer)
  • Experiment tracking: MLflow

Algorithms Evaluated

First Phase (13 model types):

  • DummyClassifier, baseline (sklearn)
  • Logistic Regression (sklearn)
  • Logistic Regression, balanced (sklearn)
  • Random Forest (sklearn)
  • Random Forest, balanced (sklearn)
  • Balanced Random Forest (imblearn)
  • Balanced Bagging (imblearn)
  • XGBoost, 7 variants (xgboost)
  • LightGBM, 2 variants (lightgbm)
  • CatBoost (catboost)

Hyperparameter Tuning Search Spaces:

  • Logistic Regression: penalty, class_weight, max_iter (50–300), C (0.001–10)
  • Random Forest: max_depth (4–10), n_estimators (70–850), min_samples_split (2–10), class_weight
  • XGBoost: max_depth (4–10), n_estimators (70–850)
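
The repo tunes with scikit-optimize's BayesSearchCV. As a dependency-light sketch of the same logistic-regression search space listed above, here is the equivalent setup with sklearn's RandomizedSearchCV, which shares the fit/best_params_ interface (toy data; not the repo's tuning script):

```python
# Hyperparameter search sketch mirroring the logistic-regression space above.
# BayesSearchCV (scikit-optimize) is a drop-in alternative with the same API.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

param_space = {
    "C": loguniform(1e-3, 10),            # C (0.001-10)
    "max_iter": list(range(50, 301)),     # max_iter (50-300)
    "class_weight": [None, "balanced"],   # class_weight
    "penalty": ["l2"],                    # penalty (l2 shown for brevity)
}

search = RandomizedSearchCV(LogisticRegression(solver="lbfgs"),
                            param_space, n_iter=10, cv=3,
                            scoring="average_precision", random_state=0)
search.fit(X, y)
```

average_precision is used as the tuning score here because AUC-PR is the primary metric for the imbalanced outcome.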

Final Phase: Top 3 models (Balanced Random Forest, XGBoost, Random Forest) retrained with tuned hyperparameters.

Evaluation Design

  • 5-fold cross-validation with per-fold preprocessing.
  • Metrics evaluated at threshold 0.5 and at best-F1 threshold.
  • Event-type breakdown: hospital vs. community exacerbations evaluated separately.
  • Forward validation: 9 months of prospective data (May 2023 – February 2024), assessed with KS test and Wasserstein distance for distribution shift.
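
The distribution-shift checks named above can be run per feature with scipy. A minimal sketch on synthetic data (feature names and thresholds are illustrative):

```python
# Compare a feature's training distribution against the forward-validation
# period with a two-sample KS test and the Wasserstein distance.
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
forward_feature = rng.normal(loc=0.3, scale=1.0, size=1000)  # shifted mean

ks_stat, p_value = ks_2samp(train_feature, forward_feature)
w_dist = wasserstein_distance(train_feature, forward_feature)

if p_value < 0.05:
    print(f"Shift detected: KS={ks_stat:.3f}, Wasserstein={w_dist:.3f}")
```

The KS statistic is sensitive to any difference in shape, while the Wasserstein distance quantifies how far the distributions sit apart; reporting both helps distinguish small-but-significant shifts from practically large ones.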

Calibration

  • Sigmoid (Platt scaling)
  • Isotonic regression
  • Applied via CalibratedClassifierCV with per-fold calibration.
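
A minimal sketch of this calibration step (toy data; the repo wires this through model_h.py rather than inline):

```python
# Wrap a base estimator in CalibratedClassifierCV with sigmoid (Platt) or
# isotonic calibration, cross-validated as described above.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

for method in ("sigmoid", "isotonic"):
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        method=method, cv=5)
    calibrated.fit(X, y)
    proba = calibrated.predict_proba(X)[:, 1]
```

Sigmoid calibration is the safer default on small folds; isotonic is more flexible but can overfit when the calibration split has few positive events.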

Evaluation Results

Replace this section with measured results from your training run.

  • ROC-AUC: TBD (cross-validation mean ± std)
  • AUC-PR: TBD (primary metric for the imbalanced outcome)
  • F1 Score @ 0.5: TBD (default threshold)
  • Best F1 Score: TBD (at the optimal threshold)
  • Balanced Accuracy: TBD (cross-validation mean)
  • Brier Score: TBD (probability calibration quality)
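
The best-F1 threshold reported above can be found by sweeping the candidate thresholds from the precision-recall curve. A sketch (the helper name is illustrative; model_h.py's implementation may differ):

```python
# Pick the probability threshold that maximises F1 over the PR curve.
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_proba):
    precision, recall, thresholds = precision_recall_curve(y_true, y_proba)
    # precision/recall carry one extra trailing point with no threshold
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)  # avoid division by zero
    best = int(np.argmax(f1))
    return thresholds[best], f1[best]
```

Any threshold chosen this way should be selected on validation folds only, then held fixed for test and forward-validation reporting.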

Caveats on Metrics

  • Performance depends heavily on cohort definition, PRO engagement rates, and label construction.
  • Forward validation results may differ from cross-validation due to temporal shifts in data availability and coding practices.
  • Reported metrics from controlled datasets may not transfer to other settings without recalibration and validation.

Bias, Risks, and Limitations

  • Dataset shift: EHR coding practices, PRO engagement, and population characteristics vary across sites and time periods.
  • PRO engagement bias: Patients who engage more with digital health tools may differ systematically from non-engagers.
  • Label uncertainty: Exacerbation events are constructed via PRO LOGIC β€” different definitions produce different results.
  • Fairness: Outcomes and feature availability may vary by age, sex, deprivation, comorbidity burden, or service access.
  • Misuse risk: Using predictions to drive clinical action without clinical safety processes can cause harm through false positives and negatives.

How to Use

Pipeline Execution Order

# 1. Install dependencies
pip install pandas numpy scikit-learn imbalanced-learn xgboost lightgbm catboost scikit-optimize shap mlflow matplotlib seaborn pyyaml joblib scipy

# 2. Set up labels (choose one)
python training/setup_labels_hosp_comm.py       # hospital + community exacerbations
python training/setup_labels_only_hosp.py        # hospital only
python training/setup_labels_forward_val.py      # forward validation set

# 3. Split data
python training/split_train_test_val.py

# 4. Feature engineering (run in sequence)
python training/process_demographics.py
python training/process_comorbidities.py
python training/process_exacerbation_history.py
python training/process_spirometry.py
python training/process_labs.py
python training/process_pros.py

# 5. Combine features
python training/combine_features.py

# 6. Encode and impute
python training/encode_and_impute.py

# 7. Screen algorithms
python training/cross_val_first_models.py

# 8. Hyperparameter tuning
python training/perform_hyper_param_tuning.py

# 9. Final cross-validation with best models
python training/cross_val_final_models.py

# 10. Forward validation (optional)
python training/perform_forward_validation.py

Configuration

Edit config.yaml to adjust:

  • prediction_window (default: 90 days)
  • lookback_period (default: 180 days)
  • model_type ('hosp_comm' or 'only_hosp')
  • num_folds (default: 5)
  • Input/output data paths
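
An illustrative config.yaml built from the keys above (the path key names are assumptions; check the repo's actual schema):

```yaml
# Example config.yaml (key names from the list above; exact schema may differ)
prediction_window: 90      # days
lookback_period: 180       # days
model_type: hosp_comm      # 'hosp_comm' or 'only_hosp'
num_folds: 5
input_data_path: data/input/     # hypothetical path key
output_data_path: data/output/   # hypothetical path key
```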

Core Library

model_h.py provides 40+ reusable functions for:

  • PRO LOGIC exacerbation validation
  • Recency-weighted feature engineering
  • Model evaluation (F1, PR-AUC, ROC-AUC, Brier, calibration curves)
  • SHAP explainability (summary, local, interaction, decision plots)
  • Calibration (sigmoid, isotonic, spline)

Environmental Impact

Training computational requirements are minimal β€” all models are traditional tabular ML classifiers running on CPU. A full pipeline run (feature engineering through cross-validation) completes in minutes on a standard laptop.


Citation

If you use this model or code, please cite:

  • This repository: (add citation format / Zenodo DOI if minted)
  • Associated publications: (clinical trial results paper β€” forthcoming)

Authors and Contributors

  • Storm ID (maintainers)

License

This model and code are released under the Apache 2.0 license.
