---
title: Explainable-Acute-Leukemia-Mortality-Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---

# Explainable Acute Leukemia Mortality Predictor

**Explainable Acute Leukemia Mortality Predictor** is an interactive, end-to-end clinical machine-learning platform for building, validating, and deploying **transparent, interpretable mortality prediction models** for patients with **acute leukemia**.

The system integrates:

- Statistical modeling
- Explainable AI (SHAP)
- Bootstrap internal validation
- External clinical validation

into a single workflow designed specifically for **clinicians and clinical researchers**. This tool enables rapid development of **clinically trustworthy, publication-grade models** without requiring programming expertise.

---

## ⭐ Quick Start – Single-Patient Mortality Prediction

To **predict mortality probability for an individual patient**:

1. Open the **Predict + SHAP (2️⃣ Predict)** tab
2. Enter patient details across:
   - **Core**
   - **Clinical (Yes/No)**
   - **NGS**
   - **FISH**
3. Click **Predict single patient**

The system automatically generates:

- Predicted mortality probability (0–1)
- Risk band (Low / Intermediate / High)
- A SHAP explanation showing which variables contributed most to the prediction
- Downloadable results and plots

This enables **transparent, patient-level, clinically interpretable risk estimation** in seconds.
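The risk-band step above can be sketched in plain Python. The cut-points used here (0.33 / 0.66) are illustrative placeholders only — the app's actual band thresholds are not specified in this document:

```python
def risk_band(p, low=0.33, high=0.66):
    """Map a predicted mortality probability in [0, 1] to a risk band.

    The cut-points are hypothetical defaults for illustration, not the
    platform's real thresholds.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p < low:
        return "Low"
    if p < high:
        return "Intermediate"
    return "High"

risk_band(0.12)  # → "Low"
risk_band(0.50)  # → "Intermediate"
risk_band(0.91)  # → "High"
```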
## Core Capabilities

### Model Development

- Logistic regression–based pipelines (scikit-learn)
- Automatic preprocessing:
  - Numeric → median imputation + scaling
  - Categorical → most-frequent imputation + one-hot encoding
- Schema-aware training directly from Excel
- Optional L1 feature selection
- Optional dimensionality reduction (SVD)

---

### Explainability (Transparent AI)

- SHAP-based local explanations for each patient
- Global feature importance (bar + beeswarm)
- Waterfall plots for single predictions
- Variable-level contribution tracking
- Fully auditable predictions from raw inputs → probability

Designed for **clinical interpretability**, not black-box modeling.

---

## Validation Framework (Clinical-Grade)

Unlike typical ML demos, this framework implements **rigorous statistical validation appropriate for clinical research**.

### Discrimination

- ROC AUC
- ROC curves
- Precision–Recall curves
- Average Precision (PR-AUC)

### Calibration

- Reliability (calibration) curves
- Brier score

### Clinical Utility

- Decision Curve Analysis (net benefit)

### Threshold Metrics

- Sensitivity / specificity
- F1 score
- Balanced accuracy
- Confusion matrix
- Optimal threshold selection

---

## Internal Validation (Bootstrapping)

The platform supports **bootstrap out-of-bag (OOB) internal validation**, which is preferred over simple train/test splits for small clinical datasets.

For each bootstrap iteration:

1. Resample patients with replacement
2. Train on the bootstrap sample
3. Evaluate on out-of-bag patients
4. Aggregate performance

Outputs include:

- Mean metrics
- Median metrics
- 95% confidence intervals
- Per-iteration results (downloadable CSV)

This provides:

- Robust performance estimates
- Reduced optimism bias
- Statistically reliable uncertainty bounds

Suitable for **peer-reviewed publication** and **clinical methodology studies**.
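The bootstrap OOB procedure above can be sketched in standard-library Python. The toy single-feature "classifier" and the accuracy metric below are stand-ins for the platform's scikit-learn pipeline and metric suite, included only to keep the sketch self-contained and runnable:

```python
import random
import statistics

def bootstrap_oob(X, y, fit, score, n_iter=200, seed=0):
    """Bootstrap out-of-bag validation: resample patients with replacement,
    train on the bootstrap sample, evaluate on the left-out (OOB) patients,
    then aggregate per-iteration scores into mean / median / 95% CI."""
    rng = random.Random(seed)
    n, scores = len(X), []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]     # 1. resample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:                                    # rare: every patient was sampled
            continue
        model = fit([X[i] for i in idx], [y[i] for i in idx])        # 2. train
        scores.append(score(model, [X[i] for i in oob],
                            [y[i] for i in oob]))                    # 3. evaluate on OOB
    scores.sort()                                      # 4. aggregate (percentile CI)
    return {"mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "ci95": (scores[int(0.025 * len(scores))],
                     scores[int(0.975 * len(scores)) - 1])}

# Toy single-feature threshold classifier (illustration only):
def fit(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        return 0.5
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def score(threshold, X, y):
    """Accuracy of the threshold rule on the OOB patients."""
    return statistics.mean(1.0 if (x > threshold) == (label == 1) else 0.0
                           for x, label in zip(X, y))

X = [i / 39 for i in range(40)]          # one synthetic risk feature
y = [1 if x > 0.5 else 0 for x in X]     # synthetic outcome labels
print(bootstrap_oob(X, y, fit, score))
```

Because each iteration scores only patients the model never saw, the aggregated estimate carries less optimism bias than resubstitution, which is the point of the OOB design.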
---

## External Validation

Independent cohorts can be evaluated directly:

- Automatic probability generation
- Full metrics computation
- ROC / PR / calibration / decision curves
- Patient-level prediction export

Prediction CSVs can be used to generate **publication-quality NEJM-style figure panels**.

---

## Deployment & Versioning

- One-click publishing to the Hugging Face Model Hub
- Timestamped immutable releases
- Automatic `latest/` tracking
- Portable artifacts:
  - `model.joblib`
  - `meta.json` (schema + metrics + bootstrap results)

Models can be reused on any Excel file with identical column names.

---

## Workflow

### Training

1. Upload labeled Excel (`Outcome Event`)
2. Select variable types
3. Train model
4. Review discrimination + calibration metrics
5. Run bootstrap internal validation (recommended)
6. Publish versioned model

### Prediction / Validation

1. Load trained model
2. Upload new Excel
3. Generate probabilities + risk bands
4. Run external validation (if labels present)
5. Export results and figures

---

## Intended Users

- Hematology–Oncology clinicians
- Clinical researchers
- Epidemiologists
- Outcomes researchers
- Medical AI investigators

No coding required.

---

## Intended Use

This platform supports:

- Clinical research
- Prognostic modeling
- Explainable AI development
- Educational and methodological purposes

**Not a medical device. Not for autonomous clinical decision-making.** Clinical judgment must always prevail.

---

## Design Philosophy

This project prioritizes:

- Interpretability over black-box performance
- Statistical rigor over optimistic metrics
- Reproducibility over ad-hoc experimentation
- Clinical relevance over purely technical novelty
- Transparency over opacity

Every prediction must be explainable and defensible.

---

## Technical Stack

- Python
- Streamlit
- scikit-learn
- SHAP
- Matplotlib
- Hugging Face Spaces + Model Hub

---

## Author

Developed and maintained by **Dr.
Syed Naveed**
Hematology–Oncology Clinician & Researcher

Focus areas:

- Explainable AI in hematology
- Clinical machine learning validation
- Translational AI for real-world patient care

---

## License

Apache 2.0

---

For configuration details: https://huggingface.co/docs/hub/spaces-config-reference
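---

## Appendix: Reusing Published Artifacts (Sketch)

Since a model can only be reused on files with identical column names, a schema check against `meta.json` is the natural first step before batch prediction. The sketch below illustrates that check; the `meta.json` layout shown (a `"schema"` key with `"numeric"` and `"categorical"` lists) is a hypothetical structure for illustration, and the real file may differ:

```python
import json

# Hypothetical meta.json content (illustration only; real layout may differ):
meta = json.loads("""
{
  "schema": {
    "numeric": ["Age", "WBC"],
    "categorical": ["FLT3-ITD", "NPM1"]
  },
  "outcome": "Outcome Event"
}
""")

def missing_columns(file_columns, meta):
    """Return the columns the trained model expects but the new file lacks."""
    expected = set(meta["schema"]["numeric"]) | set(meta["schema"]["categorical"])
    return sorted(expected - set(file_columns))

# A file missing one mutation column would be flagged before prediction:
print(missing_columns(["Age", "WBC", "NPM1"], meta))  # → ['FLT3-ITD']
```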