---
title: Explainable-Acute-Leukemia-Mortality-Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---

# Explainable Acute Leukemia Mortality Predictor

**Explainable Acute Leukemia Mortality Predictor** is an interactive, end-to-end clinical machine-learning platform for building, validating, and deploying **transparent, interpretable mortality prediction models** for patients with **acute leukemia**.

The system integrates:

- Statistical modeling
- Explainable AI (SHAP)
- Bootstrap internal validation
- External clinical validation

into a single workflow designed specifically for **clinicians and clinical researchers**. This tool enables rapid development of **clinically trustworthy, publication-grade models** without requiring programming expertise.

---

## ⭐ Quick Start – Single-Patient Mortality Prediction

To **predict mortality probability for an individual patient**:

1. Open the **Predict + SHAP (2️⃣ Predict)** tab
2. Enter patient details across:
   - **Core**
   - **Clinical (Yes/No)**
   - **NGS**
   - **FISH**
3. Click **Predict single patient**

The system automatically generates:

- Predicted mortality probability (0–1)
- Risk band (Low / Intermediate / High)
- A SHAP explanation showing which variables contributed most to the prediction
- Downloadable results and plots

This enables **transparent, patient-level, clinically interpretable risk estimation** in seconds.
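The risk-band step above can be sketched in plain Python. The cut-points used here (0.33 / 0.66) are illustrative placeholders only — the app's actual band thresholds are not specified in this document:

```python
def risk_band(p, low=0.33, high=0.66):
    """Map a predicted mortality probability in [0, 1] to a risk band.

    The cut-points are hypothetical defaults for illustration, not the
    platform's real thresholds.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p < low:
        return "Low"
    if p < high:
        return "Intermediate"
    return "High"

risk_band(0.12)  # → "Low"
risk_band(0.50)  # → "Intermediate"
risk_band(0.91)  # → "High"
```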
## Core Capabilities

### Model Development

- Logistic regression–based pipelines (scikit-learn)
- Automatic preprocessing:
  - Numeric → median imputation + scaling
  - Categorical → most-frequent imputation + one-hot encoding
- Schema-aware training directly from Excel
- Optional L1 feature selection
- Optional dimensionality reduction (SVD)

---

### Explainability (Transparent AI)

- SHAP-based local explanations for each patient
- Global feature importance (bar + beeswarm)
- Waterfall plots for single predictions
- Variable-level contribution tracking
- Fully auditable predictions from raw inputs → probability

Designed for **clinical interpretability**, not black-box modeling.

---

## Validation Framework (Clinical-Grade)

Unlike typical ML demos, this framework implements **rigorous statistical validation appropriate for clinical research**.

### Discrimination

- ROC AUC
- ROC curves
- Precision–Recall curves
- Average Precision (PR-AUC)

### Calibration

- Reliability (calibration) curves
- Brier score

### Clinical Utility

- Decision Curve Analysis (net benefit)

### Threshold Metrics

- Sensitivity / specificity
- F1 score
- Balanced accuracy
- Confusion matrix
- Optimal threshold selection

---

## Internal Validation (Bootstrapping)

The platform supports **bootstrap out-of-bag (OOB) internal validation**, which is preferred over simple train/test splits for small clinical datasets.

For each bootstrap iteration:

1. Resample patients with replacement
2. Train on the bootstrap sample
3. Evaluate on out-of-bag patients
4. Aggregate performance

Outputs include:

- Mean metrics
- Median metrics
- 95% confidence intervals
- Per-iteration results (downloadable CSV)

This provides:

- Robust performance estimates
- Reduced optimism bias
- Statistically reliable uncertainty bounds

Suitable for **peer-reviewed publication** and **clinical methodology studies**.
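The bootstrap OOB procedure above can be sketched in standard-library Python. The toy single-feature "classifier" and the accuracy metric below are stand-ins for the platform's scikit-learn pipeline and metric suite, included only to keep the sketch self-contained and runnable:

```python
import random
import statistics

def bootstrap_oob(X, y, fit, score, n_iter=200, seed=0):
    """Bootstrap out-of-bag validation: resample patients with replacement,
    train on the bootstrap sample, evaluate on the left-out (OOB) patients,
    then aggregate per-iteration scores into mean / median / 95% CI."""
    rng = random.Random(seed)
    n, scores = len(X), []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]     # 1. resample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:                                    # rare: every patient was sampled
            continue
        model = fit([X[i] for i in idx], [y[i] for i in idx])        # 2. train
        scores.append(score(model, [X[i] for i in oob],
                            [y[i] for i in oob]))                    # 3. evaluate on OOB
    scores.sort()                                      # 4. aggregate (percentile CI)
    return {"mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "ci95": (scores[int(0.025 * len(scores))],
                     scores[int(0.975 * len(scores)) - 1])}

# Toy single-feature threshold classifier (illustration only):
def fit(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        return 0.5
    return (statistics.mean(pos) + statistics.mean(neg)) / 2

def score(threshold, X, y):
    """Accuracy of the threshold rule on the OOB patients."""
    return statistics.mean(1.0 if (x > threshold) == (label == 1) else 0.0
                           for x, label in zip(X, y))

X = [i / 39 for i in range(40)]          # one synthetic risk feature
y = [1 if x > 0.5 else 0 for x in X]     # synthetic outcome labels
print(bootstrap_oob(X, y, fit, score))
```

Because each iteration scores only patients the model never saw, the aggregated estimate carries less optimism bias than resubstitution, which is the point of the OOB design.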
---

## External Validation

Independent cohorts can be evaluated directly:

- Automatic probability generation
- Full metrics computation
- ROC / PR / calibration / decision curves
- Patient-level prediction export

Prediction CSVs can be used to generate **publication-quality NEJM-style figure panels**.

---

## Deployment & Versioning

- One-click publishing to the Hugging Face Model Hub
- Timestamped immutable releases
- Automatic `latest/` tracking
- Portable artifacts:
  - `model.joblib`
  - `meta.json` (schema + metrics + bootstrap results)

Models can be reused on any Excel file with identical column names.

---

## Workflow

### Training

1. Upload labeled Excel (`Outcome Event`)
2. Select variable types
3. Train model
4. Review discrimination + calibration metrics
5. Run bootstrap internal validation (recommended)
6. Publish versioned model

### Prediction / Validation

1. Load trained model
2. Upload new Excel
3. Generate probabilities + risk bands
4. Run external validation (if labels present)
5. Export results and figures

---

## Intended Users

- Hematology–Oncology clinicians
- Clinical researchers
- Epidemiologists
- Outcomes researchers
- Medical AI investigators

No coding required.

---

## Intended Use

This platform supports:

- Clinical research
- Prognostic modeling
- Explainable AI development
- Educational and methodological purposes

**Not a medical device. Not for autonomous clinical decision-making.** Clinical judgment must always prevail.

---

## Design Philosophy

This project prioritizes:

- Interpretability over black-box performance
- Statistical rigor over optimistic metrics
- Reproducibility over ad-hoc experimentation
- Clinical relevance over purely technical novelty
- Transparency over opacity

Every prediction must be explainable and defensible.

---

## Technical Stack

- Python
- Streamlit
- scikit-learn
- SHAP
- Matplotlib
- Hugging Face Spaces + Model Hub

---

## Author

Developed and maintained by **Dr.
Syed Naveed**
Hematology–Oncology Clinician & Researcher

Focus areas:

- Explainable AI in hematology
- Clinical machine learning validation
- Translational AI for real-world patient care

---

## License

Apache 2.0

---

For configuration details: https://huggingface.co/docs/hub/spaces-config-reference
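---

## Appendix: Reusing Published Artifacts (Sketch)

Since a model can only be reused on files with identical column names, a schema check against `meta.json` is the natural first step before batch prediction. The sketch below illustrates that check; the `meta.json` layout shown (a `"schema"` key with `"numeric"` and `"categorical"` lists) is a hypothetical structure for illustration, and the real file may differ:

```python
import json

# Hypothetical meta.json content (illustration only; real layout may differ):
meta = json.loads("""
{
  "schema": {
    "numeric": ["Age", "WBC"],
    "categorical": ["FLT3-ITD", "NPM1"]
  },
  "outcome": "Outcome Event"
}
""")

def missing_columns(file_columns, meta):
    """Return the columns the trained model expects but the new file lacks."""
    expected = set(meta["schema"]["numeric"]) | set(meta["schema"]["categorical"])
    return sorted(expected - set(file_columns))

# A file missing one mutation column would be flagged before prediction:
print(missing_columns(["Age", "WBC", "NPM1"], meta))  # → ['FLT3-ITD']
```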