---
title: Explainable-Acute-Leukemia-Mortality-Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---
# Explainable Acute Leukemia Mortality Predictor
**Explainable Acute Leukemia Mortality Predictor** is an interactive, end-to-end clinical machine-learning platform for building, validating, and deploying **transparent, interpretable mortality prediction models** for patients with **acute leukemia**.
The system integrates:
- Statistical modeling
- Explainable AI (SHAP)
- Bootstrap internal validation
- External clinical validation

These components are combined into a single workflow specifically designed for **clinicians and clinical researchers**.
This tool enables rapid development of **clinically trustworthy, publication-grade models** without requiring programming expertise.
---
## ⭐ Quick Start – Single Patient Mortality Prediction
To **predict mortality probability for an individual patient**:
1. Open the **Predict + SHAP (2️⃣ Predict)** tab
2. Enter patient details across:
- **Core**
- **Clinical (Yes/No)**
- **NGS**
- **FISH**
3. Click **Predict single patient**
The system will automatically generate:
- Predicted mortality probability (0–1)
- Risk band (Low / Intermediate / High)
- SHAP explanation showing which variables contributed most to the prediction
- Downloadable results and plots
This enables **transparent, patient-level, clinically interpretable risk estimation** in seconds.
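As an illustration, the probability-to-band mapping can be sketched as below. The cutoffs shown are hypothetical placeholders, not the app's actual thresholds:

```python
def risk_band(probability: float,
              low_cut: float = 0.33,
              high_cut: float = 0.66) -> str:
    """Map a predicted mortality probability (0-1) to a risk band.

    The cutoffs are illustrative only; the deployed model may use
    different, clinically derived thresholds.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability < low_cut:
        return "Low"
    if probability < high_cut:
        return "Intermediate"
    return "High"
```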
## Core Capabilities
### Model Development
- Logistic regression–based pipelines (scikit-learn)
- Automatic preprocessing:
- Numeric → median imputation + scaling
- Categorical → most-frequent imputation + one-hot encoding
- Schema-aware training directly from Excel
- Optional L1 feature selection
- Optional dimensionality reduction (SVD)
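A minimal scikit-learn sketch of the preprocessing and logistic-regression pipeline described above. The column names are hypothetical; the app derives the real ones from the uploaded Excel schema:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists; in the app these come from the schema.
numeric_cols = ["Age", "WBC"]
categorical_cols = ["Sex", "FLT3-ITD"]

preprocess = ColumnTransformer([
    # Numeric -> median imputation + scaling
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical -> most-frequent imputation + one-hot encoding
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    # penalty="l1" with a sparse solver gives the optional L1 feature selection
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000)),
])
```

The `ColumnTransformer` keeps numeric and categorical handling in one fitted object, so the same transformations are replayed exactly at prediction time.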
---
### Explainability (Transparent AI)
- SHAP-based local explanations for each patient
- Global feature importance (bar + beeswarm)
- Waterfall plots for single predictions
- Variable-level contribution tracking
- Fully auditable predictions from raw inputs → probability
Designed for **clinical interpretability**, not black-box modeling.
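For linear models, SHAP attributions in log-odds space have a closed form under feature independence: `phi_j = coef_j * (x_j - E[x_j])`. The sketch below illustrates this additive decomposition without needing the `shap` package itself (the platform uses the SHAP library directly):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

def linear_shap(clf, X_background, x):
    """Per-feature contributions (log-odds space) for a linear model,
    assuming independent features: phi_j = coef_j * (x_j - E[x_j])."""
    baseline = X_background.mean(axis=0)
    return clf.coef_[0] * (x - baseline)

phi = linear_shap(clf, X, X[0])
```

The additivity property that makes waterfall plots possible holds exactly here: the base value (model output at the feature means) plus the sum of contributions recovers the patient's log-odds output.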
---
## Validation Framework (Clinical-Grade)
Unlike typical ML demos, this framework implements **rigorous statistical validation appropriate for clinical research**.
### Discrimination
- ROC AUC
- ROC curves
- Precision–Recall curves
- Average Precision (PR-AUC)
### Calibration
- Reliability (calibration) curves
- Brier score
### Clinical Utility
- Decision Curve Analysis (net benefit)
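Net benefit at a threshold probability `pt` is `TP/n - FP/n * pt/(1 - pt)`, i.e. true positives credited and false positives penalized by the odds of the threshold. A minimal sketch:

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at probability threshold pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)
```

A decision curve plots this quantity across a range of thresholds, against the "treat all" and "treat none" strategies.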
### Threshold Metrics
- Sensitivity / specificity
- F1 score
- Balanced accuracy
- Confusion matrix
- Optimal threshold selection
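One common way to pick an operating threshold is Youden's J statistic (sensitivity + specificity − 1), maximized over the ROC curve; the sketch below uses toy data and this criterion as an example (the app may offer other criteria):

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, roc_curve)

# Toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6])

# Youden's J picks the ROC point maximizing tpr - fpr
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
threshold = thresholds[np.argmax(tpr - fpr)]

y_pred = (y_prob >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = f1_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
```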
---
## Internal Validation (Bootstrapping)
The platform supports **bootstrap out-of-bag (OOB) internal validation**, which is generally preferred over a single train/test split for small clinical datasets.
For each bootstrap iteration:
1. Resample patients with replacement
2. Train on the bootstrap sample
3. Evaluate on out-of-bag patients
4. Aggregate performance
Outputs include:
- Mean metrics
- Median metrics
- 95% confidence intervals
- Per-iteration results (downloadable CSV)
This provides:
- Robust performance estimates
- Reduced optimism bias
- Statistically reliable uncertainty bounds
Suitable for **peer-reviewed publication** and **clinical methodology studies**.
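The four bootstrap steps above can be sketched as follows, using ROC AUC as the example metric (the platform aggregates several metrics and returns per-iteration results):

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def bootstrap_oob_auc(model, X, y, n_iter=200, seed=0):
    """Bootstrap out-of-bag validation: resample with replacement, train on
    the bootstrap sample, score on out-of-bag patients, then aggregate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=n)           # 1. resample with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # patients never drawn
        if len(np.unique(y[idx])) < 2 or len(np.unique(y[oob])) < 2:
            continue                               # skip degenerate resamples
        fitted = clone(model).fit(X[idx], y[idx])  # 2. train on bootstrap sample
        prob = fitted.predict_proba(X[oob])[:, 1]  # 3. evaluate out-of-bag
        scores.append(roc_auc_score(y[oob], prob))
    scores = np.array(scores)                      # 4. aggregate
    lo, hi = np.percentile(scores, [2.5, 97.5])    # 95% percentile CI
    return scores.mean(), np.median(scores), (lo, hi)
```

Each iteration leaves roughly 37% of patients out of the resample, so every model is scored on patients it never saw.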
---
## External Validation
Independent cohorts can be evaluated directly:
- Automatic probability generation
- Full metrics computation
- ROC / PR / calibration / decision curves
- Patient-level prediction export
Prediction CSVs can be used to generate **publication-quality NEJM-style figure panels**.
---
## Deployment & Versioning
- One-click publishing to Hugging Face Model Hub
- Timestamped immutable releases
- Automatic `latest/` tracking
- Portable artifacts:
- `model.joblib`
- `meta.json` (schema + metrics + bootstrap results)
Models can be reused on any Excel file with identical column names.
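A minimal sketch of saving and reloading such artifacts with `joblib`. The `meta.json` contents shown are illustrative placeholders, not the platform's actual schema:

```python
import json

import joblib
from sklearn.linear_model import LogisticRegression

# Train a toy model standing in for the real pipeline
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])
joblib.dump(model, "model.joblib")

# Hypothetical metadata mirroring the artifact description above
meta = {
    "schema": {"numeric": ["Age"], "target": "Outcome Event"},
    "metrics": {"roc_auc": 0.82},  # illustrative values only
}
with open("meta.json", "w") as f:
    json.dump(meta, f, indent=2)

# Reload and reuse on any file with identical column names
reloaded = joblib.load("model.joblib")
```

Keeping the schema next to the serialized model lets downstream code verify that a new Excel file has the expected columns before predicting.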
---
## Workflow
### Training
1. Upload a labeled Excel file (with an `Outcome Event` column)
2. Select variable types
3. Train model
4. Review discrimination + calibration metrics
5. Run bootstrap internal validation (recommended)
6. Publish versioned model
### Prediction / Validation
1. Load trained model
2. Upload new Excel
3. Generate probabilities + risk bands
4. Run external validation (if labels present)
5. Export results and figures
---
## Intended Users
- Hematology–Oncology clinicians
- Clinical researchers
- Epidemiologists
- Outcomes researchers
- Medical AI investigators
No coding required.
---
## Intended Use
This platform supports:
- Clinical research
- Prognostic modeling
- Explainable AI development
- Educational and methodological purposes
**Not a medical device. Not for autonomous clinical decision-making.**
Clinical judgment must always prevail.
---
## Design Philosophy
This project prioritizes:
- Interpretability over black-box performance
- Statistical rigor over optimistic metrics
- Reproducibility over ad-hoc experimentation
- Clinical relevance over purely technical novelty
- Transparency over opacity
Every prediction must be explainable and defensible.
---
## Technical Stack
- Python
- Streamlit
- scikit-learn
- SHAP
- Matplotlib
- Hugging Face Spaces + Model Hub
---
## Author
Developed and maintained by
**Dr. Syed Naveed**
Hematology–Oncology Clinician & Researcher
Focus areas:
- Explainable AI in hematology
- Clinical machine learning validation
- Translational AI for real-world patient care
---
## License
Apache 2.0
---
For configuration details:
https://huggingface.co/docs/hub/spaces-config-reference