---
title: Explainable-Acute-Leukemia-Mortality-Predictor
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: apache-2.0
---
# Explainable Acute Leukemia Mortality Predictor
**Explainable Acute Leukemia Mortality Predictor** is an interactive, end-to-end clinical machine-learning platform for building, validating, and deploying **transparent, interpretable mortality prediction models** for patients with **acute leukemia**.
The system integrates:
- Statistical modeling
- Explainable AI (SHAP)
- Bootstrap internal validation
- External clinical validation

These components are combined into a single workflow specifically designed for **clinicians and clinical researchers**.
This tool enables rapid development of **clinically trustworthy, publication-grade models** without requiring programming expertise.
---
## ⭐ Quick Start – Single Patient Mortality Prediction
To **predict mortality probability for an individual patient**:
1. Open the **Predict + SHAP (2️⃣ Predict)** tab
2. Enter patient details across:
- **Core**
- **Clinical (Yes/No)**
- **NGS**
- **FISH**
3. Click **Predict single patient**
The system will automatically generate:
- Predicted mortality probability (0–1)
- Risk band (Low / Intermediate / High)
- SHAP explanation showing which variables contributed most to the prediction
- Downloadable results and plots
This enables **transparent, patient-level, clinically interpretable risk estimation** in seconds.
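As an illustration, the probability-to-band mapping can be sketched as below. The cutoffs shown are hypothetical placeholders, not the app's actual thresholds:

```python
def risk_band(probability: float,
              low_cut: float = 0.33,
              high_cut: float = 0.66) -> str:
    """Map a predicted mortality probability (0-1) to a risk band.

    The cutoffs are illustrative only; the deployed model may use
    different, clinically derived thresholds.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability < low_cut:
        return "Low"
    if probability < high_cut:
        return "Intermediate"
    return "High"
```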
## Core Capabilities
### Model Development
- Logistic regression–based pipelines (scikit-learn)
- Automatic preprocessing:
- Numeric → median imputation + scaling
- Categorical → most-frequent imputation + one-hot encoding
- Schema-aware training directly from Excel
- Optional L1 feature selection
- Optional dimensionality reduction (SVD)
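A minimal scikit-learn sketch of the preprocessing and logistic-regression pipeline described above. The column names are hypothetical; the app derives the real ones from the uploaded Excel schema:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists; in the app these come from the schema.
numeric_cols = ["Age", "WBC"]
categorical_cols = ["Sex", "FLT3-ITD"]

preprocess = ColumnTransformer([
    # Numeric -> median imputation + scaling
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical -> most-frequent imputation + one-hot encoding
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    # penalty="l1" with a sparse solver gives the optional L1 feature selection
    ("clf", LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000)),
])
```

The `ColumnTransformer` keeps numeric and categorical handling in one fitted object, so the same transformations are replayed exactly at prediction time.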
---
### Explainability (Transparent AI)
- SHAP-based local explanations for each patient
- Global feature importance (bar + beeswarm)
- Waterfall plots for single predictions
- Variable-level contribution tracking
- Fully auditable predictions from raw inputs → probability
Designed for **clinical interpretability**, not black-box modeling.
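For linear models, SHAP attributions in log-odds space have a closed form under feature independence: `phi_j = coef_j * (x_j - E[x_j])`. The sketch below illustrates this additive decomposition without needing the `shap` package itself (the platform uses the SHAP library directly):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

def linear_shap(clf, X_background, x):
    """Per-feature contributions (log-odds space) for a linear model,
    assuming independent features: phi_j = coef_j * (x_j - E[x_j])."""
    baseline = X_background.mean(axis=0)
    return clf.coef_[0] * (x - baseline)

phi = linear_shap(clf, X, X[0])
```

The additivity property that makes waterfall plots possible holds exactly here: the base value (model output at the feature means) plus the sum of contributions recovers the patient's log-odds output.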
---
## Validation Framework (Clinical-Grade)
Unlike typical ML demos, this framework implements **rigorous statistical validation appropriate for clinical research**.
### Discrimination
- ROC AUC
- ROC curves
- Precision–Recall curves
- Average Precision (PR-AUC)
### Calibration
- Reliability (calibration) curves
- Brier score
### Clinical Utility
- Decision Curve Analysis (net benefit)
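Net benefit at a threshold probability `pt` is `TP/n - FP/n * pt/(1 - pt)`, i.e. true positives credited and false positives penalized by the odds of the threshold. A minimal sketch:

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at probability threshold pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)
```

A decision curve plots this quantity across a range of thresholds, against the "treat all" and "treat none" strategies.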
### Threshold Metrics
- Sensitivity / specificity
- F1 score
- Balanced accuracy
- Confusion matrix
- Optimal threshold selection
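One common way to pick an operating threshold is Youden's J statistic (sensitivity + specificity − 1), maximized over the ROC curve; the sketch below uses toy data and this criterion as an example (the app may offer other criteria):

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, roc_curve)

# Toy labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6])

# Youden's J picks the ROC point maximizing tpr - fpr
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
threshold = thresholds[np.argmax(tpr - fpr)]

y_pred = (y_prob >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
f1 = f1_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
```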
---
## Internal Validation (Bootstrapping)
The platform supports **bootstrap out-of-bag (OOB) internal validation**, which is generally preferred over a single train/test split for small clinical datasets.
For each bootstrap iteration:
1. Resample patients with replacement
2. Train on the bootstrap sample
3. Evaluate on out-of-bag patients
4. Aggregate performance
Outputs include:
- Mean metrics
- Median metrics
- 95% confidence intervals
- Per-iteration results (downloadable CSV)
This provides:
- Robust performance estimates
- Reduced optimism bias
- Statistically reliable uncertainty bounds
Suitable for **peer-reviewed publication** and **clinical methodology studies**.
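The four bootstrap steps above can be sketched as follows, using ROC AUC as the example metric (the platform aggregates several metrics and returns per-iteration results):

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def bootstrap_oob_auc(model, X, y, n_iter=200, seed=0):
    """Bootstrap out-of-bag validation: resample with replacement, train on
    the bootstrap sample, score on out-of-bag patients, then aggregate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=n)           # 1. resample with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # patients never drawn
        if len(np.unique(y[idx])) < 2 or len(np.unique(y[oob])) < 2:
            continue                               # skip degenerate resamples
        fitted = clone(model).fit(X[idx], y[idx])  # 2. train on bootstrap sample
        prob = fitted.predict_proba(X[oob])[:, 1]  # 3. evaluate out-of-bag
        scores.append(roc_auc_score(y[oob], prob))
    scores = np.array(scores)                      # 4. aggregate
    lo, hi = np.percentile(scores, [2.5, 97.5])    # 95% percentile CI
    return scores.mean(), np.median(scores), (lo, hi)
```

Each iteration leaves roughly 37% of patients out of the resample, so every model is scored on patients it never saw.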
---
## External Validation
Independent cohorts can be evaluated directly:
- Automatic probability generation
- Full metrics computation
- ROC / PR / calibration / decision curves
- Patient-level prediction export
Prediction CSVs can be used to generate **publication-quality NEJM-style figure panels**.
---
## Deployment & Versioning
- One-click publishing to Hugging Face Model Hub
- Timestamped immutable releases
- Automatic `latest/` tracking
- Portable artifacts:
- `model.joblib`
- `meta.json` (schema + metrics + bootstrap results)
Models can be reused on any Excel file with identical column names.
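A minimal sketch of saving and reloading such artifacts with `joblib`. The `meta.json` contents shown are illustrative placeholders, not the platform's actual schema:

```python
import json

import joblib
from sklearn.linear_model import LogisticRegression

# Train a toy model standing in for the real pipeline
model = LogisticRegression().fit([[0], [1], [2], [3]], [0, 0, 1, 1])
joblib.dump(model, "model.joblib")

# Hypothetical metadata mirroring the artifact description above
meta = {
    "schema": {"numeric": ["Age"], "target": "Outcome Event"},
    "metrics": {"roc_auc": 0.82},  # illustrative values only
}
with open("meta.json", "w") as f:
    json.dump(meta, f, indent=2)

# Reload and reuse on any file with identical column names
reloaded = joblib.load("model.joblib")
```

Keeping the schema next to the serialized model lets downstream code verify that a new Excel file has the expected columns before predicting.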
---
## Workflow
### Training
1. Upload a labeled Excel file (with an `Outcome Event` column)
2. Select variable types
3. Train model
4. Review discrimination + calibration metrics
5. Run bootstrap internal validation (recommended)
6. Publish versioned model
### Prediction / Validation
1. Load trained model
2. Upload new Excel
3. Generate probabilities + risk bands
4. Run external validation (if labels present)
5. Export results and figures
---
## Intended Users
- Hematology–Oncology clinicians
- Clinical researchers
- Epidemiologists
- Outcomes researchers
- Medical AI investigators
No coding required.
---
## Intended Use
This platform supports:
- Clinical research
- Prognostic modeling
- Explainable AI development
- Educational and methodological purposes
**Not a medical device. Not for autonomous clinical decision-making.**
Clinical judgment must always prevail.
---
## Design Philosophy
This project prioritizes:
- Interpretability over black-box performance
- Statistical rigor over optimistic metrics
- Reproducibility over ad-hoc experimentation
- Clinical relevance over purely technical novelty
- Transparency over opacity
Every prediction must be explainable and defensible.
---
## Technical Stack
- Python
- Streamlit
- scikit-learn
- SHAP
- Matplotlib
- Hugging Face Spaces + Model Hub
---
## Author
Developed and maintained by
**Dr. Syed Naveed**
Hematology–Oncology Clinician & Researcher
Focus areas:
- Explainable AI in hematology
- Clinical machine learning validation
- Translational AI for real-world patient care
---
## License
Apache 2.0
---
For configuration details:
https://huggingface.co/docs/hub/spaces-config-reference