
Explainable Acute Leukemia Mortality Predictor – Model Repository

This repository contains the trained machine learning model artifacts generated by the Explainable Acute Leukemia Mortality Predictor Hugging Face Space.

It serves exclusively as a persistent storage and versioning registry for models developed for:

Mortality risk prediction in patients with acute leukemia using structured clinical data.

This repository does not provide training code or an interactive interface; both live in the companion Space.


Relationship to the Application

Model development, validation, and prediction occur in the companion Space:

Synav/Explainable-Acute-Leukemia-Mortality-Predictor

Because Hugging Face Spaces use temporary storage, trained models are automatically:

  1. Saved
  2. Versioned
  3. Uploaded here
  4. Preserved as permanent releases

This ensures:

  • reproducibility
  • auditability
  • long-term persistence
  • external validation capability
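The save, version, and upload steps above can be sketched as follows. This is only an illustration, not the Space's actual code: the function names, the `version` string format, and the `repo_id` value are assumptions, and the destination paths follow the releases/ and latest/ layout described under Repository Structure. The upload uses `HfApi.upload_file` from `huggingface_hub`.

```python
from pathlib import Path


def plan_uploads(local_dir, version):
    """Map each local artifact to its two destinations in the model repository."""
    plan = []
    for name in ("model.joblib", "meta.json"):
        local = Path(local_dir) / name
        plan.append((local, f"releases/{version}/{name}"))  # permanent snapshot
        plan.append((local, f"latest/{name}"))              # most recent model
    return plan


def push_release(local_dir, version, repo_id, token=None):
    """Upload the planned files (hypothetical helper; needs write access to repo_id)."""
    # Imported lazily so that plan_uploads works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    for local, path_in_repo in plan_uploads(local_dir, version):
        api.upload_file(
            path_or_fileobj=str(local),
            path_in_repo=path_in_repo,
            repo_id=repo_id,
            repo_type="model",
            commit_message=f"Release {version}: {local.name}",
        )
```

Writing each artifact to both locations keeps the release folder untouched by later uploads while latest/ always points at the newest model.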

Model Description

Each stored model has the following characteristics:

  • Task: Binary mortality prediction (Yes/No)
  • Algorithm: Logistic Regression (scikit-learn)
  • Output: Probability of mortality (0–1)
  • Explainability: SHAP feature attribution

Embedded preprocessing

Numeric variables

  • median imputation
  • standard scaling

Categorical variables

  • most-frequent imputation
  • one-hot encoding

All preprocessing steps are embedded within the pipeline to guarantee:

  • identical inference behavior
  • schema consistency
  • zero manual preprocessing
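A pipeline with the preprocessing described above might look like the sketch below. The column names, the toy data, `max_iter=1000`, and `handle_unknown="ignore"` are assumptions made for illustration; the stored models may use different settings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols):
    """Median imputation + scaling for numeric columns,
    most-frequent imputation + one-hot encoding for categorical columns."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Assumption: categories unseen during training are encoded as all zeros.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Toy data with hypothetical column names and missing values.
toy = pd.DataFrame({
    "age": [34.0, 61.0, np.nan, 72.0, 45.0, 58.0],
    "wbc": [12.0, 85.5, 40.2, np.nan, 9.8, 110.0],
    "subtype": pd.Series(["AML", "ALL", "AML", np.nan, "ALL", "AML"], dtype=object),
})
died = [0, 1, 0, 1, 0, 1]

pipe = build_pipeline(["age", "wbc"], ["subtype"]).fit(toy, died)
risk = pipe.predict_proba(toy)[:, 1]  # one mortality probability per row
```

Because imputation, scaling, and encoding are fitted inside the pipeline, the same raw columns can be passed at inference time without any separate preparation step.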

Files Included per Release

Each version folder contains:

model.joblib

Complete scikit-learn pipeline including preprocessing, feature encoding, and the trained classifier. Ready for immediate inference.

meta.json

Structured metadata including:

  • feature schema
  • variable types
  • evaluation metrics
  • ROC/PR curve data
  • calibration statistics
  • confusion matrix
  • decision curve analysis
  • validation configuration

These artifacts enable full reproducibility and downstream analysis.


Evaluation Metrics Captured

Models are evaluated on held-out test data, and the following metrics are recorded for each release.

Discrimination

  • ROC AUC
  • ROC curve
  • Precision–Recall curve
  • Average Precision
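For readers who want to check the discrimination metrics by hand, here is a minimal pure-Python sketch. It assumes labels coded as 0/1 and, for average precision, no tied scores; the function names and toy values are illustrative and are not taken from the Space.

```python
def roc_auc(y_true, y_prob):
    """Probability that a random positive case is scored above a random negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Mean of the precision values measured at the rank of each positive case."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


y_true = [0, 0, 1, 1, 1]
y_prob = [0.2, 0.7, 0.4, 0.8, 0.9]
auc = roc_auc(y_true, y_prob)           # 5 of 6 positive/negative pairs ranked correctly
ap = average_precision(y_true, y_prob)  # (1/1 + 2/2 + 3/4) / 3
```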

Classification

  • Sensitivity (Recall)
  • Specificity
  • Precision
  • F1 score
  • Accuracy
  • Balanced accuracy
  • Confusion matrix
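All of the classification metrics listed above follow from the four cells of the confusion matrix. The sketch below uses the standard definitions with made-up counts and assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of deaths correctly flagged
    specificity = tn / (tn + fp)   # share of survivors correctly cleared
    precision = tp / (tp + fp)     # share of flagged patients who died
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Illustrative counts only, not results from any stored model.
m = classification_metrics(tp=30, fp=10, tn=50, fn=10)
```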

Calibration

  • Calibration (reliability) curve
  • Brier score
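As a rough illustration of the calibration measures, the Brier score is the mean squared difference between predicted probability and observed outcome, and a reliability curve compares mean predicted risk with the observed event rate in each probability bin. The equal-width binning and the function names below are assumptions; the Space may bin differently.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    return [
        (sum(p for _, p in b) / len(b), sum(y for y, _ in b) / len(b), len(b))
        for b in bins
        if b
    ]


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.6, 0.9]
score = brier_score(y_true, y_prob)
curve = calibration_bins(y_true, y_prob, n_bins=2)
```

A lower Brier score is better, and a well-calibrated model has bins whose mean predicted risk is close to the observed event rate.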

Clinical Utility

  • Decision Curve Analysis (net benefit)
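Net benefit at a threshold probability pt is commonly defined as TP/n minus FP/n weighted by pt / (1 - pt), and a decision curve plots it across a range of thresholds alongside the "treat all" strategy. The sketch below applies that standard definition to toy data; it is not taken from the Space's code, and it assumes 0/1 labels and thresholds strictly between 0 and 1.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.6, 0.9]
nb_model = net_benefit(y_true, y_prob, threshold=0.2)  # 2 TP, 1 FP among 4 patients
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
```

The model adds clinical value at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).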

Repository Structure

releases/
  └── <version>/
      ├── model.joblib
      └── meta.json

latest/
  ├── model.joblib
  └── meta.json

README.md

  • releases/<version>/ → immutable historical snapshots
  • latest/ → most recent validated model
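Given this layout, a specific release or the latest model could be fetched with `hf_hub_download` from `huggingface_hub`, roughly as sketched here. The helper names are hypothetical, and `repo_id` must be set to this repository's identifier, which is not stated in this README.

```python
def artifact_path(filename, version=None):
    """Path of an artifact inside the repository: latest/ unless a version is given."""
    if version is None:
        return f"latest/{filename}"
    return f"releases/{version}/{filename}"


def fetch_model(repo_id, version=None):
    """Download model.joblib from the Hub and load it (hypothetical helper)."""
    # Imported lazily so that artifact_path works without these packages installed.
    import joblib
    from huggingface_hub import hf_hub_download

    local_file = hf_hub_download(
        repo_id=repo_id,
        filename=artifact_path("model.joblib", version),
    )
    return joblib.load(local_file)
```

Pinning a version is the safer choice for external validation studies, since latest/ changes whenever a new model is uploaded.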

Intended Use

These artifacts are intended for:

  • Clinical research
  • Risk stratification studies
  • Independent external validation
  • Multi-center reproducibility testing
  • Educational and exploratory analysis

Not Intended For

These models:

  • are not regulatory-approved medical devices
  • do not replace clinician judgment
  • should not be used for autonomous decision-making
  • require local validation prior to clinical deployment

Clinical oversight is mandatory.


Loading a Model

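import joblib

# Load the full pipeline (preprocessing + classifier) from a release folder
model = joblib.load("model.joblib")

# X: pandas DataFrame of raw inputs whose columns match the feature schema in meta.json
proba = model.predict_proba(X)[:, 1]  # predicted mortality probability per row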

No additional preprocessing is required.
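Although no preprocessing is needed, the input columns still have to match the feature schema recorded in meta.json. The check below is a minimal sketch: the `"features"` key and the column names are hypothetical, so inspect the actual meta.json for the real layout before relying on it.

```python
import json


def check_columns(frame_columns, expected_columns):
    """Return (missing, extra) column names relative to the expected schema."""
    missing = [c for c in expected_columns if c not in frame_columns]
    extra = [c for c in frame_columns if c not in expected_columns]
    return missing, extra


# Hypothetical meta.json content; the real key names may differ.
meta = json.loads('{"features": ["age", "wbc", "subtype"]}')

missing, extra = check_columns(["age", "subtype", "sex"], meta["features"])
if missing:
    print("Columns required by the model but absent from the input:", missing)
```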


Author

Dr. Syed Naveed
Hematology & Oncology
Sheikh Shakhbout Medical City
Abu Dhabi, UAE


License

Apache 2.0