---
license: apache-2.0
---

# Explainable Acute Leukemia Mortality Predictor – Model Repository

This repository contains the **trained machine learning model artifacts** generated by the
**Explainable Acute Leukemia Mortality Predictor** Hugging Face Space.

It serves exclusively as a **persistent storage and versioning registry** for models developed for:

**Mortality risk prediction in patients with acute leukemia using structured clinical data.**

This repository does **not** provide training code or an interactive interface; both live in the companion Space.

---

## Relationship to the Application

Model development, validation, and prediction occur in the companion Space:

**Synav/Explainable-Acute-Leukemia-Mortality-Predictor**

Because Hugging Face Spaces use ephemeral storage by default, trained models are automatically:

1. Saved
2. Versioned
3. Uploaded here
4. Preserved as permanent releases

This ensures:

* reproducibility
* auditability
* long-term persistence
* external validation capability
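
The save, version, and upload steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the Space's actual code: `save_release`, `upload`, the `version` string format, and `repo_id` are hypothetical names, and only the `releases/<version>/` and `latest/` layout comes from this README.

```python
import shutil
from pathlib import Path

import joblib


def save_release(model, meta_json: str, root: Path, version: str) -> Path:
    """Write one release snapshot and refresh latest/ (hypothetical helper)."""
    release_dir = root / "releases" / version
    # exist_ok=False refuses to overwrite an existing snapshot.
    release_dir.mkdir(parents=True, exist_ok=False)
    joblib.dump(model, release_dir / "model.joblib")
    (release_dir / "meta.json").write_text(meta_json, encoding="utf-8")

    # Mirror the newest snapshot into latest/.
    latest_dir = root / "latest"
    latest_dir.mkdir(parents=True, exist_ok=True)
    for name in ("model.joblib", "meta.json"):
        shutil.copy2(release_dir / name, latest_dir / name)
    return release_dir


def upload(root: Path, repo_id: str) -> None:
    """Push the local folder to a model repo; repo_id is a placeholder."""
    from huggingface_hub import HfApi  # imported lazily; needs a write token

    HfApi().upload_folder(
        folder_path=str(root), repo_id=repo_id, repo_type="model"
    )
```

Writing the snapshot first and copying it into `latest/` afterwards keeps the historical folder untouched if a later step fails.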

---

## Model Description

Each stored model is:

* **Task:** Binary mortality prediction (Yes/No)
* **Algorithm:** Logistic Regression (scikit-learn)
* **Output:** Probability of mortality (0–1)
* **Explainability:** SHAP feature attribution

### Embedded preprocessing

Numeric variables

* median imputation
* standard scaling

Categorical variables

* most-frequent imputation
* one-hot encoding

All preprocessing steps are embedded within the pipeline to guarantee:

* identical inference behavior
* schema consistency
* zero manual preprocessing
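
For orientation, a pipeline with the preprocessing described above might be assembled along these lines. It is a minimal sketch, not the stored model's exact configuration: the column names and toy data are invented for illustration, and `handle_unknown="ignore"` and `max_iter=1000` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, for illustration only.
numeric_cols = ["age", "wbc"]
categorical_cols = ["sex"]

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    # Assumption: categories unseen during training are encoded as all zeros.
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, numeric_cols),
    ("cat", categorical, categorical_cols),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tiny synthetic frame with missing values (not patient data).
toy = pd.DataFrame({
    "age": [34, 71, np.nan, 58, 45, 80],
    "wbc": [12.0, 95.5, 40.2, np.nan, 8.1, 150.0],
    "sex": ["F", "M", "M", np.nan, "F", "M"],
})
y = [0, 1, 0, 1, 0, 1]

pipeline.fit(toy, y)
proba = pipeline.predict_proba(toy)[:, 1]  # probability of class 1
```

Because imputation, scaling, and encoding are fitted inside the pipeline, the same transformations are replayed at inference time without any separate preprocessing code.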

---

## Files Included per Release

Each version folder contains:

### model.joblib

Complete scikit-learn pipeline including preprocessing, feature encoding, and the trained classifier.
Ready for immediate inference.

### meta.json

Structured metadata including:

* feature schema
* variable types
* evaluation metrics
* ROC/PR curve data
* calibration statistics
* confusion matrix
* decision curve analysis
* validation configuration

These artifacts enable full reproducibility and downstream analysis.
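
The exact key names inside `meta.json` are not documented here, so a cautious first step is to list what a given file contains. The helper below is a small sketch; `summarize_meta` is a hypothetical name, and it assumes the file's top level is a JSON object.

```python
import json
from pathlib import Path


def summarize_meta(path):
    """Map each top-level key in meta.json to the type of its value."""
    meta = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(meta, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in meta.items()}
```

Calling `summarize_meta("latest/meta.json")` on a downloaded copy returns a dictionary of key names and value types, which helps locate the feature schema and metric entries before relying on specific keys.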

---

## Evaluation Metrics Captured

Models are evaluated on held-out test data across four dimensions: discrimination, classification, calibration, and clinical utility.

### Discrimination

* ROC AUC
* ROC curve
* Precision–Recall curve
* Average Precision
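
As a rough illustration of how these discrimination metrics can be computed with scikit-learn, the sketch below uses four made-up predictions rather than study data. How the Space itself computes them is not shown in this README.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

# Toy labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

roc_auc = roc_auc_score(y_true, y_prob)
avg_precision = average_precision_score(y_true, y_prob)

# Curve coordinates, e.g. for plotting or for storing alongside the metrics.
fpr, tpr, roc_thresholds = roc_curve(y_true, y_prob)
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_prob)
```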

### Classification

* Sensitivity (Recall)
* Specificity
* Precision
* F1 score
* Accuracy
* Balanced accuracy
* Confusion matrix
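
All of the classification metrics above follow from the confusion matrix at a chosen probability threshold. The sketch below assumes a threshold of 0.5, which this README does not specify, and `classification_metrics` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return float(num) / float(den) if den else float("nan")


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold probabilities and derive metrics from the confusion matrix."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

    sensitivity = _ratio(tp, tp + fn)   # recall
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "confusion_matrix": [[int(tn), int(fp)], [int(fn), int(tp)]],
    }
```

Lowering the threshold trades specificity for sensitivity, so the value should be chosen to match the clinical question rather than left at the default.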

### Calibration

* Calibration (reliability) curve
* Brier score
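
The two calibration outputs can be reproduced with scikit-learn as sketched below. The data are made up for illustration, and the bin count (`n_bins=2`) is chosen only to suit four points; the binning used by the Space is not stated here.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

# Toy labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

# Brier score: mean squared difference between probability and outcome.
brier = brier_score_loss(y_true, y_prob)

# Reliability curve: observed event rate vs. mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=2)
```

A well-calibrated model yields points close to the diagonal, where `frac_pos` equals `mean_pred`.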

### Clinical Utility

* Decision Curve Analysis (net benefit)
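
Net benefit at a threshold probability `pt` is commonly defined as `TP/n - (FP/n) * pt / (1 - pt)`. The sketch below implements that standard formula together with the treat-all reference strategy; the function names are hypothetical, and the set of thresholds evaluated by the Space is not documented here.

```python
import numpy as np


def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    y_true = np.asarray(y_true)
    treat = np.asarray(y_prob) >= pt
    n = len(y_true)
    tp = int(np.sum(treat & (y_true == 1)))
    fp = int(np.sum(treat & (y_true == 0)))
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = float(np.mean(np.asarray(y_true) == 1))
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

Evaluating both functions over a grid of `pt` values (with treat-none fixed at zero) gives the curves usually plotted in a decision curve analysis.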

---

## Repository Structure

```
releases/
  └── <version>/
      ├── model.joblib
      └── meta.json

latest/
  ├── model.joblib
  └── meta.json

README.md
```

* **releases/<version>/** → immutable historical snapshots
* **latest/** → most recent validated model
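
Given this layout, artifacts can be fetched by path with the `huggingface_hub` client. The sketch below is one possible approach: `artifact_path` and `fetch_artifact` are hypothetical helpers, and `repo_id` must be supplied by the caller because this README does not state the model repository's id.

```python
ARTIFACTS = ("model.joblib", "meta.json")


def artifact_path(filename, version=None):
    """Build the in-repo path: latest/ by default, else releases/<version>/."""
    if filename not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {filename!r}")
    folder = f"releases/{version}" if version else "latest"
    return f"{folder}/{filename}"


def fetch_artifact(repo_id, filename, version=None):
    """Download one artifact and return its local cache path."""
    from huggingface_hub import hf_hub_download  # imported lazily

    return hf_hub_download(
        repo_id=repo_id, filename=artifact_path(filename, version)
    )
```

Pinning a `version` rather than reading from `latest/` keeps an external validation study tied to one immutable snapshot.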

---

## Intended Use

These artifacts are intended for:

* Clinical research
* Risk stratification studies
* Independent external validation
* Multi-center reproducibility testing
* Educational and exploratory analysis

---

## Not Intended For

These models:

* are not regulatory-approved medical devices
* do not replace clinician judgment
* should not be used for autonomous decision-making
* require local validation prior to clinical deployment

Clinical oversight is mandatory.

---

## Loading a Model

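```python
import joblib

# Load the full pipeline (preprocessing + classifier) from a release folder.
model = joblib.load("model.joblib")

# X: tabular input (e.g. a pandas DataFrame) matching the feature schema in meta.json.
proba = model.predict_proba(X)[:, 1]  # column 1 = probability for model.classes_[1]
```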

No additional preprocessing is required.
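
The input still has to carry the columns the pipeline was trained on. A small guard like the one below can catch schema mismatches before prediction. It is a sketch: `align_to_schema` is a hypothetical helper, and `feature_names` is assumed to be read from the feature schema in `meta.json`, whose exact key names are not documented here.

```python
import pandas as pd


def align_to_schema(df: pd.DataFrame, feature_names) -> pd.DataFrame:
    """Select the expected columns in order; fail loudly if any are absent."""
    feature_names = list(feature_names)
    missing = [name for name in feature_names if name not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Extra columns are dropped; expected ones keep the schema's order.
    return df[feature_names]
```

Passing `align_to_schema(raw_df, feature_names)` as `X` keeps unexpected extra columns out of the pipeline and turns a missing column into an explicit error.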

---

## Author

Dr. Syed Naveed
Hematology & Oncology
Sheikh Shakhbout Medical City
Abu Dhabi, UAE

---

## License

Apache 2.0

---