EspressoPro ADT Cell Type Models

Model Summary

This repository provides pre-trained EspressoPro models for cell type annotation from single-cell surface protein (ADT) data, designed for blood and bone marrow mononuclear cells in protein-only settings (such as Mission Bio Tapestri DNA+ADT workflows).

The pipeline is available at: https://github.com/uom-eoh-lab-published/2026__EspressoPro

The release contains one-vs-rest (OvR) binary classifiers per cell type plus a multiclass calibration layer for three annotation resolutions of increasing biological detail.
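The stacked-ensemble idea can be illustrated in a few lines: a logistic-regression "head" is trained on the base learners' predicted probabilities. The sketch below uses synthetic stand-in probabilities for the four base models (XGB, NB, KNN, MLP); the data, shapes, and hyperparameters are illustrative assumptions, not the released training code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in: pretend these columns are OvR probabilities from the four
# base learners (XGB, NB, KNN, MLP) for one target cell type.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                                  # 0/1 class labels
base_probs = np.clip(y[:, None] * 0.4 + rng.random((200, 4)) * 0.6, 0, 1)

# Stacking head: logistic regression over the base-model probabilities.
head = LogisticRegression().fit(base_probs, y)
p = head.predict_proba(base_probs)[:, 1]                     # per-cell class probability
```

In the release, one such head exists per cell type (one-vs-rest), and its outputs are subsequently Platt-calibrated.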

Model Details

  • Developed by: Kristian Gurashi
  • Model type: Stacked ensemble OvR classifiers with Platt calibration
    (logistic regression over XGB, NB, KNN, and MLP prediction probabilities)
  • Input: Per-cell ADT feature vectors (CLR-normalised surface protein expression)
  • Output: Per-cell class probabilities and predicted cell type labels
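Inputs are expected to be CLR-normalised per cell. A minimal numpy sketch of the centred log-ratio transform, assuming raw ADT counts and a +1 pseudocount (EspressoPro's exact preprocessing may differ):

```python
import numpy as np

def clr_normalise(counts: np.ndarray) -> np.ndarray:
    """Centred log-ratio transform applied per cell (row).

    counts: (n_cells, n_proteins) raw ADT counts.
    Returns log(x + 1) minus the per-cell mean of the logged values.
    """
    logged = np.log1p(counts.astype(float))
    return logged - logged.mean(axis=1, keepdims=True)

adt = np.array([[10, 0, 5], [100, 20, 1]])
clr = clr_normalise(adt)
# Each row of clr now sums to 0, i.e. it is centred on the per-cell mean log value.
```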

Included Files

The repository is organised by reference atlas (Hao, Triana, Zhang, Luecken) and by label resolution (Broad, Simplified, Detailed).
Each atlas/resolution folder contains (i) the trained models, (ii) evaluation reports, and (iii) figures.

Models (Release/<Atlas>/Models/<Resolution>/)

  • Multiclass_models.joblib
    Main file for inference; bundles everything needed to run predictions for that atlas/resolution:
    • all per-class Platt calibrated OvR “heads”
    • class_names (probability column order)
    • excluded class list (if applicable)
    • multiclass temperature-scaling calibrator
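A minimal sketch of loading such a bundle with `joblib`. The key names below are illustrative assumptions based on the list above, not the guaranteed on-disk layout; inspect the loaded object to see how the real `Multiclass_models.joblib` is organised. Here the bundle is built and dumped in-script so the example is self-contained.

```python
import os
import tempfile

import joblib  # same library used to produce the .joblib release files

# Illustrative bundle layout -- key names are assumptions for this demo.
bundle = {
    "ovr_heads": {"B cell": None, "T cell": None},  # placeholder per-class models
    "class_names": ["B cell", "T cell"],            # probability column order
    "excluded_classes": [],
    "temperature": 1.3,                             # placeholder multiclass calibrator
}

path = os.path.join(tempfile.mkdtemp(), "Multiclass_models.joblib")
joblib.dump(bundle, path)

loaded = joblib.load(path)
print(loaded["class_names"])  # column order to align probability outputs
```

At inference time you would score each OvR head, assemble the per-class probabilities in `class_names` order, and apply the multiclass temperature-scaling calibrator.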

Reports (Release/<Atlas>/Reports/<Resolution>/)

  • metrics/
    CSV exports of evaluation outputs, including:

    • multiclass accuracy metrics (precision/recall/F1/AUC) on the held-out test split
    • multiclass confusion matrix on the held-out test split
    • per-class accuracy metrics (precision/recall/F1/AUC) and confusion matrix on the held-out test split
    • per-class error rates pre- and post-calibration on the held-out test split
  • probabilities/
    CSV exports of the multiclass label prediction probabilities on the held-out test split.

Figures (Release/<Atlas>/Figures/<Resolution>/)

  • multiclass_confusion_matrix_on_test.png
    Multiclass confusion matrix for the held-out test split.

  • multiclass_confusion_matrix_on_test_with_percentage_agreement.png
    Multiclass confusion matrix for the held-out test split, with % agreement between true and predicted labels.

  • per_class/
    Per-class plots, including:

    • binary confusion matrix pre-calibration
    • ROC curve (AUC) pre-calibration
    • binary confusion matrix post-calibration
    • ROC curve (AUC) post-calibration
    • UMAP of the held-out train split
    • UMAP legend
    • calibration evaluation on the held-out test split
    • SHAP beeswarm on the held-out train split

Uses

Direct Use

Leveraged by EspressoPro to annotate cell types from ADT-only single-cell data (blood/bone marrow mononuclear cells), including Mission Bio Tapestri DNA+ADT datasets.

Bias, Risks, and Limitations

  • Reference bias: trained on human healthy donor PBMC/BMMC-derived references; performance may differ in disease or heavily perturbed samples. Not expected to work well in other tissues.
  • Panel dependence: requires feature alignment to the expected ADT columns; missing/mismatched antibodies can reduce accuracy.
  • Class coverage: models were trained only for classes that yielded effective predictions in at least one of the four atlases; other labels are excluded.
  • Interpretation: probabilities are model-derived and should be validated with marker checks and expected biology.

Testing Data, Factors & Metrics

Testing Data

  • TRAIN: used to train one-vs-rest (OvR) classifiers.
  • CAL: used only for probability calibration (Platt per class + multiclass temperature scaling).
  • TEST: used only for evaluation.

Note: CAL and TEST include only the classes learned from TRAIN; excluded or unknown labels are removed.

Factors

  • RAW: OvR probabilities before calibration.
  • PLATT: OvR probabilities after Platt calibration on CAL (skipped if CAL is single-class).
  • CAL: final multiclass probabilities after temperature scaling (fit on CAL, applied to TEST).
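Temperature scaling rescales the model's scores by a single scalar T before the softmax. A minimal numpy sketch; the T value and input scores are illustrative (the release fits T on the CAL split, typically by likelihood maximisation, then applies it unchanged to TEST):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def temperature_scale(logits, T):
    """Soften (T > 1) or sharpen (T < 1) the probability distribution."""
    return softmax(logits / T)

logits = np.array([[2.0, 0.5, -1.0]])   # illustrative per-class scores for one cell
raw = softmax(logits)
cal = temperature_scale(logits, T=2.0)  # illustrative T > 1
# With T > 1 the calibrated distribution is flatter than the raw one.
```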

Metrics

Multiclass (TEST, using CAL probabilities):

  • Accuracy
  • Precision / Recall / F1
  • Confusion matrix

Per-class (TEST, RAW vs CAL):

  • Confusion matrix (TP, FP, TN, FN)
  • Precision, recall, F1
  • ROC curve and AUC

Calibration (per class, TEST):

  • LogLoss and Brier score before vs after Platt calibration
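Both calibration metrics are straightforward to compute per class. A minimal numpy sketch with illustrative probabilities (the shipped CSVs may be produced with scikit-learn's implementations):

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    """Binary negative log-likelihood, averaged over cells."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def brier_score(y_true, p):
    """Mean squared error between predicted probability and the 0/1 label."""
    return np.mean((p - y_true) ** 2)

y = np.array([1, 0, 1, 1])
p_raw = np.array([0.6, 0.4, 0.7, 0.55])  # illustrative pre-calibration probabilities
p_cal = np.array([0.8, 0.2, 0.9, 0.75])  # illustrative post-calibration probabilities
# Lower LogLoss/Brier after calibration indicates better-calibrated probabilities.
```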