FORUM-TB: Trained Random Forest Models

Trained Random Forest classifiers for M. tuberculosis drug resistance prediction from whole-genome sequencing (WGS) data.

Part of Project FORUM, an open-science, interpretable ML pipeline for TB antimicrobial resistance (AMR) prediction, developed in collaboration with Noah LeGall (The Microbialist).

Related Resources

Models

| File | Drug | Test AUC-ROC | CV AUC-ROC | Size |
|---|---|---|---|---|
| rf_RIFAMPICIN_v2.joblib | Rifampicin | 0.975 | 0.969 ± 0.004 | 61 MB |
| rf_ISONIAZID_v2.joblib | Isoniazid | 0.948 | 0.946 ± 0.008 | 48 MB |
| rf_ETHAMBUTOL_v2.joblib | Ethambutol | 0.894 | 0.900 ± 0.007 | 77 MB |
| rf_PYRAZINAMIDE_v2.joblib | Pyrazinamide | 0.886 | 0.883 ± 0.007 | 58 MB |

Usage

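```python
from huggingface_hub import hf_hub_download
import joblib

# Download a model file from the Hugging Face Hub (cached locally after the first call)
path = hf_hub_download(
    repo_id="nanzhen102/FORUM-TB-models",
    filename="rf_RIFAMPICIN_v2.joblib",
)

# joblib.load unpickles the file, so only load model files from sources you trust
rf = joblib.load(path)
```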
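Once loaded, the classifier can be queried for resistance probabilities. The sketch below assumes `rf` follows the scikit-learn classifier API (`predict_proba`, `classes_`) and that resistant isolates are labelled `1`; neither is stated on this card, so check `rf.classes_` before relying on it. The helper name `resistance_probability` is hypothetical.

```python
import numpy as np

N_FEATURES = 2693  # feature count stated in the Input Format section


def resistance_probability(model, X, resistant_label=1):
    """Return P(resistant) for each row of X (hypothetical helper).

    Assumes `model` exposes scikit-learn's predict_proba and classes_,
    and that `resistant_label` is the class used for resistant isolates
    (an assumption: inspect model.classes_ to confirm).
    """
    X = np.asarray(X)
    if X.ndim == 1:
        X = X.reshape(1, -1)  # a single isolate becomes a one-row matrix
    if X.shape[1] != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {X.shape[1]}")
    # Locate the probability column that belongs to the resistant class
    col = list(model.classes_).index(resistant_label)
    return model.predict_proba(X)[:, col]
```

For a single encoded isolate `x`, `resistance_probability(rf, x)` returns a one-element array; passing a 2-D matrix scores several isolates at once.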

Input Format

Each model expects a feature vector of 2,693 positions in AMR-associated genes, with each position encoded as an integer: 0 = reference allele (REF), 1 = A, 2 = T, 3 = C, 4 = G.

Download the ML-ready dataset directly from Kaggle: https://www.kaggle.com/datasets/nanzhen/forum-tb?resource=download
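A minimal sketch of building one input vector from variant calls using the integer codes above. The input dictionary, the helper `encode_isolate`, and the choice to treat uncalled positions as reference are assumptions for illustration. The position order must match the training columns; the `pos_*` feature names in the validation list suggest columns are keyed by genomic position, but the exact order is not documented here and should be taken from the Kaggle dataset.

```python
import numpy as np

# Integer codes from the Input Format section: 0 = reference allele, 1-4 = A/T/C/G
ALLELE_CODE = {"A": 1, "T": 2, "C": 3, "G": 4}


def encode_isolate(alt_alleles, positions):
    """Encode one isolate as an integer feature vector (hypothetical helper).

    alt_alleles: dict mapping genomic position -> alternate base called in
                 this isolate (assumed input format).
    positions:   ordered list of feature positions; must match the column
                 order used in training (assumption).
    Positions missing from alt_alleles are encoded as reference (0).
    Bases other than A/T/C/G raise a KeyError.
    """
    vec = np.zeros(len(positions), dtype=np.int64)
    for i, pos in enumerate(positions):
        base = alt_alleles.get(pos)
        if base is not None:
            vec[i] = ALLELE_CODE[base.upper()]
    return vec
```

Stacking vectors for several isolates with `np.vstack` yields a matrix that, assuming a scikit-learn model, can be passed to `rf.predict_proba`.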

Biological Validation

The top SHAP features correspond to known resistance mutations:

  • pos_761155 → rpoB codon 450 → S450L ✅ (Rifampicin)
  • pos_2155168 → katG codon 315 → S315T ✅ (Isoniazid)
  • pos_4247429 → embB codon 306 → M306I/V ✅ (Ethambutol)
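The position-to-codon mapping in the list can be reproduced with simple arithmetic: take the offset of the position from the gene's first coding base, read in the direction of transcription, integer-divide by 3, and add 1. The gene coordinates below are assumed from the H37Rv reference annotation (NC_000962.3) and are not given on this card; verify them against the reference used to build the dataset.

```python
# Assumed H37Rv (NC_000962.3) gene coordinates: 1-based, inclusive, with strand.
# These are not stated on the model card; check them against your reference.
GENES = {
    "rpoB": (759807, 763325, "+"),
    "katG": (2153889, 2156111, "-"),  # katG is on the minus strand
    "embB": (4246514, 4249810, "+"),
}


def position_to_codon(pos, gene):
    """Map a 1-based genomic position to (codon number, base within codon)."""
    start, end, strand = GENES[gene]
    if not start <= pos <= end:
        raise ValueError(f"{pos} lies outside {gene} ({start}-{end})")
    # Offset from the first base of the start codon, read in the gene's direction
    offset = pos - start if strand == "+" else end - pos
    return offset // 3 + 1, offset % 3 + 1
```

With these coordinates, the three positions above fall in codons 450, 315, and 306, at codon positions 2, 2, and 1 respectively.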

Authors

  • Noah LeGall, Ph.D. (The Microbialist)
  • Nanzhen (Aspen) Qiao, Queen's University, Kingston, Canada

License

CC BY-NC 4.0: free for non-commercial use with attribution.
