NCAS Hospital Indication Classifier

A BioClinicalBERT-based multilabel classifier for categorising antimicrobial prescription indication text from hospital electronic medical records (EMR). Developed as part of a research project at RMIT University / The Royal Melbourne Hospital (RMH) investigating automated antimicrobial stewardship support.

Model description

Attribute            Value
Base encoder         emilyalsentzer/Bio_ClinicalBERT
Pooling              Mean pooling over token embeddings
Classification head  Linear + Sigmoid
Task                 Multilabel classification (8 categories)
Training data        ~2,000 manually annotated hospital prescription records (RMH 2021)
Held-out evaluation  600 records from RMH 2022, 2023, 2024
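
As an illustration of the pooling step, the sketch below averages token embeddings for one sequence. It is written in plain Python so it runs without dependencies; the function name masked_mean_pool is hypothetical, and skipping padding positions via an attention mask is an assumption, since this card only states "mean pooling over token embeddings".

```python
def masked_mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of real tokens, ignoring padding positions.

    token_embeddings: list of per-token vectors (each a list of floats)
    attention_mask:   list of 0/1 flags, 1 for a real token, 0 for padding
    """
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


pooled = masked_mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third row stands in for padding
    [1, 1, 0],
)
print(pooled)
```

In the actual model the same arithmetic would be applied to the encoder's output tensor, with the pooled vector passed to the linear classification head.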

Label schema (8catb)

Label                       Description
respiratory - ioi           Respiratory infection of indication
skin and soft tissue - ioi  Skin/soft-tissue infection of indication
urinary tract - ioi         Urinary tract infection of indication
other                       Other or unspecified indication
sepsis                      Sepsis or bacteraemia
undifferentiated infection  Infection without identified source
organism only               Organism identified but no clinical syndrome specified
no indication documented    No clinical indication present in the text

A sample can receive one or more labels simultaneously (multilabel).

Post-processing rule

After model prediction, sepsis is suppressed from any sample that also receives respiratory - ioi OR skin and soft tissue - ioi. If suppression would leave zero labels, the removal is reverted (fallback guarantee).
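
The rule above can be sketched as a small function over a list of predicted label names. This is a minimal illustration, not the repository's implementation; the function and constant names are hypothetical.

```python
SEPSIS = "sepsis"
SUPPRESSORS = {"respiratory - ioi", "skin and soft tissue - ioi"}


def apply_sepsis_rule(labels):
    """Drop 'sepsis' when a respiratory or skin/soft-tissue label co-occurs."""
    if SEPSIS not in labels or not SUPPRESSORS.intersection(labels):
        return list(labels)
    remaining = [label for label in labels if label != SEPSIS]
    # Fallback from the rule text: never return an empty label set.
    return remaining if remaining else list(labels)


print(apply_sepsis_rule(["sepsis", "respiratory - ioi"]))
print(apply_sepsis_rule(["sepsis", "urinary tract - ioi"]))
```

The first call keeps only respiratory - ioi; the second is returned unchanged because neither suppressing label is present. Since suppression only fires when another label is present, the fallback branch acts as a safety net rather than a commonly taken path.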

Usage

Quick start

from huggingface_hub import hf_hub_download
from ncas_indication.model import ClinicalBERTClassifier
from transformers import AutoTokenizer

# Download checkpoint
model_path = hf_hub_download(
    repo_id="jibmaird/NCAS-hospital-indication-classifier",
    filename="indication_classifier_model.pt",
)

# Load model (label names and thresholds are embedded in the checkpoint)
model, label_columns, thresholds = ClinicalBERTClassifier.from_checkpoint(model_path)
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

Alternatively, use the inference script from the GitHub repository:

# Single text
python inference/predict.py --text "UTI prophylaxis post-renal transplant"

# CSV file
python inference/predict.py --input your_file.csv --output predictions.csv
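
The Python quick start stops after loading the model, label names and thresholds. How the model is called is not shown in this card, so the sketch below covers only the final step: turning per-label probabilities into label names. The function name decode_labels, the example values and the use of >= (rather than >) at the threshold are all assumptions.

```python
def decode_labels(probabilities, label_columns, thresholds):
    """Return the labels whose probability meets its own threshold."""
    if not (len(probabilities) == len(label_columns) == len(thresholds)):
        raise ValueError("probabilities, label_columns and thresholds must align")
    return [
        label
        for prob, label, cutoff in zip(probabilities, label_columns, thresholds)
        if prob >= cutoff
    ]


# Illustrative values only; real ones come from the model and the checkpoint.
example_labels = ["urinary tract - ioi", "sepsis", "other"]
example_thresholds = [0.5, 0.3, 0.4]
example_probs = [0.91, 0.12, 0.45]
print(decode_labels(example_probs, example_labels, example_thresholds))
```

Because each label has its own threshold, a probability of 0.45 is enough for other here while 0.12 falls short for sepsis.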

Desktop application

A cross-platform desktop GUI is available in the app/ folder of the repository. See app/README.md.

Training

Hyperparameters

Parameter            Value
Learning rate        1e-5
Batch size           8
Epochs               20
Optimizer            AdamW
Loss function        Weighted BCE (inverse-frequency weights)
Validation split     20% of training data
Threshold selection  Per-label F1 maximisation on validation set
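
To illustrate what inverse-frequency weighting can look like, the sketch below computes one weight per label from a binary label matrix. The exact formula used in training is not given in this card; the scheme here (n_samples / (n_labels * positives)) is one common choice and is an assumption, as is the function name.

```python
def inverse_frequency_weights(label_matrix):
    """Compute one weight per label from a binary multilabel matrix.

    label_matrix: list of rows, each a list of 0/1 flags (one per label).
    Assumed scheme: weight_j = n_samples / (n_labels * positives_j).
    """
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        # Guard against division by zero for labels with no positive examples.
        weights.append(n_samples / (n_labels * max(positives, 1)))
    return weights


# Toy data: label 0 is common (3 of 4 rows), label 1 is rare (1 of 4 rows).
toy_labels = [
    [1, 0],
    [1, 0],
    [1, 0],
    [0, 1],
]
weights = inverse_frequency_weights(toy_labels)
print(weights)
```

The rarer label receives the larger weight, so its errors contribute more to the BCE loss.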

Training procedure

  1. The combined dataset of ~2,000 labelled records was split 80/20 for training and validation.
  2. Inverse-frequency class weights were applied to the BCE loss to address label imbalance.
  3. Per-label decision thresholds were optimised on the validation set by grid search over [0.1, 0.2, …, 0.8] to maximise label-specific F1.
  4. The model with the best weighted-macro F1 across epochs was retained.
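
Step 3 can be sketched for a single label as follows. This is a minimal illustration with made-up validation data; the helper names are hypothetical, the grid is read as steps of 0.1, and tie-breaking (first best threshold wins) and the >= comparison are assumptions.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from two equal-length lists of 0/1 values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, grid):
    """Return (threshold, F1) maximising F1 for one label; first wins ties."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


GRID = [round(0.1 * k, 1) for k in range(1, 9)]  # 0.1, 0.2, ..., 0.8

# Made-up validation labels and probabilities for one label.
y_true = [1, 1, 0, 0, 0]
probs = [0.9, 0.65, 0.55, 0.2, 0.05]
threshold, score = best_threshold(y_true, probs, GRID)
print(threshold, score)
```

Running the search once per label column yields the list of per-label thresholds stored alongside the model weights.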

Checkpoint format

The .pt file is a standard PyTorch checkpoint dict with keys:

{
    "model_state_dict":   ...,   # nn.Module weights
    "label_columns":      [...], # ordered label names
    "optimal_thresholds": [...], # per-label decision thresholds
    "n_labels":           8,
    "base_model":         "emilyalsentzer/Bio_ClinicalBERT",
}
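
A loaded checkpoint can be sanity-checked against this layout before use. The sketch below is a hypothetical helper, not part of the repository; it uses a stand-in dict so that it runs without PyTorch, and the threshold values in it are placeholders.

```python
EXPECTED_KEYS = {
    "model_state_dict",
    "label_columns",
    "optimal_thresholds",
    "n_labels",
    "base_model",
}


def check_checkpoint(ckpt):
    """Raise ValueError if the dict does not match the documented layout."""
    missing = EXPECTED_KEYS - set(ckpt)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    n = ckpt["n_labels"]
    if len(ckpt["label_columns"]) != n or len(ckpt["optimal_thresholds"]) != n:
        raise ValueError("label_columns and optimal_thresholds must have n_labels entries")
    return True


# In practice the dict would come from torch.load(model_path, map_location="cpu").
fake_ckpt = {
    "model_state_dict": {},  # placeholder for the real weights
    "label_columns": ["sepsis", "other"],
    "optimal_thresholds": [0.5, 0.5],  # placeholder values
    "n_labels": 2,
    "base_model": "emilyalsentzer/Bio_ClinicalBERT",
}
print(check_checkpoint(fake_ckpt))
```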

Limitations and intended use

  • The model was trained and evaluated on de-identified records from a single Australian tertiary hospital (RMH). Performance may differ on records from other hospitals, health systems, or clinical workflows.
  • This model is intended for research purposes and is not a validated clinical decision support tool. Clinical decisions must remain with qualified healthcare professionals.
  • The training data cannot be shared due to privacy restrictions; the annotation schema and data format are documented in the companion GitHub repository.

Citation

If you use this model in your research, please cite:

@article{ncas_indication_classifier_2025,
  title   = {Automated Classification of Antimicrobial Prescription Indications
             Using BioClinicalBERT},
  author  = {...},
  journal = {...},
  year    = {2025},
  note    = {Under review}
}

Repository

Source code, training scripts, and the desktop application are available at:
https://github.com/jibmaird/NCAS-hospital-indication-classifier

License

Apache 2.0 — see LICENSE.
