NCAS Hospital Indication Classifier

A BioClinicalBERT-based multilabel classifier for categorising antimicrobial prescription indication text from hospital electronic medical records (EMR). Developed as part of a research project at RMIT University / The Royal Melbourne Hospital (RMH) investigating automated antimicrobial stewardship support.

Model description

Attribute	Value
Base encoder	emilyalsentzer/Bio_ClinicalBERT
Pooling	Mean pooling over token embeddings
Classification head	Linear + Sigmoid
Task	Multilabel classification (8 categories)
Training data	~2,000 manually annotated hospital prescription records (RMH 2021)
Held-out evaluation	600 records from RMH 2022, 2023, 2024
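
As a rough illustration of the architecture in the table above, the sketch below mean-pools token embeddings and then applies a linear layer followed by a sigmoid. It uses plain Python lists rather than tensors. The function names, the toy weights, and the choice to skip padded positions via an attention mask are assumptions made for illustration, not details confirmed by the repository.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        return [0.0] * dim
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(pooled, weights, biases):
    """Linear layer plus element-wise sigmoid: one independent probability per label."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, pooled)) + b)
        for row, b in zip(weights, biases)
    ]

# Toy input: three tokens of dimension 2, the last one is padding.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
toy_mask = [1, 1, 0]
toy_pooled = mean_pool(toy_tokens, toy_mask)

# Toy head with two labels (the real head has 8 outputs).
toy_probs = classify(toy_pooled, weights=[[0.0, 0.0], [1.0, -1.0]], biases=[0.0, 1.0])
```

Because each label gets its own sigmoid rather than sharing a softmax, the probabilities do not need to sum to one, which is what allows several labels to fire for the same record.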

Label schema (8catb)

Label	Description
`respiratory - ioi`	Respiratory infection of indication
`skin and soft tissue - ioi`	Skin/soft-tissue infection of indication
`urinary tract - ioi`	Urinary tract infection of indication
`other`	Other or unspecified indication
`sepsis`	Sepsis or bacteraemia
`undifferentiated infection`	Infection without identified source
`organism only`	Organism identified but no clinical syndrome specified
`no indication documented`	No clinical indication present in the text

A sample can receive one or more labels simultaneously (multilabel).

Post-processing rule

After model prediction, `sepsis` is removed from any sample that also receives `respiratory - ioi` or `skin and soft tissue - ioi`. If that removal would leave the sample with no labels, it is reverted (fallback guarantee).
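
The rule can be sketched as a small pure-Python function. The function and constant names here are hypothetical and are not taken from the repository; only the label strings and the rule itself come from this card.

```python
SEPSIS = "sepsis"
SUPPRESSORS = {"respiratory - ioi", "skin and soft tissue - ioi"}

def suppress_sepsis(labels):
    """Drop 'sepsis' when a respiratory or skin/soft-tissue label co-occurs."""
    if SEPSIS not in labels or not SUPPRESSORS.intersection(labels):
        return list(labels)
    kept = [label for label in labels if label != SEPSIS]
    # Fallback guarantee: never return an empty label set.
    # With the rule as stated a suppressor label always remains,
    # so this branch is purely defensive.
    return kept if kept else list(labels)

with_respiratory = suppress_sepsis(["sepsis", "respiratory - ioi"])
sepsis_alone = suppress_sepsis(["sepsis"])
with_urinary = suppress_sepsis(["sepsis", "urinary tract - ioi"])
```

In this sketch `sepsis` survives alongside `urinary tract - ioi`, since only the respiratory and skin/soft-tissue labels trigger suppression.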

Usage

Quick start

from huggingface_hub import hf_hub_download
from ncas_indication.model import ClinicalBERTClassifier
from transformers import AutoTokenizer

# Download checkpoint
model_path = hf_hub_download(
    repo_id="jibmaird/NCAS-hospital-indication-classifier",
    filename="indication_classifier_model.pt",
)

# Load model (label names and thresholds are embedded in the checkpoint)
model, label_columns, thresholds = ClinicalBERTClassifier.from_checkpoint(model_path)
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

Alternatively, use the inference script from the GitHub repository:

# Single text
python inference/predict.py --text "UTI prophylaxis post-renal transplant"

# CSV file
python inference/predict.py --input your_file.csv --output predictions.csv

Desktop application

A cross-platform desktop GUI is available in the app/ folder of the repository. See app/README.md.

Training

Hyperparameters

Parameter	Value
Learning rate	1e-5
Batch size	8
Epochs	20
Optimizer	AdamW
Loss function	Weighted BCE (inverse-frequency weights)
Validation split	20% of training data
Threshold selection	Per-label F1 maximisation on validation set
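
To make the "Weighted BCE (inverse-frequency weights)" entry concrete, here is a minimal sketch in plain Python. The exact weighting formula is not stated in this card: the sketch assumes weight = total samples / positive count per label, applied to the positive term only (similar in spirit to the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`). Function names and the toy data are illustrative.

```python
import math

def inverse_frequency_weights(label_matrix):
    """One weight per label: n_samples / n_positive (assumed formula)."""
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        weights.append(n_samples / max(positives, 1))  # guard against unseen labels
    return weights

def weighted_bce(probs, targets, weights, eps=1e-7):
    """Mean BCE over labels for one sample; the positive term is scaled per label."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Toy data: 4 samples, 2 labels; the second label is rare.
toy_labels = [[1, 0], [1, 0], [1, 1], [1, 0]]
toy_weights = inverse_frequency_weights(toy_labels)
toy_loss = weighted_bce([0.5, 0.5], [1, 1], toy_weights)
```

The rare label receives the larger weight, so missing one of its positives costs more than missing a positive of the common label.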

Training procedure

The combined dataset of ~2,000 labelled records was split 80/20 for training and validation.
Inverse-frequency class weights were applied to the BCE loss to address label imbalance.
Per-label decision thresholds were optimised on the validation set by grid search over [0.1, 0.2, …, 0.8] to maximise label-specific F1.
The model with the best weighted-macro F1 across epochs was retained.
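
The per-label threshold search described above can be sketched as follows. This is an illustrative pure-Python version, not the repository's training code: the helper names, the `>=` comparison, and the tie-breaking rule (keep the lowest threshold that reaches the best F1) are assumptions.

```python
# Grid 0.1, 0.2, ..., 0.8 as described in the training procedure.
GRID = [round(0.1 * k, 1) for k in range(1, 9)]

def f1_score(y_true, y_pred):
    """Binary F1 for a single label; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(y_true, probs, grid=GRID):
    """Return (threshold, f1) maximising F1 for one label on validation data."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:  # strict: ties keep the lower threshold
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy validation data for one label.
toy_truth = [0, 0, 1, 1]
toy_probs = [0.15, 0.45, 0.55, 0.9]
toy_best = best_threshold(toy_truth, toy_probs)
```

Running the search once per label column would yield the eight thresholds stored with the model.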

Checkpoint format

The .pt file is a standard PyTorch checkpoint dict with keys:

{
    "model_state_dict":   ...,   # nn.Module weights
    "label_columns":      [...], # ordered label names
    "optimal_thresholds": [...], # per-label decision thresholds
    "n_labels":           8,
    "base_model":         "emilyalsentzer/Bio_ClinicalBERT",
}
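
As a hedged illustration of how these keys fit together, the sketch below checks that the label list and threshold list agree with `n_labels`, then turns a vector of probabilities into label names. The helper names are hypothetical, the threshold and probability values are invented, the label order is assumed to follow the schema table, and the toy dict omits `model_state_dict`. In practice the dict would come from a loader such as `torch.load(path, map_location="cpu")`.

```python
LABELS = [
    "respiratory - ioi",
    "skin and soft tissue - ioi",
    "urinary tract - ioi",
    "other",
    "sepsis",
    "undifferentiated infection",
    "organism only",
    "no indication documented",
]

# Stand-in for a loaded checkpoint; thresholds are invented for illustration.
toy_checkpoint = {
    "label_columns": LABELS,
    "optimal_thresholds": [0.5, 0.4, 0.3, 0.5, 0.6, 0.5, 0.2, 0.5],
    "n_labels": 8,
    "base_model": "emilyalsentzer/Bio_ClinicalBERT",
}

def check_checkpoint(ckpt):
    """Raise if label names and thresholds do not match n_labels."""
    n = ckpt["n_labels"]
    if len(ckpt["label_columns"]) != n or len(ckpt["optimal_thresholds"]) != n:
        raise ValueError("label_columns / optimal_thresholds do not match n_labels")

def decode(probs, ckpt):
    """Keep every label whose probability reaches its own threshold."""
    return [
        name
        for name, p, t in zip(ckpt["label_columns"], probs, ckpt["optimal_thresholds"])
        if p >= t
    ]

check_checkpoint(toy_checkpoint)
toy_probs = [0.10, 0.05, 0.70, 0.20, 0.65, 0.10, 0.05, 0.02]
predicted = decode(toy_probs, toy_checkpoint)
```

Keeping label names and thresholds in the same ordered lists means a single `zip` is enough to pair each model output with its decision rule.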

Limitations and intended use

The model was trained and evaluated on de-identified records from a single Australian tertiary hospital (RMH). Performance may differ on records from other hospitals, health systems, or clinical workflows.
This model is intended for research purposes and is not a validated clinical decision support tool. Clinical decisions must remain with qualified healthcare professionals.
The training data cannot be shared due to privacy restrictions; the annotation schema and data format are documented in the companion GitHub repository.

Citation

If you use this model in your research, please cite:

@article{ncas_indication_classifier_2025,
  title   = {Automated Classification of Antimicrobial Prescription Indications
             Using BioClinicalBERT},
  author  = {...},
  journal = {...},
  year    = {2025},
  note    = {Under review}
}

Repository

Source code, training scripts, and the desktop application are available at:
https://github.com/jibmaird/NCAS-hospital-indication-classifier

License

Apache 2.0 — see LICENSE.
