Model Card for Ch3DS/SSIBERT-multiclass

Model Details

Model Description

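This model is a fine-tuned version of Bio_ClinicalBERT for multi-label classification of postoperative clinical notes. Despite "multiclass" in the repository name, each note can receive any combination of the labels below, and each label is scored independently. Unlike the binary SSI model, which returns a single infection/no-infection decision, this model also identifies specific clinical indicators of infection: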

Purulence: Presence of pus or purulent discharge.
Redness: Erythema, spreading redness, or inflammation.
Fever: Pyrexia, rigors, or elevated temperature.
Antibiotics: Prescription of antibiotics (treatment or prophylaxis).
SSI: Overall determination of Surgical Site Infection.

It is tailored to UK NHS terminology.

Developed by: Daryn Sutton
Model type: Multi-Label Text Classification (BERT)
Language(s) (NLP): English
License: Apache 2.0
Finetuned from model: emilyalsentzer/Bio_ClinicalBERT
Repository: https://huggingface.co/Ch3DS/SSIBERT-multiclass

Uses

Direct Use

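This model extracts structured labels from unstructured clinical notes. This supports more granular surveillance, because each infection indicator is reported separately rather than folded into a single SSI flag.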

Input: Clinical note text.
Output: Five independent probabilities (sigmoid outputs), one per label, in the order [Purulence, Redness, Fever, Antibiotics, SSI]. They do not sum to 1.
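Because each probability is computed independently, a note can be positive for several labels at once. The sketch below shows one way to turn the five probabilities into yes/no flags. It assumes a single 0.5 threshold for every label, since the card does not specify a threshold. The `to_flags` helper and the example values are hypothetical.

```python
# Hypothetical post-processing: turn the five sigmoid probabilities into
# per-label flags. The 0.5 default threshold is an assumption, not a value from this card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def to_flags(probs, threshold=0.5):
    """Map each label to True if its probability meets the threshold."""
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return {label: p >= threshold for label, p in zip(LABELS, probs)}


example = [0.97, 0.12, 0.91, 0.88, 0.95]  # made-up probabilities for illustration
print(to_flags(example))
```

In practice, a per-label threshold tuned on validation data may suit surveillance better than one global cut-off.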

Out-of-Scope Use

Diagnosis: This is a surveillance tool, not a diagnostic device.
Non-UK Contexts: May perform poorly on non-NHS terminology.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)  # BERT accepts at most 512 tokens

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.2%}")
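The example above applies `torch.sigmoid` rather than softmax because the labels are not mutually exclusive. The pure-Python sketch below uses made-up logits to show the difference. Sigmoid scores each label on its own, while softmax forces the five scores to sum to 1. With these numbers, softmax pushes the SSI score below 0.5 even though SSI has the largest logit.

```python
import math


def sigmoid(x):
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Probabilities that compete with each other and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for [Purulence, Redness, Fever, Antibiotics, SSI]
logits = [3.0, -2.0, 2.5, 2.0, 3.5]

sig = [sigmoid(x) for x in logits]
soft = softmax(logits)

print([round(p, 3) for p in sig])   # each label scored on its own
print([round(p, 3) for p in soft])  # labels compete for one unit of probability
```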

Training Details

Training Data

Source: 5 million synthetic clinical notes.
Methodology: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
Labels: Multi-hot encoded (a five-element 0/1 vector per note, one position per label).
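Multi-hot encoding stores each note's labels as a 0/1 vector in the same order as the model's outputs, with a 1 for every indicator present. The sketch below is a minimal illustration of that encoding. The `multi_hot` helper is hypothetical and is not taken from the actual training pipeline.

```python
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def multi_hot(present):
    """Return a 0/1 vector in LABELS order; unknown label names raise an error."""
    unknown = set(present) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]


# Hypothetical annotation: three indicators present, two absent
target = multi_hot({"Purulence", "Fever", "SSI"})
print(target)
```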

Training Procedure

Epochs: 3
Batch Size: 64
Hardware: NVIDIA GeForce RTX 5070 Ti
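At the stated settings, 5 million notes in batches of 64 give 78,125 optimizer steps per epoch and 234,375 over three epochs. This assumes a single device and no gradient accumulation, neither of which the card states. The card also does not name the loss function. For multi-hot targets, the usual choice is binary cross-entropy computed from the logits; this is also what Hugging Face's `BertForSequenceClassification` applies when `problem_type="multi_label_classification"`. The pure-Python sketch below shows both the step count and the loss. It illustrates an assumed setup and is not the author's training code.

```python
import math

NUM_NOTES = 5_000_000
BATCH_SIZE = 64
EPOCHS = 3

# Assumes one optimizer step per batch (no gradient accumulation, single device).
steps_per_epoch = math.ceil(NUM_NOTES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed stably from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # max(x, 0) - x*y + log(1 + exp(-|x|)) equals
        # -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Made-up logits and a multi-hot target for one note
print(round(bce_with_logits([3.0, -2.0, 2.5, 2.0, 3.5], [1, 0, 1, 1, 1]), 4))
```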

Evaluation

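Evaluated on a held-out test set of synthetic data, with a self-reported micro-averaged F1 of 1.000 (see Evaluation results). Because the test notes come from the same synthetic distribution as the training data, this figure says little about performance on real clinical notes.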
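Micro-averaged F1 pools true positives (TP), false positives (FP), and false negatives (FN) across all five labels before computing F1 = 2TP / (2TP + FP + FN). As a result, frequent labels carry more weight than rare ones. The sketch below is a minimal pure-Python version of the metric, run on made-up predictions. The `micro_f1` helper is hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (note, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    # Convention chosen here: no positives in either truth or predictions -> 0.0
    return 2 * tp / denom if denom else 0.0


# Two made-up notes, labels in [Purulence, Redness, Fever, Antibiotics, SSI] order
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0]]
print(micro_f1(y_true, y_pred))  # 4 TP, 1 FP, 1 FN
```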

Model Card Contact

Daryn Sutton


Model Files

Format: Safetensors
Model size: 0.1B params
Tensor type: F32


Evaluation results

F1 (micro), Synthetic UK NHS Clinical Notes (Multi-Label) test set, self-reported: 1.000