Model Card for Ch3DS/SSIBERT-multiclass
Model Details
Model Description
This model is a fine-tuned version of Bio_ClinicalBERT designed for multi-label classification of postoperative clinical notes. Unlike the binary SSI model, this model identifies specific clinical indicators of infection:
- Purulence: Presence of pus or purulent discharge.
- Redness: Erythema, spreading redness, or inflammation.
- Fever: Pyrexia, rigors, or elevated temperature.
- Antibiotics: Prescription of antibiotics (treatment or prophylaxis).
- SSI: Overall determination of Surgical Site Infection.
It is tailored to UK NHS terminology.
- Developed by: Daryn Sutton
- Model type: Multi-Label Text Classification (BERT)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: emilyalsentzer/Bio_ClinicalBERT
- Repository: https://huggingface.co/Ch3DS/SSIBERT-multiclass
Uses
Direct Use
This model extracts structured indicators from unstructured clinical notes, supporting surveillance at the level of individual infection signs rather than a single SSI flag.
- Input: Clinical note text.
- Output: Probabilities for [Purulence, Redness, Fever, Antibiotics, SSI].
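As an illustration of consuming this output, the sketch below turns a vector of five probabilities into named flags. The helper name `to_flags` and the 0.5 cutoff are assumptions made for this example, not part of the model; a real deployment would likely tune a threshold per label.

```python
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]

def to_flags(probs, threshold=0.5):
    """Map per-label probabilities to booleans (hypothetical helper).

    Labels are scored independently (multi-label), so several flags
    can be True at once. The 0.5 threshold is an assumed default.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return {label: p >= threshold for label, p in zip(LABELS, probs)}

# Made-up probabilities for illustration, not real model output.
example = to_flags([0.97, 0.12, 0.88, 0.91, 0.95])
print(example)
```

Because each indicator is thresholded separately, a note can be flagged for Antibiotics without being flagged for SSI, which is what distinguishes this output from a single binary score.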
Out-of-Scope Use
- Diagnosis: This is a surveillance tool, not a diagnostic device.
- Non-UK Contexts: May perform poorly on non-NHS terminology.
How to Get Started with the Model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
# Truncate to BERT's 512-token limit so long notes do not raise an error
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# No gradients are needed for inference
with torch.no_grad():
    logits = model(**inputs).logits

# Sigmoid rather than softmax: each label is scored independently
probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for i, label in enumerate(labels):
    print(f"{label}: {probs[0][i].item():.2%}")
Training Details
Training Data
- Source: 5 million synthetic clinical notes.
- Methodology: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- Labels: Multi-hot encoded.
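To make "multi-hot encoded" concrete, here is a minimal sketch of how one note's labels could be represented. The label order is taken from the usage example in this card; whether the training pipeline used exactly this order is an assumption, and `multi_hot` is a hypothetical helper.

```python
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]

def multi_hot(present):
    """Encode a collection of label names as a 0/1 vector in LABELS order."""
    unknown = set(present) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if label in present else 0.0 for label in LABELS]

# A note mentioning pus and fever, judged to be an SSI (illustrative only).
vector = multi_hot({"Purulence", "Fever", "SSI"})
print(vector)  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

Unlike one-hot encoding, any number of positions may be 1, including none.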
Training Procedure
- Epochs: 3
- Batch Size: 64
- Hardware: NVIDIA GeForce RTX 5070 Ti
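For a rough sense of scale, the figures above imply the following number of optimizer steps. This is back-of-the-envelope arithmetic that assumes all 5 million notes were used for training with no gradient accumulation; since a held-out test set was drawn from the synthetic data, the true count is probably somewhat lower.

```python
import math

num_notes = 5_000_000  # from the Training Data section
batch_size = 64
epochs = 3

# Assumption: every note is a training example and the final
# partial batch (if any) is kept rather than dropped.
steps_per_epoch = math.ceil(num_notes / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 78125 234375
```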
Evaluation
The model was evaluated on a held-out test set of synthetic data, where it achieved near-perfect performance (self-reported micro-averaged F1 of 1.000). This result describes the synthetic distribution only.
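The metric this card reports is micro-averaged F1. The sketch below shows how that metric is computed over multi-hot predictions, pooling true positives, false positives, and false negatives across all labels and notes. It is a plain-Python illustration with made-up data, not the author's evaluation script.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-hot rows (lists of 0/1)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return 2 * tp / denom if denom else 0.0

# Two illustrative notes: one missed label and one spurious label.
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]
score = micro_f1(y_true, y_pred)
print(f"{score:.3f}")  # 0.800
```

Micro-averaging weights every label decision equally, so frequent labels dominate the score; a per-label breakdown would show whether rarer indicators are handled as well.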
Model Card Contact
Daryn Sutton
Evaluation results
- F1 (Micro) on Synthetic UK NHS Clinical Notes (Multi-Label), test set, self-reported: 1.000