---
language:
- en
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- clinical
- ssi
- classification
- surveillance
- multi-label
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SSIBERT-multiclass
  results:
  - task:
      type: text-classification
      name: Multi-Label SSI Detection
    dataset:
      name: Synthetic UK NHS Clinical Notes (Multi-Label)
      type: synthetic
      split: test
    metrics:
    - name: F1 (Micro)
      type: f1
      value: 1.0
---
# Model Card for Ch3DS/SSIBERT-multiclass
## Model Details
### Model Description
This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for **multi-label classification** of postoperative clinical notes. Unlike the binary SSI model, it predicts four clinical indicators of infection plus an overall SSI determination, each as an independent label:
1. **Purulence**: Presence of pus or purulent discharge.
2. **Redness**: Erythema, spreading redness, or inflammation.
3. **Fever**: Pyrexia, rigors, or elevated temperature.
4. **Antibiotics**: Prescription of antibiotics (treatment or prophylaxis).
5. **SSI**: Overall determination of Surgical Site Infection.
It is tailored to **UK NHS terminology**.
- **Developed by:** Daryn Sutton
- **Model type:** Multi-Label Text Classification (BERT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
- **Repository:** [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)
### Uses
#### Direct Use
This model extracts structured, indicator-level labels from unstructured clinical notes, enabling more granular surveillance than a single binary SSI flag.
- **Input**: Clinical note text.
- **Output**: Independent per-label probabilities (sigmoid outputs, so they need not sum to 1) for `[Purulence, Redness, Fever, Antibiotics, SSI]`.
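Because each label gets its own sigmoid, several labels can be positive for the same note, and turning probabilities into yes/no flags needs a per-label threshold. The plain-Python sketch below illustrates that decoding step; the `decode` helper, the 0.5 threshold, and the example logits are hypothetical, not values taken from this model.

```python
import math

# Label order as listed under Output in this card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def sigmoid(x: float) -> float:
    """Map one logit to an independent probability in (0, 1) without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(logits: list[float], threshold: float = 0.5) -> dict[str, bool]:
    """Flag every label whose sigmoid probability is at or above the threshold.

    The 0.5 default is an illustrative assumption, not a tuned value.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


# Hypothetical logits for one note: several labels fire at once.
example = decode([3.1, -2.4, 1.7, 2.2, 2.9])
print(example)
```

In practice a surveillance team might tune a separate threshold per label, trading sensitivity against false alerts.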
#### Out-of-Scope Use
- **Diagnosis**: This is a surveillance tool, not a diagnostic device.
- **Non-UK Contexts**: May perform poorly on non-NHS terminology.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
# BERT-based models accept at most 512 tokens, so truncate longer notes.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: an independent sigmoid per label, not a softmax across labels.
probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for i, label in enumerate(labels):
    print(f"{label}: {probs[0][i].item():.2%}")
```
## Training Details
### Training Data
- **Source**: 5 million synthetic clinical notes.
- **Methodology**: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- **Labels**: Multi-hot encoded, with one binary target per label, so a single note can be positive for several labels.
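To illustrate what template-based generation with multi-hot targets can look like, here is a toy sketch. The fragments, the 50% sampling rate, and the rule that documented pus implies SSI are all hypothetical stand-ins; they are not the author's generator and not the PRAISE surveillance definitions.

```python
import random

# Label order as listed under Output in this card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]

# Hypothetical template fragments for four indicators (SSI has no fragment of
# its own). The real generator and its wording are not described in this card.
FRAGMENTS = {
    "Purulence": "Wound oozing pus.",
    "Redness": "Spreading erythema around wound.",
    "Fever": "Patient pyrexial.",
    "Antibiotics": "Plan: Start Flucloxacillin.",
}


def multi_hot(active: set[str]) -> list[int]:
    """Encode a set of label names as a 0/1 vector in LABELS order."""
    unknown = active - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in active else 0 for label in LABELS]


def make_note(rng: random.Random) -> tuple[str, list[int]]:
    """Build one toy note from randomly chosen fragments, plus its multi-hot target."""
    # Include each indicator fragment independently with probability 0.5.
    active = {label for label in FRAGMENTS if rng.random() < 0.5}
    # Toy rule, purely illustrative: mark SSI whenever pus is documented.
    if "Purulence" in active:
        active.add("SSI")
    sentences = ["Day 3 post THR."]
    sentences += [
        FRAGMENTS[label] for label in LABELS if label in FRAGMENTS and label in active
    ]
    return " ".join(sentences), multi_hot(active)


note, target = make_note(random.Random(0))
print(note)
print(target)
```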
### Training Procedure
- **Epochs**: 3
- **Batch Size**: 64
- **Hardware**: NVIDIA GeForce RTX 5070 Ti
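The card does not name the training objective. Multi-label classifiers like this one are commonly trained with binary cross-entropy applied independently to each label's logit (in PyTorch, `torch.nn.BCEWithLogitsLoss`). Assuming that objective, the plain-Python sketch below computes the mean per-label loss for a single note; `bce_with_logits` is a hypothetical helper, not part of this repository.

```python
import math


def bce_with_logits(logits: list[float], targets: list[int]) -> float:
    """Mean binary cross-entropy over labels, computed directly from logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    """
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Zero logits (probability 0.5 everywhere) against a multi-hot target.
print(round(bce_with_logits([0.0] * 5, [1, 0, 1, 0, 1]), 4))  # → 0.6931
```

An uninformative prediction costs log 2 ≈ 0.693 per label whatever the target, while a large logit with the correct sign drives that label's loss toward 0.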
## Evaluation
Evaluated on a held-out test set of synthetic notes, where the model achieved a micro-averaged F1 of 1.0 (as recorded in the `model-index` metadata). These results are specific to the synthetic distribution.
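The `model-index` metadata reports micro-averaged F1, which pools true positives, false positives, and false negatives across all five labels before computing one score, so frequent labels weigh more than rare ones. A minimal plain-Python sketch of that calculation on multi-hot rows follows; the `micro_f1` helper and the example rows are made up for illustration.

```python
def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1 over multi-hot rows: 2*TP / (2*TP + FP + FN).

    TP, FP and FN are pooled across every label of every example. Returns 0.0
    when neither input contains a positive label (the score is undefined).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of rows")
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif p == 1:
                fp += 1
            elif t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up rows in [Purulence, Redness, Fever, Antibiotics, SSI] order.
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 1, 1, 1], [0, 1, 0, 1, 0]]
print(round(micro_f1(y_true, y_pred), 4))  # → 0.9091
```

A micro F1 of exactly 1.0 therefore implies zero false positives and zero false negatives on every label across the whole test split.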
## Model Card Contact
**Daryn Sutton**