---
language:
- en
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- clinical
- ssi
- classification
- surveillance
- multi-label
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SSIBERT-multiclass
  results:
  - task:
      type: text-classification
      name: Multi-Label SSI Detection
    dataset:
      name: Synthetic UK NHS Clinical Notes (Multi-Label)
      type: synthetic
      split: test
    metrics:
    - name: F1 (Micro)
      type: f1
      value: 1.0
---

# Model Card for Ch3DS/SSIBERT-multiclass

## Model Details

### Model Description

This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for **multi-label classification** of postoperative clinical notes. Unlike the binary SSI model, it does not stop at a single SSI decision: it identifies four specific clinical indicators of infection as well as the overall SSI determination:

1. **Purulence**: Presence of pus or purulent discharge.
2. **Redness**: Erythema, spreading redness, or inflammation.
3. **Fever**: Pyrexia, rigors, or elevated temperature.
4. **Antibiotics**: Prescription of antibiotics (treatment or prophylaxis).
5. **SSI**: Overall determination of Surgical Site Infection.

It is tailored to **UK NHS terminology**.

- **Developed by:** Daryn Sutton
- **Model type:** Multi-Label Text Classification (BERT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
- **Repository:** [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)

### Uses

#### Direct Use

This model extracts structured indicators from free-text clinical notes, so that surveillance can track individual signs of infection (purulence, redness, fever, antibiotic use) rather than only an overall SSI flag.

- **Input**: Clinical note text.
- **Output**: Five independent probabilities (sigmoid outputs), one per label, in the order `[Purulence, Redness, Fever, Antibiotics, SSI]`. They do not sum to 1, and several labels can be high at once.
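
To turn the five probabilities into a structured surveillance record, each one has to be compared against a decision threshold. The card does not specify a threshold, so the sketch below assumes 0.5 for every label; the helper name `to_record` is hypothetical, and the label order is the one listed above.

```python
# Label order as listed in this card (index i of the model output -> LABELS[i]).
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def to_record(probs, threshold=0.5):
    """Convert per-label probabilities into a {label: bool} record.

    Hypothetical helper: the 0.5 default threshold is an assumption,
    not a value published with the model.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    # Each label is decided independently, so several can be True at once.
    return {label: p >= threshold for label, p in zip(LABELS, probs)}


# Illustrative probabilities, not real model output.
record = to_record([0.97, 0.12, 0.88, 0.91, 0.95])
print(record)
# -> {'Purulence': True, 'Redness': False, 'Fever': True, 'Antibiotics': True, 'SSI': True}
```

Surveillance often values catching infections over avoiding false alarms, so a lower threshold for `SSI` may be preferable; any threshold should be chosen on validated data rather than taken from this sketch.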

#### Out-of-Scope Use

- **Diagnosis**: This is a surveillance tool, not a diagnostic device.
- **Non-UK Contexts**: May perform poorly on notes that use non-NHS terminology.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
# BERT-based models accept at most 512 tokens; longer notes are truncated.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: apply a sigmoid to each logit independently (not a softmax).
probs = torch.sigmoid(logits)[0]

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2%}")
```
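
The snippet applies `torch.sigmoid` rather than a softmax because the labels are not mutually exclusive: a single note can describe pus, fever and an antibiotic prescription at once. The dependency-free sketch below (made-up logits, standard-library `math` only) contrasts the two functions: sigmoid scores each label independently, while softmax forces the five scores to share a total of 1, so co-occurring labels cannot all reach a high probability.

```python
import math

LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]

# Illustrative logits for a note mentioning pus, fever and antibiotics
# (made-up numbers, not produced by the model).
logits = [3.0, -2.0, 2.5, 2.8, 3.2]


def sigmoid(x):
    # Independent probability for a single label.
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Competing probabilities that always sum to 1 (single-label setting).
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


sig = [sigmoid(x) for x in logits]
soft = softmax(logits)

for label, s, p in zip(LABELS, sig, soft):
    print(f"{label:<12} sigmoid={s:.2f} softmax={p:.2f}")
```

With these logits, four labels score above 0.9 under sigmoid, while under softmax none of them exceeds 0.5.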

## Training Details

### Training Data

- **Source**: 5 million synthetic clinical notes.
- **Methodology**: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- **Labels**: Multi-hot encoded: each note has a five-element 0/1 target vector with one position per label, so several labels can be positive at once.
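
A multi-hot target is a fixed-length 0/1 vector with one position per label. The helper `encode_multi_hot` below is a hypothetical illustration that assumes the label order used in this card; the card does not publish the actual encoding code.

```python
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
LABEL_TO_INDEX = {label: i for i, label in enumerate(LABELS)}


def encode_multi_hot(active_labels):
    """Return a float vector with 1.0 at each active label's position.

    Floats are used because binary cross-entropy losses expect float targets.
    An unknown label name raises KeyError.
    """
    vector = [0.0] * len(LABELS)
    for label in active_labels:
        vector[LABEL_TO_INDEX[label]] = 1.0
    return vector


# A note describing pus, pyrexia and a new antibiotic, judged to be an SSI.
print(encode_multi_hot({"Purulence", "Fever", "Antibiotics", "SSI"}))
# -> [1.0, 0.0, 1.0, 1.0, 1.0]

# A routine note with no infection indicators.
print(encode_multi_hot(set()))
# -> [0.0, 0.0, 0.0, 0.0, 0.0]
```

Because the Antibiotics label also covers prophylaxis, a target such as `[0.0, 0.0, 0.0, 1.0, 0.0]` (antibiotics prescribed, no SSI) is valid.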

### Training Procedure

- **Epochs**: 3
- **Batch Size**: 64
- **Hardware**: NVIDIA GeForce RTX 5070 Ti
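
The card does not name the loss function. Multi-hot targets scored with a per-label sigmoid are conventionally trained with binary cross-entropy averaged over labels. This is what `torch.nn.BCEWithLogitsLoss` computes, and it is what Hugging Face `transformers` uses when a sequence-classification model is configured with `problem_type="multi_label_classification"`. The pure-Python sketch below shows that loss for one note, under the assumption that it is the objective used here.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    assert len(logits) == len(targets)
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Order: Purulence, Redness, Fever, Antibiotics, SSI (Redness absent).
targets = [1.0, 0.0, 1.0, 1.0, 1.0]
confident = [4.0, -4.0, 4.0, 4.0, 4.0]        # agrees with every label
wrong_on_fever = [4.0, -4.0, -4.0, 4.0, 4.0]  # misses Fever only

print(f"{bce_with_logits(confident, targets):.4f}")       # -> 0.0181
print(f"{bce_with_logits(wrong_on_fever, targets):.4f}")  # -> 0.8181
```

A single confident mistake on one label dominates the averaged loss, which is how per-label training pushes every indicator towards the correct answer independently.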
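
## Evaluation

The model was evaluated on a held-out test split of the synthetic data, where it achieved near-perfect performance (micro-F1 of 1.0 in the metadata above). This result applies to the synthetic distribution only; performance on real clinical notes is not reported here.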
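
Micro-averaged F1 pools true positives, false positives and false negatives over every (note, label) pair before computing precision and recall, so frequent labels weigh more than rare ones. A minimal sketch over multi-hot vectors (the function name and toy data are illustrative only):

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over lists of multi-hot 0/1 vectors."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    if tp == 0:
        # No true positives: F1 is 0 (or undefined if there are no positives
        # at all); return 0.0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: three notes x five labels (Purulence, Redness, Fever, Antibiotics, SSI).
y_true = [[1, 0, 1, 1, 1],
          [0, 1, 0, 0, 0],
          [0, 0, 0, 1, 0]]
y_pred = [[1, 0, 1, 1, 1],
          [0, 1, 0, 1, 0],   # one false positive (Antibiotics)
          [0, 0, 0, 0, 0]]   # one false negative (Antibiotics)

print(f"{micro_f1(y_true, y_pred):.3f}")  # -> 0.833
```

A micro-F1 of exactly 1.0 therefore means no false positives and no false negatives on any label of the test split.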

## Model Card Contact

**Daryn Sutton**