---
language:
- en
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- clinical
- ssi
- classification
- surveillance
- multi-label
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SSIBERT-multiclass
  results:
  - task:
      type: text-classification
      name: Multi-Label SSI Detection
    dataset:
      name: Synthetic UK NHS Clinical Notes (Multi-Label)
      type: synthetic
      split: test
    metrics:
    - name: F1 (Micro)
      type: f1
      value: 1.0
---
# Model Card for Ch3DS/SSIBERT-multiclass
## Model Details
### Model Description
This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for **multi-label classification** of postoperative clinical notes. Unlike the binary SSI model, it predicts four specific clinical indicators of infection alongside an overall SSI determination:
1. **Purulence**: Presence of pus or purulent discharge.
2. **Redness**: Erythema, spreading redness, or inflammation.
3. **Fever**: Pyrexia, rigors, or elevated temperature.
4. **Antibiotics**: Prescription of antibiotics (treatment or prophylaxis).
5. **SSI**: Overall determination of Surgical Site Infection.
It is tailored to **UK NHS terminology**.
- **Developed by:** Daryn Sutton
- **Model type:** Multi-Label Text Classification (BERT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
- **Repository:** [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)
### Uses
#### Direct Use
This model extracts structured indicators from unstructured clinical notes, allowing more granular surveillance than a single binary SSI flag.
- **Input**: Clinical note text.
- **Output**: Probabilities for `[Purulence, Redness, Fever, Antibiotics, SSI]`.
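
Because the output is one independent probability per label, a downstream step usually has to turn those probabilities into a set of predicted labels. The sketch below shows one way to do that with a fixed cut-off. The `decode_labels` helper and the `0.5` threshold are illustrative assumptions, not part of the model or its repository, and the example probabilities are made up.

```python
# Label order as documented in this model card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def decode_labels(probs, threshold=0.5):
    """Return the labels whose probability meets the threshold.

    `threshold` is an illustrative default, not a value from the model card.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


# Made-up probabilities for illustration only (not real model output).
example_probs = [0.97, 0.12, 0.88, 0.91, 0.95]
print(decode_labels(example_probs))
```

A single shared threshold is the simplest choice; per-label thresholds tuned on validation data are a common alternative when the indicators differ in prevalence.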
#### Out-of-Scope Use
- **Diagnosis**: This is a surveillance tool, not a diagnostic device.
- **Non-UK Contexts**: May perform poorly on non-NHS terminology.
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
# Truncate so that long notes fit BERT's 512-token input limit.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label output: apply an independent sigmoid per label, not a softmax.
probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for i, label in enumerate(labels):
    print(f"{label}: {probs[0][i].item():.2%}")
```
## Training Details
### Training Data
- **Source**: 5 million synthetic clinical notes.
- **Methodology**: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- **Labels**: Multi-hot encoded.
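
As a rough illustration of multi-hot encoding, each note's target is a fixed-length vector with one slot per label, set to 1 when the indicator is present and 0 otherwise. The helpers `to_multi_hot` and `from_multi_hot` below are hypothetical names used only for this sketch. The use of floats assumes a binary cross-entropy style loss, which the card does not state.

```python
# Label order as documented in this model card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def to_multi_hot(present):
    """Encode a collection of label names as a 0/1 vector in LABELS order."""
    present = set(present)
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # Floats are an assumption: BCE-style losses typically expect float targets.
    return [1.0 if label in present else 0.0 for label in LABELS]


def from_multi_hot(vector):
    """Decode a 0/1 vector back into the list of active label names."""
    return [label for label, flag in zip(LABELS, vector) if flag == 1.0]


# A note mentioning pus, fever and an antibiotic prescription, judged an SSI.
target = to_multi_hot({"Purulence", "Fever", "Antibiotics", "SSI"})
print(target)
print(from_multi_hot(target))
```

Unlike one-hot encoding, several positions can be active at once, which is what lets a single note carry multiple indicators.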
### Training Procedure
- **Epochs**: 3
- **Batch Size**: 64
- **Hardware**: NVIDIA GeForce RTX 5070 Ti
## Evaluation
The model was evaluated on a held-out test set of synthetic data, where it reached a micro-averaged F1 of 1.0 (as reported in the card metadata). This result describes performance on the synthetic distribution only.
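
For readers unfamiliar with the micro-averaged F1 reported in the card metadata, the sketch below shows how it is computed for multi-label predictions: true positives, false positives, and false negatives are pooled across all labels and notes before the score is taken. The `micro_f1` function and the toy arrays are illustrative assumptions. They are not the author's evaluation code or data.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-hot rows: pool TP/FP/FN across all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positives in either truth or prediction, report 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Toy data in the order [Purulence, Redness, Fever, Antibiotics, SSI].
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]]
y_pred = [[1, 0, 1, 1, 1], [0, 1, 0, 1, 0], [0, 0, 0, 0, 0]]
print(f"micro-F1: {micro_f1(y_true, y_pred):.3f}")
```

Because counts are pooled, frequent labels dominate the micro average; a macro average (the mean of per-label F1 scores) would weight each of the five labels equally.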
## Model Card Contact
**Daryn Sutton**