---
language:
  - en
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
  - medical
  - clinical
  - ssi
  - classification
  - surveillance
  - multi-label
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: SSIBERT-multiclass
    results:
      - task:
          type: text-classification
          name: Multi-Label SSI Detection
        dataset:
          name: Synthetic UK NHS Clinical Notes (Multi-Label)
          type: synthetic
          split: test
        metrics:
          - name: F1 (Micro)
            type: f1
            value: 1.0
---

# Model Card for Ch3DS/SSIBERT-multiclass

## Model Details

### Model Description

This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for **multi-label classification** of postoperative clinical notes. Unlike the binary SSI model, this model identifies specific clinical indicators of infection:

1.  **Purulence**: Presence of pus or purulent discharge.
2.  **Redness**: Erythema, spreading redness, or inflammation.
3.  **Fever**: Pyrexia, rigors, or elevated temperature.
4.  **Antibiotics**: Prescription of antibiotics (treatment or prophylaxis).
5.  **SSI**: Overall determination of Surgical Site Infection.

It is tailored to **UK NHS terminology**.

- **Developed by:** Daryn Sutton
- **Model type:** Multi-Label Text Classification (BERT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
- **Repository:** [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)

### Uses

#### Direct Use

This model extracts structured indicators from unstructured clinical notes, so surveillance can track individual signs of infection rather than a single SSI flag.

- **Input**: Clinical note text.
- **Output**: Probabilities for `[Purulence, Redness, Fever, Antibiotics, SSI]`.
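
Because each label gets its own probability, downstream code usually has to turn them into yes/no flags. The following is a minimal sketch of that step, not part of the released model: the `probs_to_flags` helper, the 0.5 threshold, and the example probabilities are all illustrative assumptions, and a real deployment would likely tune a threshold per label.

```python
# Label order as listed in this card's output description.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def probs_to_flags(probs, threshold=0.5):
    """Map per-label probabilities to a {label: bool} dict.

    `threshold` is an assumed cut-off; the card does not specify one.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return {label: p >= threshold for label, p in zip(LABELS, probs)}


# Made-up probabilities for illustration only (not real model output).
example_probs = [0.97, 0.12, 0.88, 0.91, 0.95]
flags = probs_to_flags(example_probs)
positive = [label for label, is_on in flags.items() if is_on]
print(positive)
```

Labels are thresholded independently, so a note can be flagged for any combination of indicators, including none.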

#### Out-of-Scope Use

- **Diagnosis**: This is a surveillance tool, not a diagnostic device.
- **Non-UK Contexts**: May perform poorly on non-NHS terminology.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
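# Truncate long notes: BERT-based models accept at most 512 tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)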

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for i, label in enumerate(labels):
    print(f"{label}: {probs[0][i]:.2%}")
```

## Training Details

### Training Data

- **Source**: 5 million synthetic clinical notes.
- **Methodology**: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- **Labels**: Multi-hot encoded.
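
As an illustration of multi-hot encoding, the sketch below turns a set of indicator names into a fixed-length vector with one slot per label. The `to_multi_hot` helper is hypothetical, and the label order is assumed to match the order used in the getting-started snippet; the actual data-generation code is not shown in this card.

```python
# Assumed label order, taken from the getting-started snippet in this card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def to_multi_hot(present):
    """Return a vector with 1.0 for each label in `present`, else 0.0."""
    unknown = set(present) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if label in present else 0.0 for label in LABELS]


# A note mentioning pus, fever and antibiotics, judged to be an SSI.
vector = to_multi_hot({"Purulence", "Fever", "Antibiotics", "SSI"})
print(vector)  # [1.0, 0.0, 1.0, 1.0, 1.0]
```

Unlike one-hot encoding, several positions can be 1.0 at once, which is what lets a single note carry multiple indicators.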

### Training Procedure

- **Epochs**: 3
- **Batch Size**: 64
- **Hardware**: NVIDIA GeForce RTX 5070 Ti
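
The card does not state the training loss. A common choice for multi-label classifiers with sigmoid outputs is binary cross-entropy applied to each label independently, so the sketch below shows that loss in plain Python as an assumption about the setup, not a description of the actual training code. The function name and example logits are illustrative.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form
        max(z, 0) - z*y + log(1 + exp(-|z|))
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


targets = [1.0, 0.0, 1.0, 1.0, 1.0]  # multi-hot target for one note
good_logits = [4.0, -4.0, 3.0, 3.5, 4.0]   # agrees with the targets
bad_logits = [-4.0, 4.0, -3.0, -3.5, -4.0]  # disagrees on every label

print(bce_with_logits(good_logits, targets))
print(bce_with_logits(bad_logits, targets))
```

The loss is low when each logit's sign matches its target and grows roughly linearly with the logit magnitude when it does not.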

## Evaluation

Evaluated on a held-out test set of synthetic data, where the model reached a micro-averaged F1 of 1.0. This figure describes performance on the synthetic distribution only.
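
For readers unfamiliar with micro-averaged F1 in the multi-label setting, the sketch below shows how it is typically computed: true positives, false positives, and false negatives are pooled across all labels and notes before the score is taken. The `micro_f1` function and the toy data are illustrative assumptions, not the evaluation script used for this model.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-hot rows (lists of 0/1)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Toy example: two notes, five labels each; one false positive in the second row.
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]
y_pred = [[1, 0, 1, 1, 1], [0, 1, 0, 1, 0]]
score = micro_f1(y_true, y_pred)
print(f"{score:.3f}")  # 0.909
```

A score of 1.0 requires every label on every note to be predicted correctly.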

## Model Card Contact

**Daryn Sutton**