	---
	language:
	- en
	license: apache-2.0
	base_model: emilyalsentzer/Bio_ClinicalBERT
	tags:
	- medical
	- clinical
	- ssi
	- classification
	- surveillance
	- multi-label
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	model-index:
	- name: SSIBERT-multiclass
	  results:
	  - task:
	      type: text-classification
	      name: Multi-Label SSI Detection
	    dataset:
	      name: Synthetic UK NHS Clinical Notes (Multi-Label)
	      type: synthetic
	      split: test
	    metrics:
	    - name: F1 (Micro)
	      type: f1
	      value: 1.0
	---

	# Model Card for Ch3DS/SSIBERT-multiclass

	## Model Details

	### Model Description

	This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for multi-label classification of postoperative clinical notes. Unlike the binary SSI model, this model identifies specific clinical indicators of infection:

	1. Purulence: Presence of pus or purulent discharge.
	2. Redness: Erythema, spreading redness, or inflammation.
	3. Fever: Pyrexia, rigors, or elevated temperature.
	4. Antibiotics: Prescription of antibiotics (treatment or prophylaxis).
	5. SSI: Overall determination of Surgical Site Infection.

	It is tailored to UK NHS terminology.

	- Developed by: Daryn Sutton
	- Model type: Multi-Label Text Classification (BERT)
	- Language(s) (NLP): English
	- License: Apache 2.0
	- Finetuned from model: [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
	- Repository: [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)

	### Uses

	#### Direct Use

	This model extracts structured indicators from unstructured postoperative clinical notes, supporting more granular surveillance than a single binary SSI flag.

	- Input: Clinical note text.
	- Output: One logit per label, in the order `[Purulence, Redness, Fever, Antibiotics, SSI]`. Applying a sigmoid to each logit gives an independent probability per label.
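
For surveillance, the per-label scores usually need to be turned into yes/no flags. The sketch below does this in plain Python by applying a sigmoid and a threshold to each label independently. The `to_flags` helper and the 0.5 threshold are assumptions for illustration and are not part of the released model; in practice a threshold would normally be tuned per label on validation data.

```python
import math

# Label order as documented in this model card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_flags(logits, threshold=0.5):
    """Hypothetical helper: convert one note's logits to {label: bool}.

    Each label is thresholded independently (multi-label, not softmax).
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


# Illustrative logits only, not real model output.
example = to_flags([3.2, -1.5, 2.1, 0.4, 2.8])
print(example)
```

Because the labels are scored independently, any combination of flags can be set for the same note, including indicators being present while the overall SSI flag is not.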

	#### Out-of-Scope Use

	- Diagnosis: This is a surveillance tool, not a diagnostic device.
	- Non-UK Contexts: May perform poorly on non-NHS terminology.

	## How to Get Started with the Model

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name = "Ch3DS/SSIBERT-multiclass"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."
	# Truncate to BERT's 512-token limit so long notes do not raise an error.
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

	# Inference only, so gradients are not needed.
	with torch.no_grad():
	    logits = model(**inputs).logits

	# Sigmoid (not softmax): each label is scored independently.
	probs = torch.sigmoid(logits)

	labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
	for i, label in enumerate(labels):
	    print(f"{label}: {probs[0][i].item():.2%}")
	```

	## Training Details

	### Training Data

	- Source: 5 million synthetic clinical notes.
	- Methodology: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
	- Labels: Multi-hot encoded.
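
As a minimal illustration of multi-hot encoding, each note's target is a vector with one position per label, set to 1 where the indicator is present and 0 otherwise. The `multi_hot` helper below is hypothetical and only assumes the label order documented in this card; it is not the author's data pipeline.

```python
# Label order as documented in this model card.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def multi_hot(present):
    """Hypothetical helper: encode a set of label names as a 0/1 float vector."""
    unknown = set(present) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if label in present else 0.0 for label in LABELS]


# A note mentioning pus, fever and an antibiotic prescription, judged an SSI.
target = multi_hot({"Purulence", "Fever", "Antibiotics", "SSI"})
print(target)  # [1.0, 0.0, 1.0, 1.0, 1.0]
```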

	### Training Procedure

	- Epochs: 3
	- Batch Size: 64
	- Hardware: NVIDIA GeForce RTX 5070 Ti
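
The card does not state which loss function was used. Multi-label fine-tuning on multi-hot targets is commonly done with per-label binary cross-entropy computed on the logits (the quantity `torch.nn.BCEWithLogitsLoss` computes), so the sketch below assumes that objective and implements it in plain Python for a single note. The function name and the example values are illustrative only.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Multi-hot target for [Purulence, Redness, Fever, Antibiotics, SSI].
target = [1.0, 0.0, 1.0, 1.0, 1.0]

# Illustrative logits: confidently correct vs. confidently wrong on every label.
confident_right = bce_with_logits([4.0, -4.0, 4.0, 4.0, 4.0], target)
confident_wrong = bce_with_logits([-4.0, 4.0, -4.0, -4.0, -4.0], target)
print(f"right: {confident_right:.4f}, wrong: {confident_wrong:.4f}")
```

Each label contributes its own term to the loss, which is why the model can learn indicators such as redness and fever independently of the overall SSI label.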

	## Evaluation

	The model was evaluated on a held-out test set drawn from the same synthetic distribution as the training data. It achieved near-perfect performance on this set, with a reported micro-averaged F1 of 1.0.
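
Micro-averaged F1, the metric listed in this card's metadata, pools true positives, false positives and false negatives across all labels and notes before computing a single score. The sketch below is a plain-Python illustration of that calculation on made-up predictions; it is not the author's evaluation script, and the data are not results from this model.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data given as lists of 0/1 rows."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # Convention used here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Columns: [Purulence, Redness, Fever, Antibiotics, SSI]. Made-up data.
y_true = [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]]
y_pred = [[1, 0, 1, 1, 1], [0, 1, 0, 1, 0], [0, 0, 0, 0, 0]]

score = micro_f1(y_true, y_pred)
print(f"micro F1: {score:.3f}")
```

With micro-averaging, frequent labels dominate the score, so a per-label breakdown is often reported alongside it.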

	## Model Card Contact

	Daryn Sutton