---
language:
- en
license: apache-2.0
base_model: emilyalsentzer/Bio_ClinicalBERT
tags:
- medical
- clinical
- ssi
- classification
- surveillance
- multi-label
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: SSIBERT-multiclass
  results:
  - task:
      type: text-classification
      name: Multi-Label SSI Detection
    dataset:
      name: Synthetic UK NHS Clinical Notes (Multi-Label)
      type: synthetic
      split: test
    metrics:
    - name: F1 (Micro)
      type: f1
      value: 1.0
---

# Model Card for Ch3DS/SSIBERT-multiclass

## Model Details

### Model Description

This model is a fine-tuned version of [Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) designed for **multi-label classification** of postoperative clinical notes. Unlike the binary SSI model, this model identifies specific clinical indicators of infection:

1. **Purulence**: Presence of pus or purulent discharge.
2. **Redness**: Erythema, spreading redness, or inflammation.
3. **Fever**: Pyrexia, rigors, or elevated temperature.
4. **Antibiotics**: Prescription of antibiotics (treatment or prophylaxis).
5. **SSI**: Overall determination of Surgical Site Infection.

It is tailored to **UK NHS terminology**.

- **Developed by:** Daryn Sutton
- **Model type:** Multi-Label Text Classification (BERT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT)
- **Repository:** [https://huggingface.co/Ch3DS/SSIBERT-multiclass](https://huggingface.co/Ch3DS/SSIBERT-multiclass)

### Uses

#### Direct Use

This model extracts structured data from unstructured clinical notes, allowing for more granular surveillance.

- **Input**: Clinical note text.
- **Output**: Probabilities for `[Purulence, Redness, Fever, Antibiotics, SSI]`.

#### Out-of-Scope Use

- **Diagnosis**: This is a surveillance tool, not a diagnostic device.
- **Non-UK Contexts**: May perform poorly on non-NHS terminology.
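For surveillance workflows, the per-label probabilities described under Direct Use usually need to be turned into yes/no flags. The sketch below shows one way to do that in plain Python. The helper name `to_flags`, the 0.5 default threshold, and the example probabilities are illustrative assumptions, not values published with this model; the label order is assumed to match the list given under Direct Use.

```python
# Label order assumed to match the Direct Use output list.
LABELS = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]


def to_flags(probs, threshold=0.5):
    """Map per-label probabilities to boolean flags.

    `threshold` is a hypothetical cut-off chosen for illustration;
    a real deployment would tune it on local validation data.
    """
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    return {label: p >= threshold for label, p in zip(LABELS, probs)}


# Made-up probabilities for a single note (not real model output).
example_probs = [0.97, 0.12, 0.88, 0.91, 0.95]
flags = to_flags(example_probs)

# Multi-hot vector in the same label order, e.g. for storing in a table.
multi_hot = [int(flags[label]) for label in LABELS]
print(flags)
print(multi_hot)
```

Because each label is scored independently, a note can be flagged for several indicators at once, and a stricter threshold can be applied to a single label (for example `SSI`) by calling the helper with a different `threshold`.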
## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Ch3DS/SSIBERT-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Day 3 post THR. Wound oozing pus. Patient pyrexial. Plan: Start Flucloxacillin."

# Truncate long notes to BERT's 512-token input limit.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label output: apply a sigmoid to each logit independently (not a softmax).
probs = torch.sigmoid(logits)

labels = ["Purulence", "Redness", "Fever", "Antibiotics", "SSI"]
for i, label in enumerate(labels):
    print(f"{label}: {probs[0][i]:.2%}")
```

## Training Details

### Training Data

- **Source**: 5 million synthetic clinical notes.
- **Methodology**: Generated using templates based on UK NHS terminology and the PRAISE network's surveillance definitions.
- **Labels**: Multi-hot encoded.

### Training Procedure

- **Epochs**: 3
- **Batch Size**: 64
- **Hardware**: NVIDIA GeForce RTX 5070 Ti

## Evaluation

The model was evaluated on a held-out test set of synthetic data, where it achieved near-perfect performance on the synthetic distribution (micro-averaged F1 of 1.0, as reported in the model-index metadata).

## Model Card Contact

**Daryn Sutton**