---
tags:
- sentence-transformers
- embedding
- nursing
- clinical-nlp
- healthcare
- NHS
- medical
- triage
- NEWS2
language:
- en
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
base_model: unsloth/embeddinggemma-300m
---

# 🏥 NurseEmbed-300M

A clinical embedding model fine-tuned for **NHS nursing terminology** and **medical Q&A retrieval**.

## Model Description

NurseEmbed-300M is fine-tuned from **EmbeddingGemma-300M** in **two stages**: broad medical Q&A retrieval first, then NHS nursing terminology.

| Stage | Dataset | Training pairs | Focus |
|-------|---------|----------------|-------|
| **Stage 1** | `tomaarsen/miriad-4.4M-split` | 10,000 | Medical Q&A from peer-reviewed biomedical literature |
| **Stage 2** | Custom NHS dataset (synthetic) | 200 | Nursing shorthand, NEWS2 scores, clinical abbreviations |

## Evaluation Results

### Medical Domain (Information Retrieval)

| Metric | Score |
|--------|-------|
| Accuracy@1 | **81.3%** |
| Accuracy@10 | **95.4%** |
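
The card does not describe the evaluation set or protocol behind these scores. For reference, Accuracy@k is conventionally the fraction of queries whose relevant passage appears among the k highest-scoring candidates (sentence-transformers' `InformationRetrievalEvaluator` reports metrics under this name). A minimal NumPy sketch, assuming exactly one relevant passage per query and a precomputed query × passage similarity matrix; `accuracy_at_k` is a hypothetical helper name:

```python
import numpy as np


def accuracy_at_k(similarities: np.ndarray, relevant: np.ndarray, k: int) -> float:
    """Fraction of queries whose relevant document is ranked in the top k.

    similarities: (n_queries, n_docs) scores, higher means more similar.
    relevant: (n_queries,) index of the single relevant document per query
    (assumption: exactly one relevant document per query).
    """
    # Indices of the k highest-scoring documents for each query.
    top_k = np.argsort(-similarities, axis=1)[:, :k]
    # A query is a hit if its relevant document is among those k.
    hits = (top_k == relevant[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 3 queries, 4 documents; query i's relevant document is i.
sims = np.array([
    [0.9, 0.1, 0.2, 0.0],  # relevant doc ranked 1st
    [0.8, 0.7, 0.1, 0.0],  # relevant doc ranked 2nd
    [0.6, 0.5, 0.4, 0.3],  # relevant doc ranked 3rd
])
relevant = np.arange(3)
print(accuracy_at_k(sims, relevant, k=1))  # 1 of 3 queries hit
print(accuracy_at_k(sims, relevant, k=3))  # all 3 queries hit
```

Because every top-1 hit is also a top-10 hit, Accuracy@10 is always at least Accuracy@1; the reported gap means roughly 14% of queries had their correct passage ranked somewhere in positions 2 to 10.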

### Real-World Nursing Shorthand Matching

| Nursing Shorthand | Matched Definition | Similarity |
|-------------------|--------------------|------------|
| `Pt c/o SOB` | Patient reporting Shortness of Breath / Dyspnoea | **0.460** ✅ |
| `NEWS2 score is 7` | Urgent response team review required | **0.242** ✅ |
| `Given Paracetamol 1g PO` | Medication administration: Analgesic / Antipyretic | **0.224** ✅ |
| `Plan: Refer to physio for NOF rehab` | Physiotherapy referral for Neck of Femur fracture rehabilitation | **0.582** ✅ |

**All four nursing shorthand queries were matched to their correct formal definitions.**
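
The absolute similarities above are modest (0.224 to 0.582). Assuming "matched" means the correct definition was ranked first among the candidate definitions, the margin over the runner-up is a useful companion to the raw score. The card does not list the full candidate set, so this sketch uses toy 3-dimensional vectors in place of `model.encode(...)` output; `cosine_matrix` and `best_matches` are hypothetical helpers:

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def best_matches(query_emb, definition_emb, definitions):
    """Return (best definition, score, margin over runner-up) for each query.

    Needs at least two candidate definitions so that a runner-up exists.
    """
    sims = cosine_matrix(query_emb, definition_emb)
    results = []
    for row in sims:
        order = np.argsort(-row)  # candidate indices, highest similarity first
        best, runner_up = order[0], order[1]
        results.append((definitions[best], float(row[best]), float(row[best] - row[runner_up])))
    return results


# Toy vectors standing in for model.encode(...) output.
queries = np.array([[1.0, 0.2, 0.0], [0.0, 1.0, 0.1]])
definitions = ["Shortness of breath", "Urgent response review", "Neck of femur fracture"]
definition_emb = np.eye(3)  # one orthogonal vector per definition

for label, score, margin in best_matches(queries, definition_emb, definitions):
    print(f"{label}: score={score:.3f}, margin={margin:.3f}")
```

A small margin flags shorthand that the model finds ambiguous between two definitions, which may be worth routing to human review.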

## Usage

Install the dependencies with `pip install sentence-transformers scikit-learn`.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the model from the Hugging Face Hub
model = SentenceTransformer("NurseCitizenDeveloper/NurseEmbed-300M")

# Encode nursing shorthand queries
queries = ["Pt c/o SOB", "NEWS2 score is 7", "NOF #"]
query_embeddings = model.encode(queries)

# Encode candidate documents (formal definitions)
documents = [
    "Patient reporting Shortness of Breath",
    "Urgent response team review required",
    "Neck of Femur fracture",
]
doc_embeddings = model.encode(documents)

# Cosine similarity matrix: rows are queries, columns are documents (3 x 3)
similarities = cosine_similarity(query_embeddings, doc_embeddings)
print(similarities)

# Best-matching document for each query
for query, row in zip(queries, similarities):
    print(f"{query!r} -> {documents[row.argmax()]!r}")
```

## Training Details

### Stage 1: Medical Foundation

- **Dataset**: 10,000 medical Q&A pairs from `tomaarsen/miriad-4.4M-split`
- **Epochs**: 1
- **Batch Size**: 64
- **Learning Rate**: 2e-5
- **Scheduler**: Linear

### Stage 2: Nursing Specialization

- **Dataset**: 200 NHS nursing pairs (NEWS2, abbreviations, medications)
- **Epochs**: 3
- **Batch Size**: 32
- **Learning Rate**: 1e-5 (half the Stage 1 rate, for gentler fine-tuning)
- **Scheduler**: Cosine
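
These hyperparameters imply very different amounts of training in each stage. As a worked example, assuming no gradient accumulation, that the final partial batch is kept, and zero warmup steps (the card states none of these), Stage 1 runs about 157 optimizer steps and Stage 2 about 21. The decay functions below have the same shape as the linear and cosine schedules in Hugging Face Transformers with no warmup; the helper names are hypothetical:

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # Assumes the final partial batch is kept and no gradient accumulation.
    return math.ceil(n_samples / batch_size)


def linear_lr(step: int, total: int, peak: float) -> float:
    # Linear decay from `peak` at step 0 to 0 at `total`, no warmup.
    return peak * max(0.0, (total - step) / total)


def cosine_lr(step: int, total: int, peak: float) -> float:
    # Half-cosine decay from `peak` at step 0 to 0 at `total`, no warmup.
    return peak * 0.5 * (1.0 + math.cos(math.pi * step / total))


stage1_total = steps_per_epoch(10_000, 64) * 1  # Stage 1: 1 epoch
stage2_total = steps_per_epoch(200, 32) * 3     # Stage 2: 3 epochs
print(stage1_total, stage2_total)               # 157 21

# Learning rate at the start, midpoint and end of each stage
for step in (0, stage1_total // 2, stage1_total):
    print("stage 1", step, f"{linear_lr(step, stage1_total, 2e-5):.2e}")
for step in (0, stage2_total // 2, stage2_total):
    print("stage 2", step, f"{cosine_lr(step, stage2_total, 1e-5):.2e}")
```

Under these assumptions, Stage 2 is a light adjustment of roughly 21 small steps on top of the Stage 1 weights rather than a full retraining.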

### Training Data Examples

| Anchor (Nursing Shorthand) | Positive (Formal Definition) |
|----------------------------|------------------------------|
| `Early warning score 9` | `Patient requires Emergency call` |
| `Complaint: UTI` | `Patient reporting Urinary Tract Infection` |
| `Pt c/o SOB` | `Patient reporting Shortness of Breath / Dyspnoea` |
| `Pt has NEWS2 of 9` | `Clinical deterioration level: Critical risk - Sepsis potential` |
| `Score is 1 on NEWS2` | `Clinical deterioration level: Stable` |
| `Complaint: PU` | `Patient reporting Pressure Ulcer` |
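
The card does not name the loss used for either stage. Anchor/positive pairs like these are commonly trained with an in-batch negatives objective such as sentence-transformers' `MultipleNegativesRankingLoss`, where the other positives in a batch act as negatives for each anchor. Assuming that objective (an assumption, not confirmed by the card), the loss for one batch can be sketched in NumPy as follows; the scale of 20 mirrors that loss's default:

```python
import numpy as np


def in_batch_negatives_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i should score highest against positive i.

    Every other positive in the batch acts as a negative for anchor i, as in
    MultipleNegativesRankingLoss (assumed here; the card does not name its loss).
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Toy batch of 3 pairs: well-separated pairs give a loss near 0 ...
anchors = np.eye(3)
print(in_batch_negatives_loss(anchors, np.eye(3)))
# ... while indistinguishable positives give log(batch size).
print(in_batch_negatives_loss(anchors, np.ones((3, 3))))
```

If this objective was used, near-duplicate anchors landing in the same batch, such as `Early warning score 9` and `Pt has NEWS2 of 9`, would have their different positives treated as negatives for each other.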

## Intended Use Cases

- **Semantic search** for nursing documentation
- **FHIR code suggestion** (map free text → SNOMED/LOINC)
- **Clinical handover assistance** (translate shorthand to formal language)
- **Nursing education** (teach abbreviation meanings)
- **NEWS2 interpretation** (map scores to clinical actions)
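
For the semantic-search and code-suggestion use cases, a common pattern is to return a short ranked list and abstain when nothing scores highly enough, leaving the decision to a clinician. A minimal sketch, assuming candidate terms (for example SNOMED CT or LOINC display names) have already been embedded with the model; `suggest`, its parameters, and the cut-off value are hypothetical and would need calibrating on local validation data:

```python
import numpy as np


def suggest(query_emb, candidate_emb, candidate_labels, k=3, min_score=0.2):
    """Return up to k (label, score) pairs scoring at least `min_score`.

    An empty list means "no confident suggestion": defer to the clinician.
    `min_score` is a hypothetical cut-off, not a value from this model card.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of the query to each candidate
    top = np.argsort(-scores)[:k]  # best k candidates, highest first
    return [(candidate_labels[i], float(scores[i])) for i in top if scores[i] >= min_score]


# Toy vectors standing in for model.encode(...) output.
labels = ["Dyspnoea", "Pressure ulcer", "Urinary tract infection"]
candidate_emb = np.eye(3)

print(suggest(np.array([1.0, 0.5, 0.0]), candidate_emb, labels))   # two suggestions, "Dyspnoea" first
print(suggest(np.array([0.0, 0.0, -1.0]), candidate_emb, labels))  # [] -> defer to a clinician
```

Correct matches in the shorthand table above scored as low as 0.224, so a fixed cut-off chosen without validation could silently discard correct suggestions.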

## Limitations

- Stage 2 was trained on only 200 synthetic NHS nursing pairs
- Best suited to UK/NHS clinical terminology
- Should be used as an assistive tool, not a replacement for clinical judgment

## Citation

```bibtex
@misc{nurseembed-300m,
  author = {Lincoln Gombedza},
  title = {NurseEmbed-300M: A Clinical Embedding Model for NHS Nursing Terminology},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/NurseCitizenDeveloper/NurseEmbed-300M}
}
```

## Author

Created by **Lincoln Gombedza** ([@NurseCitizenDeveloper](https://huggingface.co/NurseCitizenDeveloper))

- Registered Learning Disability Nurse
- Practice Educator
- Co-Chair, Digital & Technology Working Group (Professional Strategy for Nursing and Midwifery)
- Founder, Nursing Citizen Development Movement

Part of the **OpenEnv Challenge** submission for nurse-led AI innovation.