---
tags:
  - sentence-transformers
  - embedding
  - nursing
  - clinical-nlp
  - healthcare
  - NHS
  - medical
  - triage
  - NEWS2
language:
  - en
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
base_model: unsloth/embeddinggemma-300m
---

πŸ₯ NurseEmbed-300M

A clinical embedding model fine-tuned for NHS nursing terminology and medical Q&A retrieval.

Model Description

NurseEmbed-300M is fine-tuned from EmbeddingGemma-300M in two stages: general medical Q&A retrieval first, then NHS nursing terminology.

| Stage | Dataset | Samples | Focus |
|---|---|---|---|
| 1 | tomaarsen/miriad-4.4M-split | 10,000 | Medical Q&A from peer-reviewed biomedical literature |
| 2 | Custom NHS dataset | 200 | Nursing shorthand, NEWS2 scores, clinical abbreviations |

📊 Evaluation Results

Medical Domain (Information Retrieval)

| Metric | Score |
|---|---|
| Accuracy@1 | 81.3% |
| Accuracy@10 | 95.4% |
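Accuracy@k is the fraction of queries whose relevant passage appears among the k highest-scoring results. The card does not describe the evaluation split or candidate pool, so the NumPy sketch below only illustrates how the metric is computed, assuming one relevant document per query; `accuracy_at_k` and the 3×3 similarity matrix are made up for illustration.

```python
import numpy as np

def accuracy_at_k(sims, relevant, k):
    """Fraction of queries whose relevant document is ranked in the top k.

    sims: (n_queries, n_docs) similarity matrix.
    relevant: index of the single relevant document for each query (an assumption).
    """
    sims = np.asarray(sims, dtype=float)
    # Column indices of the k highest-scoring documents for each query
    top_k = np.argsort(-sims, axis=1)[:, :k]
    # A query counts as a hit if its relevant index is anywhere in its top k
    hits = (top_k == np.asarray(relevant)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: query i should retrieve document i
sims = np.array([
    [0.9, 0.1, 0.2],  # relevant doc ranked 1st
    [0.5, 0.4, 0.1],  # relevant doc ranked 2nd
    [0.3, 0.6, 0.2],  # relevant doc ranked 3rd
])
relevant = [0, 1, 2]
print(accuracy_at_k(sims, relevant, 1))  # 1 of 3 queries hit
print(accuracy_at_k(sims, relevant, 2))  # 2 of 3 queries hit
```

Accuracy@10 can only be greater than or equal to Accuracy@1, since every top-1 hit is also a top-10 hit.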

Real-World Nursing Shorthand Matching

| Nursing Shorthand | Matched Definition | Similarity | Matched |
|---|---|---|---|
| Pt c/o SOB | Patient reporting Shortness of Breath / Dyspnoea | 0.460 | ✅ |
| NEWS2 score is 7 | Urgent response team review required | 0.242 | ✅ |
| Given Paracetamol 1g PO | Medication administration: Analgesic / Antipyretic | 0.224 | ✅ |
| Plan: Refer to physio for NOF rehab | Physiotherapy referral for Neck of Femur fracture rehabilitation | 0.582 | ✅ |

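All 4 shorthand queries were matched to their intended formal definitions. The absolute similarity scores are modest (0.224 to 0.582), so matching should rely on ranking candidates rather than on a fixed similarity threshold. The first query ("Pt c/o SOB") also appears verbatim in the Stage 2 training examples, so it is not a held-out test case.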

Usage

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the model
model = SentenceTransformer("NurseCitizenDeveloper/NurseEmbed-300M")

# Encode nursing shorthand
queries = ["Pt c/o SOB", "NEWS2 score is 7", "NOF #"]
embeddings = model.encode(queries)

# Encode candidate documents (formal definitions)
documents = [
    "Patient reporting Shortness of Breath",
    "Urgent response team review required",
    "Neck of Femur fracture",
]
doc_embeddings = model.encode(documents)

# Rows = queries, columns = documents
similarities = cosine_similarity(embeddings, doc_embeddings)
print(similarities)
```
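The similarity matrix has one row per query. A minimal sketch of turning embeddings into a ranked match per query, written in plain NumPy: L2-normalise, take dot products (which then equal cosine similarities), and sort each row. `top_k_matches` and the toy 2-D vectors are hypothetical stand-ins for real `model.encode(...)` output.

```python
import numpy as np

def top_k_matches(query_emb, doc_emb, k=1):
    """Return (doc_indices, scores) of the k most cosine-similar documents per query."""
    q = np.asarray(query_emb, dtype=float)
    d = np.asarray(doc_emb, dtype=float)
    # L2-normalise rows so that a dot product equals cosine similarity
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = q @ d.T  # shape: (n_queries, n_docs)
    # Sort each row in descending order and keep the first k columns
    idx = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores

# Toy vectors: query 0 points along x, query 1 along y
queries = np.array([[1.0, 0.1], [0.1, 1.0]])
docs = np.array([[0.0, 2.0], [3.0, 0.0]])
idx, scores = top_k_matches(queries, docs, k=1)
print(idx.ravel())  # [1 0]
```

sentence-transformers also ships a comparable helper, `util.semantic_search`, which works on embedding tensors and returns the top results per query.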

Training Details

Stage 1: Medical Foundation

  • Dataset: 10,000 medical Q&A pairs
  • Epochs: 1
  • Batch Size: 64
  • Learning Rate: 2e-5
  • Scheduler: Linear
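To make the Stage 1 settings concrete, here is a rough sketch of how many optimizer steps they imply and how a linear schedule decays the learning rate over them. It assumes a single device, no gradient accumulation, no dropped final batch and zero warmup steps, none of which the card states; `linear_lr` mirrors the shape of the linear-with-warmup schedule in Hugging Face `transformers` but is written from scratch here.

```python
import math

def linear_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Linear warmup, then linear decay from peak_lr to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Stage 1 figures from the card; warmup_steps=0 is an assumption
samples, batch_size, epochs, peak_lr = 10_000, 64, 1, 2e-5
steps_per_epoch = math.ceil(samples / batch_size)  # 157 (last batch is partial)
total_steps = steps_per_epoch * epochs

for step in (0, total_steps // 2, total_steps - 1):
    print(step, f"{linear_lr(step, total_steps, peak_lr):.2e}")
```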

Stage 2: Nursing Specialization

  • Dataset: 200 NHS nursing pairs (NEWS2, abbreviations, medications)
  • Epochs: 3
  • Batch Size: 32
  • Learning Rate: 1e-5 (half the Stage 1 rate)
  • Scheduler: Cosine
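The same arithmetic for Stage 2 shows how short this stage is. Again assuming a single device, no gradient accumulation, no dropped final batch and zero warmup (none stated in the card), 200 pairs at batch size 32 give 7 batches per epoch, so 3 epochs come to about 21 optimizer steps. `cosine_lr` is a from-scratch sketch of a half-cosine decay, the same shape as the cosine-with-warmup schedule in Hugging Face `transformers` with its default single half-cycle.

```python
import math

def cosine_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Linear warmup, then half-cosine decay from peak_lr to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Stage 2 figures from the card; warmup_steps=0 is an assumption
samples, batch_size, epochs, peak_lr = 200, 32, 3, 1e-5
steps_per_epoch = math.ceil(samples / batch_size)  # 7 (the last batch holds 8 pairs)
total_steps = steps_per_epoch * epochs              # 21

for step in range(0, total_steps + 1, 7):
    print(step, f"{cosine_lr(step, total_steps, peak_lr):.2e}")
```

Under these assumptions Stage 2 is roughly 21 optimizer steps, compared with about 157 in Stage 1.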

Training Data Examples

| Anchor (Nursing Shorthand) | Positive (Formal Definition) |
|---|---|
| Early warning score 9 | Patient requires Emergency call |
| Complaint: UTI | Patient reporting Urinary Tract Infection |
| Pt c/o SOB | Patient reporting Shortness of Breath / Dyspnoea |
| Pt has NEWS2 of 9 | Clinical deterioration level: Critical risk - Sepsis potential |
| Score is 1 on NEWS2 | Clinical deterioration level: Stable |
| Complaint: PU | Patient reporting Pressure Ulcer |
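The card lists anchor/positive pairs but not the training loss. A common choice for such pairs in sentence-transformers is `MultipleNegativesRankingLoss`, which scores each anchor against every positive in the batch and treats the non-matching positives as negatives. The NumPy sketch below reimplements that idea (cosine similarity, scale 20, the defaults of that loss) purely for illustration; it is an assumption, not a statement of how NurseEmbed-300M was trained, and `in_batch_negatives_loss` is a hypothetical name.

```python
import numpy as np

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities: anchor i should pick positive i,
    and every other positive in the batch acts as a negative."""
    a = np.asarray(anchors, dtype=float)
    p = np.asarray(positives, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Targets lie on the diagonal
    return float(-np.mean(np.diag(log_probs)))

# Toy batch of three pairs whose embeddings are perfectly aligned
anchors = np.eye(3)
positives = np.eye(3)
print(in_batch_negatives_loss(anchors, positives))             # close to 0
print(in_batch_negatives_loss(anchors, positives[[1, 2, 0]]))  # about 20
```

With a loss of this kind, batch size also sets the number of negatives each anchor sees: 63 per anchor at the Stage 1 batch size of 64, and 31 at the Stage 2 batch size of 32.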

Intended Use Cases

  • πŸ” Semantic search for nursing documentation
  • 🏷️ FHIR code suggestion (map free text β†’ SNOMED/LOINC)
  • πŸ“‹ Clinical handover assistance (translate shorthand to formal language)
  • πŸŽ“ Nursing education (teach abbreviation meanings)
  • ⚠️ NEWS2 interpretation (map scores to clinical actions)

Limitations

  • Nursing-specific training (Stage 2) used only 200 synthetic NHS nursing pairs
  • Best suited for UK/NHS clinical terminology
  • Should be used as an assistive tool, not a replacement for clinical judgment

Citation

```bibtex
@misc{nurseembed-300m,
  author = {Lincoln Gombedza},
  title = {NurseEmbed-300M: A Clinical Embedding Model for NHS Nursing Terminology},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/NurseCitizenDeveloper/NurseEmbed-300M}
}
```

Author

Created by Lincoln Gombedza (@NurseCitizenDeveloper)

  • πŸ₯ Registered Learning Disability Nurse
  • πŸŽ“ Practice Educator
  • πŸ’» Co-Chair, Digital & Technology Working Group (Professional Strategy for Nursing and Midwifery)
  • πŸš€ Founder, Nursing Citizen Development Movement

Part of the OpenEnv Challenge submission for nurse-led AI innovation.