NurseEmbed-300M
A clinical embedding model fine-tuned for NHS nursing terminology and medical Q&A retrieval.
Model Description
NurseEmbed-300M is based on EmbeddingGemma-300M and trained using a two-stage hybrid approach:
| Stage | Dataset | Samples | Focus |
|---|---|---|---|
| Stage 1 | tomaarsen/miriad-4.4M-split | 10,000 | Medical Q&A from peer-reviewed biomedical literature |
| Stage 2 | Custom NHS Dataset | 200 | Nursing shorthand, NEWS2 scores, clinical abbreviations |
Evaluation Results
Medical Domain (Information Retrieval)
| Metric | Score |
|---|---|
| Accuracy@1 | 81.3% |
| Accuracy@10 | 95.4% |
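The card does not say how these metrics were computed. A minimal sketch, assuming the usual information-retrieval definition (a query counts as a hit if at least one relevant document appears in its top k results), could look like this. The function name `accuracy_at_k` and the toy rankings are hypothetical and are not taken from the model's evaluation set.

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries whose top-k results include a relevant document.

    ranked_ids: one list of document ids per query, best match first.
    relevant_ids: one set of relevant document ids per query.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("need one ranking and one relevant set per query")
    if not ranked_ids:
        raise ValueError("no queries given")
    hits = sum(
        1
        for ranking, relevant in zip(ranked_ids, relevant_ids)
        if any(doc_id in relevant for doc_id in ranking[:k])
    )
    return hits / len(ranked_ids)


# Toy data, made up for illustration only.
rankings = [["d1", "d7", "d3"], ["d4", "d2", "d9"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d2"}, {"d0"}]

acc1 = accuracy_at_k(rankings, relevant, k=1)
acc3 = accuracy_at_k(rankings, relevant, k=3)
print(f"Accuracy@1 = {acc1:.3f}, Accuracy@3 = {acc3:.3f}")
```

Under this definition Accuracy@10 can never be lower than Accuracy@1, which is consistent with the two scores reported above.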
Real-World Nursing Shorthand Matching
| Nursing Shorthand | Matched Definition | Similarity |
|---|---|---|
| Pt c/o SOB | Patient reporting Shortness of Breath / Dyspnoea | 0.460 |
| NEWS2 score is 7 | Urgent response team review required | 0.242 |
| Given Paracetamol 1g PO | Medication administration: Analgesic / Antipyretic | 0.224 |
| Plan: Refer to physio for NOF rehab | Physiotherapy referral for Neck of Femur fracture rehabilitation | 0.582 |
All four nursing shorthand queries were matched to their correct formal definitions.
Usage
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the model
model = SentenceTransformer("NurseCitizenDeveloper/NurseEmbed-300M")

# Encode nursing shorthand
queries = ["Pt c/o SOB", "NEWS2 score is 7", "NOF #"]
embeddings = model.encode(queries)

# Encode the candidate documents
documents = [
    "Patient reporting Shortness of Breath",
    "Urgent response team review required",
    "Neck of Femur fracture",
]
doc_embeddings = model.encode(documents)

# Rows are queries, columns are documents
similarities = cosine_similarity(embeddings, doc_embeddings)
print(similarities)
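The printed matrix has one row per query and one column per document. To turn it into a match, take the highest-scoring column in each row. The sketch below does that. The helper `best_matches` is a hypothetical name, and the scores are placeholders so the example runs without downloading the model.

```python
def best_matches(similarities, queries, documents):
    """Pair each query with its highest-scoring document.

    similarities: rows are queries, columns are documents, for example
    the array returned by sklearn's cosine_similarity.
    """
    results = []
    for query, row in zip(queries, similarities):
        best = max(range(len(documents)), key=lambda j: row[j])
        results.append((query, documents[best], float(row[best])))
    return results


# Placeholder scores (made up, not model output). With the real model,
# pass the `similarities` array from the usage snippet instead.
toy_queries = ["Pt c/o SOB", "NEWS2 score is 7", "NOF #"]
toy_documents = [
    "Patient reporting Shortness of Breath",
    "Urgent response team review required",
    "Neck of Femur fracture",
]
toy_similarities = [
    [0.5, 0.1, 0.2],
    [0.1, 0.3, 0.0],
    [0.2, 0.1, 0.6],
]

matches = best_matches(toy_similarities, toy_queries, toy_documents)
for query, document, score in matches:
    print(f"{query!r} -> {document!r} ({score:.3f})")
```

Ranking by the best score per row, instead of applying a fixed similarity threshold, is a design choice for this sketch: the scores reported in the shorthand table are fairly low in absolute terms, so a fixed cut-off would need tuning on real data.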
Training Details
Stage 1: Medical Foundation
- Dataset: 10,000 medical Q&A pairs
- Epochs: 1
- Batch Size: 64
- Learning Rate: 2e-5
- Scheduler: Linear
Stage 2: Nursing Specialization
- Dataset: 200 NHS nursing pairs (NEWS2, abbreviations, medications)
- Epochs: 3
- Batch Size: 32
- Learning Rate: 1e-5 (lower for fine-tuning)
- Scheduler: Cosine
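To put these hyperparameters in perspective, the sketch below works out how many optimizer steps each stage implies and how the two scheduler shapes decay the learning rate. It assumes no gradient accumulation, a single device, the last partial batch kept, and no warmup. None of these are stated in the card, and the function names are hypothetical.

```python
import math


def total_steps(num_samples, batch_size, epochs):
    """Optimizer steps, assuming the last partial batch of each epoch is kept."""
    return math.ceil(num_samples / batch_size) * epochs


def linear_lr(step, total, peak_lr):
    """Linear decay from peak_lr to 0 (no warmup assumed)."""
    return peak_lr * (1 - step / total)


def cosine_lr(step, total, peak_lr):
    """Half-cosine decay from peak_lr to 0 (no warmup assumed)."""
    return peak_lr * 0.5 * (1 + math.cos(math.pi * step / total))


stage1_steps = total_steps(10_000, 64, epochs=1)  # ceil(156.25) * 1 = 157
stage2_steps = total_steps(200, 32, epochs=3)     # ceil(6.25) * 3 = 21

print("Stage 1 steps:", stage1_steps)
print("Stage 2 steps:", stage2_steps)
for step in (0, stage2_steps // 2, stage2_steps):
    print(step, f"{cosine_lr(step, stage2_steps, 1e-5):.2e}")
```

Under these assumptions Stage 2 amounts to roughly 21 updates, so it is a light adjustment on top of Stage 1, not a second full training run.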
Training Data Examples
| Anchor (Nursing Shorthand) | Positive (Formal Definition) |
|---|---|
| Early warning score 9 | Patient requires Emergency call |
| Complaint: UTI | Patient reporting Urinary Tract Infection |
| Pt c/o SOB | Patient reporting Shortness of Breath / Dyspnoea |
| Pt has NEWS2 of 9 | Clinical deterioration level: Critical risk - Sepsis potential |
| Score is 1 on NEWS2 | Clinical deterioration level: Stable |
| Complaint: PU | Patient reporting Pressure Ulcer |
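The card does not state which loss was used with these pairs. A common choice for anchor/positive data is an in-batch negatives objective (available in sentence-transformers as `MultipleNegativesRankingLoss`), where each anchor's own positive is the target and the other positives in the batch serve as negatives. The sketch below is an assumption about how such pairs are typically consumed, not a description of the author's training code. The similarity matrices are made up, and `scale=20.0` mirrors that library's default.

```python
import math


def in_batch_negatives_loss(sim, scale=20.0):
    """Mean cross-entropy over a square similarity matrix.

    sim[i][j] is the similarity between anchor i and positive j.
    The diagonal entry sim[i][i] is the correct pair for row i; every
    other entry in the row is treated as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [scale * s for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(sim)


# Made-up similarities for a batch of three anchor/positive pairs.
well_aligned = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.1, 0.7],
]
poorly_aligned = [
    [0.2, 0.6, 0.5],
    [0.7, 0.1, 0.4],
    [0.5, 0.6, 0.2],
]

good_loss = in_batch_negatives_loss(well_aligned)
bad_loss = in_batch_negatives_loss(poorly_aligned)
print(f"well aligned: {good_loss:.4f}, poorly aligned: {bad_loss:.4f}")
```

One caveat with this kind of objective: if two anchors in the same batch share a positive (for example, several NEWS2 phrasings that map to the same definition), each would be treated as a negative for the other.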
Intended Use Cases
- Semantic search for nursing documentation
- FHIR code suggestion (map free text to SNOMED/LOINC)
- Clinical handover assistance (translate shorthand to formal language)
- Nursing education (teach abbreviation meanings)
- NEWS2 interpretation (map scores to clinical actions)
Limitations
- The nursing specialization stage uses a small synthetic NHS dataset (200 pairs)
- Best suited for UK/NHS clinical terminology
- Should be used as an assistive tool, not a replacement for clinical judgment
Citation
@misc{nurseembed-300m,
  author    = {Lincoln Gombedza},
  title     = {NurseEmbed-300M: A Clinical Embedding Model for NHS Nursing Terminology},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/NurseCitizenDeveloper/NurseEmbed-300M}
}
Author
Created by Lincoln Gombedza (@NurseCitizenDeveloper)
- Registered Learning Disability Nurse
- Practice Educator
- Co-Chair, Digital & Technology Working Group (Professional Strategy for Nursing and Midwifery)
- Founder, Nursing Citizen Development Movement
Part of the OpenEnv Challenge submission for nurse-led AI innovation.
Model tree for NurseCitizenDeveloper/NurseEmbed-300M
Base model: unsloth/embeddinggemma-300m