Access to this model is gated. The repository is public, but you must log in and agree to share your contact information before you can access its files and content.

πŸ₯ NurseEmbed-300M

A clinical embedding model fine-tuned for NHS nursing terminology and medical Q&A retrieval.

Model Description

NurseEmbed-300M is fine-tuned from EmbeddingGemma-300M in two stages: general medical Q&A first, then NHS nursing terminology.

| Stage | Dataset | Samples | Focus |
|---|---|---|---|
| Stage 1 | tomaarsen/miriad-4.4M-split | 10,000 | Medical Q&A from peer-reviewed biomedical literature |
| Stage 2 | Custom NHS Dataset | 200 | Nursing shorthand, NEWS2 scores, clinical abbreviations |

📊 Evaluation Results

Medical Domain (Information Retrieval)

| Metric | Score |
|---|---|
| Accuracy@1 | 81.3% |
| Accuracy@10 | 95.4% |
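Accuracy@k is the fraction of queries whose correct document appears among the k highest-scoring candidates. The card does not say which evaluator produced the numbers above; this minimal sketch assumes the standard definition (the one reported by sentence-transformers' `InformationRetrievalEvaluator`) and uses a small made-up similarity matrix purely for illustration, not model output.

```python
import numpy as np


def accuracy_at_k(similarity: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of queries whose gold document index is among the top-k scores.

    similarity: (num_queries, num_docs) score matrix, higher = more similar.
    gold: (num_queries,) index of the single relevant document per query.
    """
    # Indices of the k highest-scoring documents for each query
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query counts as a hit if its gold index appears anywhere in its top k
    hits = (top_k == gold[:, None]).any(axis=1)
    return float(hits.mean())


# Toy scores for 3 queries x 4 documents (illustrative only)
sim = np.array([
    [0.9, 0.1, 0.2, 0.0],  # gold doc 0 ranks 1st
    [0.3, 0.2, 0.8, 0.1],  # gold doc 1 ranks 3rd
    [0.1, 0.2, 0.3, 0.7],  # gold doc 3 ranks 1st
])
gold = np.array([0, 1, 3])

print(accuracy_at_k(sim, gold, k=1))  # 2 of 3 queries hit at rank 1
print(accuracy_at_k(sim, gold, k=3))  # all 3 queries hit within the top 3
```

The gap between the two values mirrors the gap between the reported Accuracy@1 and Accuracy@10: a larger k forgives correct documents that rank just below the top.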

Real-World Nursing Shorthand Matching

| Nursing shorthand | Matched definition | Similarity | Match |
|---|---|---|---|
| Pt c/o SOB | Patient reporting Shortness of Breath / Dyspnoea | 0.460 | ✅ |
| NEWS2 score is 7 | Urgent response team review required | 0.242 | ✅ |
| Given Paracetamol 1g PO | Medication administration: Analgesic / Antipyretic | 0.224 | ✅ |
| Plan: Refer to physio for NOF rehab | Physiotherapy referral for Neck of Femur fracture rehabilitation | 0.582 | ✅ |

All 4 of 4 nursing shorthand queries in this spot check were matched to their correct formal definitions.

Usage

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the model (gated: accept the conditions on the Hub and log in first)
model = SentenceTransformer("NurseCitizenDeveloper/NurseEmbed-300M")

# Encode nursing shorthand queries
queries = ["Pt c/o SOB", "NEWS2 score is 7", "NOF #"]
query_embeddings = model.encode(queries)

# Encode candidate formal definitions
documents = [
    "Patient reporting Shortness of Breath",
    "Urgent response team review required",
    "Neck of Femur fracture",
]
doc_embeddings = model.encode(documents)

# Rows are queries, columns are documents; higher means more similar
similarities = cosine_similarity(query_embeddings, doc_embeddings)
print(similarities)

# Index of the best-matching document for each query
print(similarities.argmax(axis=1))
```
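`cosine_similarity` in the usage example L2-normalises each embedding and takes dot products. The sketch below reproduces that step in plain NumPy and ranks documents per query, which can be useful if you want to avoid the scikit-learn dependency. The random vectors are stand-ins for `model.encode(...)` output, and the 768 dimension is an assumption based on EmbeddingGemma's default embedding size; `cosine_rank` is a hypothetical helper, not part of any library.

```python
import numpy as np


def cosine_rank(query_emb: np.ndarray, doc_emb: np.ndarray):
    """Return (similarity matrix, per-query document indices sorted best-first)."""
    # L2-normalise rows so that a dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sims = q @ d.T                       # shape: (num_queries, num_docs)
    ranking = np.argsort(-sims, axis=1)  # best match first
    return sims, ranking


# Stand-in embeddings; in practice use model.encode(queries) / model.encode(documents)
rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(3, 768))
# Each query is a noisy copy of "its" document, so the expected match is known
query_emb = doc_emb + 0.1 * rng.normal(size=(3, 768))

sims, ranking = cosine_rank(query_emb, doc_emb)
print(ranking[:, 0])  # index of the best-matching document for each query
```

Recent sentence-transformers releases also provide `model.similarity(a, b)`, which applies the model's configured similarity function directly to two sets of embeddings.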

Training Details

Stage 1: Medical Foundation

  • Dataset: 10,000 medical Q&A pairs
  • Epochs: 1
  • Batch Size: 64
  • Learning Rate: 2e-5
  • Scheduler: Linear
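As a rough worked example of the Stage 1 settings, assuming a single device, no gradient accumulation, no warmup, and a final partial batch that is kept (none of which the card states), 10,000 pairs at batch size 64 give ceil(10,000 / 64) = 157 optimiser steps for the single epoch, and a linear scheduler decays the learning rate from 2e-5 towards 0 over those steps. The helper names are hypothetical.

```python
import math


def num_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    # Assumes the last, smaller batch is kept (drop_last=False)
    return math.ceil(num_samples / batch_size) * epochs


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    # Linear decay from base_lr at step 0 to 0 at total_steps, no warmup
    return base_lr * max(0.0, (total_steps - step) / total_steps)


total = num_steps(10_000, 64, epochs=1)
print(total)                               # 157
print(linear_lr(0, total, 2e-5))           # 2e-05
print(linear_lr(total // 2, total, 2e-5))  # roughly half of 2e-5
```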

Stage 2: Nursing Specialization

  • Dataset: 200 NHS nursing pairs (NEWS2, abbreviations, medications)
  • Epochs: 3
  • Batch Size: 32
  • Learning Rate: 1e-5 (half the Stage 1 rate)
  • Scheduler: Cosine
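Under the same assumptions as any back-of-envelope step count (single device, no gradient accumulation, no warmup, last partial batch kept, none of which the card confirms), Stage 2 is very short: ceil(200 / 32) = 7 steps per epoch, so 21 steps over 3 epochs. This sketch shows how a half-cosine schedule would move the 1e-5 learning rate across those steps; the helper names are hypothetical.

```python
import math


def num_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    # Assumes the last, smaller batch is kept (drop_last=False)
    return math.ceil(num_samples / batch_size) * epochs


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    # Half-cosine decay from base_lr at step 0 to 0 at total_steps, no warmup
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = num_steps(200, 32, epochs=3)
print(total)  # 21
for step in (0, 7, 14, 21):  # start, end of epoch 1, end of epoch 2, end of training
    print(step, f"{cosine_lr(step, total, 1e-5):.2e}")
```

With these assumptions the rate is 7.5e-6 after the first epoch, 2.5e-6 after the second, and 0 at the end, so most of the specialization happens in the first two epochs.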

Training Data Examples

| Anchor (Nursing Shorthand) | Positive (Formal Definition) |
|---|---|
| Early warning score 9 | Patient requires Emergency call |
| Complaint: UTI | Patient reporting Urinary Tract Infection |
| Pt c/o SOB | Patient reporting Shortness of Breath / Dyspnoea |
| Pt has NEWS2 of 9 | Clinical deterioration level: Critical risk - Sepsis potential |
| Score is 1 on NEWS2 | Clinical deterioration level: Stable |
| Complaint: PU | Patient reporting Pressure Ulcer |
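Anchor/positive pairs like these are commonly trained with an in-batch-negatives objective, in which each anchor must score its own positive above every other positive in the batch (sentence-transformers' `MultipleNegativesRankingLoss` is one example). The card does not name the loss that was actually used, so this NumPy sketch is an assumption that only illustrates the idea; the scale of 20 mirrors that loss's default, and the random vectors stand in for encoded shorthand and definitions.

```python
import numpy as np


def in_batch_contrastive_loss(anchors: np.ndarray, positives: np.ndarray,
                              scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i's correct "class" is positive i.

    anchors, positives: (batch, dim) embeddings. Every other positive in the
    batch acts as a negative for a given anchor.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # (batch, batch) scaled cosine scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # targets lie on the diagonal


rng = np.random.default_rng(0)
positives = rng.normal(size=(6, 768))                  # stand-ins for formal definitions
aligned = positives + 0.1 * rng.normal(size=(6, 768))  # anchors close to their own positive
shuffled = aligned[::-1]                               # anchors paired with the wrong positive

print(in_batch_contrastive_loss(aligned, positives))   # close to 0
print(in_batch_contrastive_loss(shuffled, positives))  # much larger
```

A design consequence of this kind of objective is that batch size matters: each anchor sees batch size minus one negatives, so Stage 2's batch of 32 would give 31 negatives per anchor.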

Intended Use Cases

  • πŸ” Semantic search for nursing documentation
  • 🏷️ FHIR code suggestion (map free text β†’ SNOMED/LOINC)
  • πŸ“‹ Clinical handover assistance (translate shorthand to formal language)
  • πŸŽ“ Nursing education (teach abbreviation meanings)
  • ⚠️ NEWS2 interpretation (map scores to clinical actions)

Limitations

  • Nursing specialization (Stage 2) used only 200 synthetic NHS nursing samples
  • Best suited for UK/NHS clinical terminology
  • Should be used as an assistive tool, not a replacement for clinical judgment

Citation

```bibtex
@misc{nurseembed-300m,
  author = {Lincoln Gombedza},
  title = {NurseEmbed-300M: A Clinical Embedding Model for NHS Nursing Terminology},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/NurseCitizenDeveloper/NurseEmbed-300M}
}
```

Author

Created by Lincoln Gombedza (@NurseCitizenDeveloper)

  • πŸ₯ Registered Learning Disability Nurse
  • πŸŽ“ Practice Educator
  • πŸ’» Co-Chair, Digital & Technology Working Group (Professional Strategy for Nursing and Midwifery)
  • πŸš€ Founder, Nursing Citizen Development Movement

Part of the OpenEnv Challenge submission for nurse-led AI innovation.
