PubMedUL2 & MedUL2

Model Description

PubMedUL2 and MedUL2 are a family of domain-specific UL2/T5-style encoder–decoder language models pretrained on large-scale biomedical and medical corpora using the UL2 (Mixture-of-Denoisers) objective.

  • PubMedUL2 models are pretrained on 25 million PubMed abstracts
  • MedUL2 models are pretrained on PubMed abstracts + clinical notes + additional medical documents
  • All models use a T5-efficient architecture, inspired by Google’s efficient T5 variants

These checkpoints are pretraining-only models and must be fine-tuned before use on downstream tasks.


Pretraining Objective: UL2 (Mixture-of-Denoisers)

These models were pretrained using UL2, a unified framework that formulates language modeling objectives as denoising tasks.

UL2 introduces a Mixture-of-Denoisers (MoD) approach that samples from multiple denoising paradigms during pretraining.

Denoising Tasks

UL2 pretraining uses a mixture of three denoising tasks:

  1. R-denoising (Regular Span Corruption)

    • Equivalent to standard T5 span corruption
    • Optimized for language understanding tasks
  2. X-denoising (Extreme Span Corruption)

    • Uses very large masked spans
    • Encourages long-form generation and abstraction
  3. S-denoising (Sequential / PrefixLM)

    • Prefix language modeling similar to causal LM
    • Suitable for sequence-to-sequence and generative tasks
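The span-corruption idea behind R- and X-denoising can be illustrated with a toy sketch. This is not the released UL2 code (which samples span lengths from configured distributions and mixes denoisers at set rates); it is a simplified, hypothetical illustration of masking contiguous spans with T5-style sentinel tokens:

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """Toy T5-style span corruption.

    Replaces random contiguous spans in the input with sentinel tokens
    and moves the original span contents to the target. R-denoising
    uses short spans like these; X-denoising would use a much higher
    corruption_rate and longer spans.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_to_mask:
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + mean_span_len, len(tokens))):
            masked.add(i)
    inp, tgt, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inp.append(f"<extra_id_{sentinel}>")
            tgt.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                tgt.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inp.append(tokens[i])
            i += 1
    return inp, tgt
```

Every original token ends up either in the corrupted input (unmasked) or in the target (masked), so the pair fully reconstructs the sequence.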

Paradigm Tokens (Mode Switching)

During pretraining, a paradigm token is inserted at the beginning of each input:

| Token | Mode | Recommended Use |
|---|---|---|
| `[NLU]` | R-denoising | Classification, QA, retrieval |
| `[NLG]` | X-denoising | Mixed understanding & generation |
| `[S2S]` | S-denoising | Generative / causal tasks |

Important:
For best performance, the paradigm token chosen for a downstream task should be prepended consistently during both fine-tuning and inference.
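Prepending the token is plain string manipulation before tokenization. The helper below is a hypothetical convenience (not part of the released code) mapping task types to the tokens in the table above:

```python
# Hypothetical helper: map a task type to its UL2 paradigm token.
PARADIGM_TOKENS = {
    "understanding": "[NLU]",  # R-denoising: classification, QA, retrieval
    "mixed": "[NLG]",          # X-denoising: mixed understanding & generation
    "generation": "[S2S]",     # S-denoising: generative / causal tasks
}

def with_paradigm_token(text: str, task: str = "understanding") -> str:
    """Prepend the UL2 paradigm token that matches the task type."""
    return f"{PARADIGM_TOKENS[task]} {text}"
```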


Architecture

  • Encoder–decoder Transformer (T5-style)
  • Uses T5-efficient architecture
  • Compatible with Hugging Face T5ForConditionalGeneration
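Assuming the checkpoints follow the standard T5 layout on the Hub (the tiny repo id from this card is used as an example), loading with transformers looks like:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the tiny checkpoint from this model family as an example.
repo_id = "Siddharth63/pubmedul2-tiny-nl6"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# Pretraining-style usage: paradigm token plus a sentinel to fill in.
inputs = tokenizer("[NLU] Aspirin is a <extra_id_0> drug.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded)
```

Since these are pretraining-only checkpoints, raw generations like this are only a sanity check, not a usable output.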

Intended Uses

These models are intended to be fine-tuned for:

  • Biomedical and clinical text classification
  • Question answering
  • Summarization of medical literature or clinical notes
  • Text generation in medical contexts

Limitations

  • ❌ Not instruction-tuned
  • ❌ No supervised training
  • ❌ Not suitable for zero-shot use

These checkpoints are self-supervised pretraining models only and require task-specific fine-tuning.


Fine-Tuning Recommendations

  • Avoid mixed precision (fp16 / bf16) initially
    • Fine-tuning is more stable in fp32
  • Always prepend one of [NLU], [NLG], or [S2S] to input text
  • Suggested defaults:
    • Classification / QA → [NLU]
    • Causal or generative tasks → [S2S]
    • Mixed tasks → [NLG]

Model Parameter Summary

| Model Name | Parameter Count | Description | Access |
|---|---|---|---|
| pubmedul2-tiny-nl6 | 19.26M | Tiny UL2-style model with 6 layers | Open |
| pubmedul2-mini-nl8 | 50.12M | Mini UL2 with 8 layers | Open |
| pubmedul2-small | 60.52M | Small UL2 variant | Open |
| pubmedul2-small-nl24 | 192.73M | Small UL2 with 24 layers | Open |
| medul2-base | 222.93M | Base UL2/T5-style model | Open |
| pubmedul2-base | 222.93M | Base UL2/T5-style model | Open |
| medul2-base-nl36 | 619.44M | Base UL2 with 36 layers | Gated commercial |
| pubmedul2-base-nl36 | 619.44M | Base UL2 with 36 layers | Gated commercial |
| medul2-large | 737.72M | Large UL2/T5-style model | Gated non-commercial |
| pubmedul2-large | 737.72M | Large UL2/T5-style model | Gated non-commercial |
| medul2-large-nl36 | 1090.14M | Very large UL2 with 36 layers | Access on Request |

Named Entity Recognition (NER) Evaluation

We evaluate PubMedUL2 and MedUL2 models on a biomedical Named Entity Recognition (NER) task using multiple matching criteria to better capture boundary-level performance.

The evaluation reports entity-level F1 scores across different biomedical entity types and model sizes.

Exact Match F1

An entity prediction is considered correct only if both the entity span and label exactly match the gold annotation.

| entity_type | medul2-base | pubmedul2-base | pubmedul2-mini-nl8 | pubmedul2-small | pubmedul2-tiny-nl6 |
|---|---|---|---|---|---|
| cell_line | 0.42 | 0.43 | 0.44 | 0.43 | 0.35 |
| cell_type | 0.59 | 0.58 | 0.59 | 0.58 | 0.52 |
| chemical | 0.76 | 0.75 | 0.72 | 0.72 | 0.56 |
| disease | 0.70 | 0.73 | 0.70 | 0.68 | 0.63 |
| dna | 0.59 | 0.55 | 0.54 | 0.55 | 0.45 |
| gene | 0.62 | 0.59 | 0.60 | 0.59 | 0.55 |
| protein | 0.59 | 0.58 | 0.58 | 0.59 | 0.55 |
| rna | 0.60 | 0.56 | 0.55 | 0.60 | 0.56 |
| species | 0.66 | 0.67 | 0.58 | 0.63 | 0.54 |
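The exact-match criterion can be sketched as a set intersection over (start, end, label) triples. This is an illustrative implementation, not the evaluation script used for the table above:

```python
def exact_match_f1(pred, gold):
    """Entity-level exact-match F1.

    pred and gold are collections of (start, end, label) triples; a
    prediction counts as a true positive only if both the span and the
    label exactly match a gold annotation.
    """
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```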

Partial Match F1

A prediction is counted as correct if its span overlaps a gold entity of the same type by at least one token, regardless of exact boundaries.

| entity_type | medul2-base | pubmedul2-base | pubmedul2-mini-nl8 | pubmedul2-small | pubmedul2-tiny-nl6 |
|---|---|---|---|---|---|
| cell_line | 0.48 | 0.49 | 0.48 | 0.48 | 0.41 |
| cell_type | 0.66 | 0.64 | 0.66 | 0.65 | 0.59 |
| chemical | 0.79 | 0.78 | 0.76 | 0.75 | 0.60 |
| disease | 0.82 | 0.84 | 0.80 | 0.79 | 0.74 |
| dna | 0.65 | 0.61 | 0.60 | 0.61 | 0.53 |
| gene | 0.76 | 0.74 | 0.74 | 0.73 | 0.68 |
| protein | 0.66 | 0.66 | 0.66 | 0.67 | 0.64 |
| rna | 0.68 | 0.63 | 0.64 | 0.66 | 0.65 |
| species | 0.68 | 0.70 | 0.61 | 0.65 | 0.56 |
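The partial-overlap criterion reduces to a one-line interval check (illustrative, with end-exclusive spans assumed; not the actual evaluation script):

```python
def partial_match(pred_span, gold_span):
    """True if two (start, end) spans overlap by >= 1 token (end exclusive)."""
    (ps, pe), (gs, ge) = pred_span, gold_span
    return max(ps, gs) < min(pe, ge)
```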

IoU Match F1

Predictions are evaluated using Intersection-over-Union (IoU) overlap between predicted and gold spans, providing a softer boundary-based metric.

| entity_type | medul2-base | pubmedul2-base | pubmedul2-mini-nl8 | pubmedul2-small | pubmedul2-tiny-nl6 |
|---|---|---|---|---|---|
| cell_line | 0.50 | 0.50 | 0.50 | 0.50 | 0.42 |
| cell_type | 0.67 | 0.66 | 0.68 | 0.67 | 0.62 |
| chemical | 0.83 | 0.83 | 0.82 | 0.82 | 0.72 |
| disease | 0.85 | 0.86 | 0.86 | 0.85 | 0.82 |
| dna | 0.65 | 0.62 | 0.62 | 0.62 | 0.55 |
| gene | 0.76 | 0.75 | 0.75 | 0.74 | 0.71 |
| protein | 0.67 | 0.66 | 0.67 | 0.67 | 0.66 |
| rna | 0.68 | 0.65 | 0.66 | 0.67 | 0.67 |
| species | 0.72 | 0.74 | 0.65 | 0.69 | 0.58 |
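The IoU criterion can be sketched the same way (illustrative, end-exclusive token spans assumed):

```python
def span_iou(pred_span, gold_span):
    """Intersection-over-Union of two (start, end) token spans, end exclusive."""
    (ps, pe), (gs, ge) = pred_span, gold_span
    inter = max(0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union else 0.0
```

Scoring a prediction correct when its IoU against a gold span of the same type clears a threshold gives a metric between the strictness of exact match and the leniency of any-overlap partial match.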

Observations

  • medul2-base leads pubmedul2-base on most entity types (e.g. chemical, dna, gene, rna), while pubmedul2-base is stronger on disease and species
  • Performance improves consistently from tiny → base models
  • Relaxed boundary metrics (Partial / IoU) score substantially higher than Exact Match, highlighting boundary ambiguity in biomedical NER

Acknowledgements

This project would not have been possible without compute generously provided by the Google TPU Research Cloud program.

Thanks to:

  • The Finnish-NLP authors for releasing the UL2 objective code, task definitions, and guidance
  • Yeb Havinga for help getting started with the t5x framework

License

Please refer to the individual model repositories for license and access details, which may vary depending on training data sources.
