MedGPT-oss

MedGPT-oss is an open-weight 20B-parameter vision–language model for biomedicine, built on GPT-oss-20B with a CLIP-ViT-L/14@336px visual encoder and a two-layer MLP projector. It is trained with a three-stage curriculum (alignment → long-context mid-training → instruction tuning) and is designed for on-premises, privacy-preserving clinical research.
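The visual front end above implies a fixed visual token budget per image. A back-of-the-envelope sketch, assuming the standard CLIP-ViT-L/14 feature width of 1024 and a 2880-dim GPT-oss-20B hidden state (neither dimension is stated in this card):

```python
# Rough sizing of the visual front end described above.
# ASSUMPTIONS (not stated in this card): CLIP-ViT-L/14 emits 1024-dim patch
# features; GPT-oss-20B uses a 2880-dim hidden state; the two-layer MLP
# projector goes 1024 -> 2880 -> 2880 with biases.

IMAGE_SIZE = 336   # CLIP-ViT-L/14@336px input resolution
PATCH_SIZE = 14    # ViT-L/14 patch size
CLIP_DIM = 1024    # assumed CLIP feature width
LM_DIM = 2880      # assumed GPT-oss-20B hidden width

patches_per_side = IMAGE_SIZE // PATCH_SIZE      # 336 / 14 = 24
num_visual_tokens = patches_per_side ** 2        # 24 * 24 = 576 tokens/image

# Two linear layers with biases (nonlinearity in between adds no params):
projector_params = (CLIP_DIM * LM_DIM + LM_DIM) + (LM_DIM * LM_DIM + LM_DIM)

print(num_visual_tokens)   # 576
print(projector_params)    # 11249280, i.e. ~11.2M parameters
```

So each image costs 576 positions of the language model's context, and the projector is a small (~11M-parameter) component relative to the 20B backbone.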

📄 Paper: arXiv:2603.00842

Usage

from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

model_id = "UFNLP/MedGPT-oss"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)

image = Image.open("chest_xray.png").convert("RGB")  # grayscale/RGBA PNGs need RGB
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the findings in this chest X-ray."},
]}]
inputs = processor.apply_chat_template(
    messages, images=[image], add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
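The slice in the decode call strips the prompt: generate() returns the prompt tokens followed by the newly generated ones, so dropping the first input_ids.shape[-1] positions leaves only the model's answer. A toy illustration with made-up token ids:

```python
# generate() output = prompt tokens + new tokens; slice off the prompt length
# to keep only the completion (same logic as the decode call above).
prompt_ids = [101, 7592, 2088]               # stands in for inputs["input_ids"][0]
generated = prompt_ids + [2023, 2003, 102]   # stands in for outputs[0]
completion = generated[len(prompt_ids):]
print(completion)  # [2023, 2003, 102]
```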

Citation

@article{zhang2026medgptoss,
  title   = {MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine},
  author  = {Zhang, Kai and Yuan, Zhengqing and Peng, Cheng and Zhao, Songlin and
             Lyu, Mengxian and Chen, Ziyi and Ye, Yanfang and Liu, Wei and
             Zhang, Ying and Smith, Kaleb E. and He, Lifang and Sun, Lichao and Wu, Yonghui},
  journal = {arXiv preprint arXiv:2603.00842},
  year    = {2026}
}

Contact

Lichao Sun (lis221@lehigh.edu) · Yonghui Wu (yonghui.wu@ufl.edu)
