MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine
MedGPT-oss is an open-weight 20B-parameter vision–language model for biomedicine, built on GPT-oss-20B with a CLIP-ViT-L/14@336px visual encoder and a two-layer MLP projector. It is trained with a three-stage curriculum (alignment → long-context mid-training → instruction tuning) and is designed for on-premises, privacy-preserving clinical research.
📄 Paper: arXiv:2603.00842
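The two-layer MLP projector mentioned above maps CLIP-ViT patch features into the language model's embedding space. A minimal sketch of this component, assuming a 1024-d CLIP-ViT-L/14 feature width and a 2880-d LM hidden size (both dimensions are illustrative assumptions, not confirmed by this card):

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP projector (sketch): maps vision patch features
    into the language model's embedding space."""

    def __init__(self, vision_dim: int = 1024, hidden_dim: int = 2880, lm_dim: int = 2880):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, lm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.proj(patch_features)

# At 336px with 14px patches, CLIP-ViT-L/14 yields 24 x 24 = 576 patch tokens.
tokens = torch.randn(1, 576, 1024)
projected = VisionProjector()(tokens)  # shape: (1, 576, 2880)
```

The projected tokens are then concatenated with text embeddings and fed to the language model as ordinary input positions.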
Example usage with Hugging Face Transformers:

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "UFNLP/MedGPT-oss"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("chest_xray.png").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the findings in this chest X-ray."},
]}]

# Render the chat template to a prompt string, then let the processor
# tokenize the text and preprocess the image together.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
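The three-stage curriculum (alignment → long-context mid-training → instruction tuning) described above is commonly implemented by staging which components are trainable. A hedged sketch of that pattern follows; the component names and the exact freeze/unfreeze schedule are assumptions for illustration, not a confirmed description of the paper's recipe:

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    """Enable or disable gradient updates for all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = flag

def configure_stage(vision_encoder: nn.Module, projector: nn.Module,
                    language_model: nn.Module, stage: str) -> None:
    """Illustrative per-stage freezing schedule for a VLM curriculum."""
    if stage == "alignment":
        # Stage 1: train only the projector on paired image-text data.
        set_trainable(vision_encoder, False)
        set_trainable(projector, True)
        set_trainable(language_model, False)
    elif stage in ("mid_training", "instruction_tuning"):
        # Stages 2-3: projector and LM train jointly; the vision
        # encoder is often kept frozen throughout.
        set_trainable(vision_encoder, False)
        set_trainable(projector, True)
        set_trainable(language_model, True)
    else:
        raise ValueError(f"unknown stage: {stage}")
```

Keeping the vision encoder frozen in early stages stabilizes alignment training and reduces memory; whether MedGPT-oss ever unfreezes it is a detail to check in the paper.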
@article{zhang2026medgptoss,
title = {MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine},
author = {Zhang, Kai and Yuan, Zhengqing and Peng, Cheng and Zhao, Songlin and
Lyu, Mengxian and Chen, Ziyi and Ye, Yanfang and Liu, Wei and
Zhang, Ying and Smith, Kaleb E. and He, Lifang and Sun, Lichao and Wu, Yonghui},
journal = {arXiv preprint arXiv:2603.00842},
year = {2026}
}
Contact: Lichao Sun (lis221@lehigh.edu) · Yonghui Wu (yonghui.wu@ufl.edu)