Model Card for CXformer

CXformer is a vision transformer tailored for chest X-ray analysis, adapted from DINOv2 with clinically motivated training modifications. This repository provides code for pretraining CXformer with our optimized pipeline, as well as scripts for finetuning on downstream tasks such as classification, segmentation, and report generation. For more details on pretraining, please check out our paper accepted at MIDL 2025.

Key highlights:

[Figure: CXformer architecture]

Pretrain Dataset

CXformer was pretrained on publicly available datasets, focusing on frontal views of chest X-rays (PA/AP):

  • CheXpert
  • MIMIC-CXR
  • PadChest
  • NIH-CXR8
  • BRAX

The official training splits were used for CheXpert, MIMIC-CXR, and NIH-CXR8; all available samples from BRAX and PadChest were used in pretraining.
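
As an illustration, frontal studies can be selected from each dataset's metadata before pretraining. The sketch below assumes a MIMIC-CXR-style metadata CSV with a ViewPosition column; file and column names differ across the other datasets.

```python
import pandas as pd

# Hypothetical sketch: keep only frontal (PA/AP) images based on a
# MIMIC-CXR-style metadata file. File and column names are assumptions
# and vary per dataset.
metadata = pd.read_csv("mimic-cxr-2.0.0-metadata.csv")
frontal = metadata[metadata["ViewPosition"].isin(["PA", "AP"])]
frontal.to_csv("frontal_views.csv", index=False)
print(f"Kept {len(frontal)} of {len(metadata)} images")
```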

Downstream Tasks

| Task | Dataset(s) |
|---|---|
| Image Classification | CheXpert, NIH-CXR8, RSNA, VinDr |
| Segmentation | CheXmask |
| Report Generation | MIMIC-CXR, IU-Xray |
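
For classification, the pretrained encoder is typically used as a frozen feature extractor with a lightweight head on top. The sketch below is a minimal linear-probe setup; the pooling choice, label count, and untrained head are illustrative assumptions rather than the exact finetuning recipe from the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

backbone = AutoModel.from_pretrained("m42-health/CXformer-base")
backbone.eval()  # frozen feature extractor

num_labels = 14  # illustrative, e.g. a CheXpert-style label set
head = nn.Linear(backbone.config.hidden_size, num_labels)  # untrained probe

def classify(pixel_values):
    with torch.no_grad():
        tokens = backbone(pixel_values=pixel_values).last_hidden_state
    cls_token = tokens[:, 0]               # global [CLS] representation
    return torch.sigmoid(head(cls_token))  # multi-label probabilities

probs = classify(torch.randn(1, 3, 518, 518))  # dummy input for a shape check
print(probs.shape)  # [1, 14]
```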

Usage

```python
import torch
from transformers import AutoModel, AutoImageProcessor
from PIL import Image

model_name = "m42-health/CXformer-base"

image_processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name)
model.eval()

image = Image.open("sample_cxr.png")

inputs = image_processor(image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # [1, 3, 518, 518]

print("Running forward pass...")
with torch.no_grad():
    output = model(**inputs).last_hidden_state  # [1, 1374, 768]
```
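
Continuing from the example above, the 1374 output tokens correspond to one [CLS] token plus the 37 × 37 = 1369 patch tokens for a 518 × 518 input at patch size 14; the remaining tokens are assumed to be DINOv2-style register tokens. A common choice is to use the [CLS] embedding as a global image representation and the patch tokens for dense tasks.

```python
# Continuation of the usage example above (token layout is an assumption
# based on DINOv2-style backbones: [CLS], register tokens, patch tokens).
cls_embedding = output[:, 0]         # [1, 768] global image representation
patch_tokens = output[:, -37 * 37:]  # [1, 1369, 768] tokens on the 37x37 patch grid
```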

Results Summary

Classification (AUROC)

| Model | CheXpert | RSNA | NIH-CXR8 | Avg. |
|---|---|---|---|---|
| CXformer(S) | 83.34 | 91.13 | 83.68 | 86.05 |
| CXformer(B) | 86.80 | 91.71 | 85.28 | 87.93 |
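
Classification performance is reported as AUROC over each dataset's label set. Below is a minimal evaluation sketch with scikit-learn; macro averaging over labels is an assumption about the exact protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: binary label matrix [num_samples, num_labels]
# y_score: predicted probabilities of the same shape (toy values)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])

auroc = roc_auc_score(y_true, y_score, average="macro") * 100
print(f"AUROC: {auroc:.2f}")
```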

Segmentation (Dice Score)

| Model | Lungs | Heart | Avg. |
|---|---|---|---|
| CXformer(S) | 91.69 | 89.35 | 90.52 |
| CXformer(B) | 91.94 | 89.94 | 90.94 |
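
The Dice score measures mask overlap as 2·|A∩B| / (|A| + |B|). A minimal sketch for binary masks (per-structure scores are averaged as in the table):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice coefficient for binary masks: 2*|A ∩ B| / (|A| + |B|)
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(f"Dice: {dice_score(pred, target) * 100:.2f}")  # 66.67 for these toy masks
```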

Report Generation (MIMIC-CXR)

| Model | ROUGE-L | BLEU-4 | RGER | F1-14 | Avg. |
|---|---|---|---|---|---|
| CXformer(S) | 25.25 | 9.11 | 23.06 | 33.85 | 27.51 |
| CXformer(B) | 24.93 | 9.03 | 22.94 | 33.45 | 27.16 |

Disclaimer

CXformer is intended exclusively for research purposes. It is not validated for clinical decision-making, nor is it approved for use in healthcare environments. The model should not be used for any diagnostic or therapeutic applications in a clinical setting.

License

This project is licensed under CC BY-NC 4.0.

Citation

@inproceedings{al2025empirical,
  title={Empirical Analysis of Scaling Vision Foundation Models for Chest X-rays},
  author={Al Mahrooqi, Ahmed and Munjal, Prateek and Rajan, Ronnie and Pimentel, Marco AF and Kanithi, Praveenkumar},
  booktitle={Medical Imaging with Deep Learning},
  year={2025}
}