ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs

ACE-LoRA is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN).

Model Description

Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only 0.95M trainable parameters to frozen image-text encoders.
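The parameter budget comes from standard LoRA bookkeeping: only the low-rank factors are trained while the backbone weight stays frozen. The sketch below illustrates that arithmetic for a single square projection layer; the dimensions and rank are illustrative placeholders, not ACE-LoRA's actual configuration.

```python
import numpy as np

# Illustrative LoRA update: the frozen weight W is augmented by a low-rank
# product B @ A scaled by alpha / r. Dimensions are placeholders, not the
# actual ACE-LoRA configuration.
d_out, d_in, r, alpha = 768, 768, 4, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen backbone weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (zero init)

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); because B is zero-initialized,
    # the adapted layer reproduces the frozen backbone at the start.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at initialization

trainable = A.size + B.size   # 2 * r * d per square layer: 6144 here
frozen = W.size               # 589824 frozen parameters in the same layer
print(trainable, frozen)
```

Summed over the adapted layers plus the ACE-HGNN module, budgets of this shape are how the total stays under 1M trainable parameters.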

Key Features:

  • ACE-HGNN Module: Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
  • Label-Guided InfoNCE Loss: A specialized loss formulation designed to suppress false negatives between semantically related image-text pairs, improving cross-modal alignment.
  • Efficiency: Achieves state-of-the-art performance across multiple domains while keeping the backbone frozen.

Environment Setup

The framework was developed using Python 3.10.18 and PyTorch 2.1.0 with CUDA 11.8.

conda create -n ace_lora python=3.10.18
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt

Inference

We provide a sample inference script (hf_model_inference.py) for the RSNA dataset.
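For orientation, the zero-shot classification step inside such a script typically reduces to comparing a normalized image embedding against normalized class-prompt embeddings. The sketch below uses random placeholder embeddings and generic names; it is not the code in hf_model_inference.py, where the embeddings would come from the frozen encoders plus the ACE-LoRA adapters.

```python
import numpy as np

# Generic zero-shot classification sketch with placeholder embeddings.
rng = np.random.default_rng(0)
class_prompts = ["pneumonia", "no finding"]   # RSNA-style binary labels
text_emb = rng.standard_normal((2, 512))      # one embedding per prompt
image_emb = rng.standard_normal((1, 512))     # one embedding per image

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine similarity between the image and each class-prompt embedding;
# the argmax over prompts gives the zero-shot prediction.
sim = normalize(image_emb) @ normalize(text_emb).T
pred = class_prompts[int(np.argmax(sim))]
print(pred)
```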

Datasets

MIMIC-CXR: For pretraining, we use the MIMIC-CXR dataset and exclude lateral images. Access to the dataset is available at the following link (note that you must satisfy the dataset provider’s requirements to download the data): [link]

NIH Chest X-ray: For validation, we use the NIH Chest X-ray dataset. The dataset can be accessed at the following link: [link]. After downloading, run dataset_prep/chestx-ray_14_prep.py from our GitHub repo to split the data and prepare it in the required format.

CheXpert 5x200: For zero-shot classification, we use the CheXpert 5×200 dataset. The dataset can be accessed at the following link: [link].

RSNA: We use the RSNA dataset for both zero-shot classification and object detection. The dataset can be accessed at the following link: [link]. After downloading, run dataset_prep/rsna_dataset_create.py from our GitHub repo to split the data and prepare it in the required format for both tasks.

SIIM: We use the SIIM dataset for both zero-shot classification and semantic segmentation. The dataset can be accessed at the following link: [link]. After downloading, run dataset_prep/SIIM_generate_class_labels.py from our GitHub repo to prepare the data for zero-shot classification, and dataset_prep/SIIM_generate_mask.py for semantic segmentation.

🤝 Acknowledgments

This implementation builds upon CLIP-LoRA and LoRA. We gratefully acknowledge their valuable contributions.

