[NeurIPS 2025] 3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
News
- 2025.10.23: Updated the latest version of the paper!
- 2025.09.19: Paper accepted to NeurIPS 2025!
- 2025.05.16: Set up the repository and committed the dataset!
 
Overview
In this repository, we present the dataset, code, and models for "3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks".
The code for our project is available on GitHub: Tang-xiaoxiao/3D-RAD
In our project, we collect 3D-RAD, a large-scale dataset designed to advance 3D Med-VQA using radiology CT scans. It encompasses six diverse VQA tasks: Anomaly Detection (Task 1), Image Observation (Task 2), Medical Computation (Task 3), Existence Detection (Task 4), Static Temporal Diagnosis (Task 5), and Longitudinal Temporal Diagnosis (Task 6).
M3D-RAD Models
To assess the utility of 3D-RAD, we fine-tuned two M3D model variants with different parameter scales, thereby constructing the M3D-RAD models. You can find our fine-tuned models in M3D-RAD_Models.
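As a rough illustration, the snippet below shows one way such a checkpoint could be loaded for inference with Hugging Face Transformers, assuming the M3D-RAD checkpoints keep the format of the base M3D models; the model path is a placeholder, not the actual release name.

```python
# Minimal sketch (not the official loading code): load an M3D-RAD checkpoint,
# assuming it follows the Hugging Face format of the base M3D models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/M3D-RAD_checkpoint"  # placeholder local path or hub id

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()
```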
Evaluation
Zero-Shot Evaluation.
We conducted zero-shot evaluation of several state-of-the-art 3D medical vision-language models on our benchmark to assess their generalization capabilities.
The RadFM and M3D directories contain code for evaluating the RadFM and M3D models on our 3D-RAD benchmark. The code builds on the original RadFM and M3D codebases; to run our evaluation, you should first satisfy the requirements and download the models as described in those base repositories.
Compared to the base code, we made the following modifications. In the RadFM directory, we added a new dataset class in RadFM/src/Dataset/dataset/rad_dataset.py, modified the test dataset in RadFM/src/Dataset/multi_dataset_test.py, and added a new evaluation script for our benchmark in RadFM/src/eval_3DRAD.py. In the M3D directory, we added a new dataset class in M3D/Bench/dataset/multi_dataset.py and a new evaluation script in M3D/Bench/eval/eval_3DRAD.py. (A minimal sketch of such a dataset class is shown below.)
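For orientation, here is a minimal sketch of what such a QA dataset class might look like; the record field names ("image_path", "question", "answer") and file layout are illustrative assumptions, not the exact structure of the files in this repository.

```python
# Sketch of a VQA dataset over 3D-RAD, assuming each QA record stores an
# image path, a question, and an answer (field names are assumptions).
import json
import numpy as np
import torch
from torch.utils.data import Dataset


class RadVQADataset(Dataset):
    def __init__(self, qa_file, image_root):
        # QA annotations are stored as a JSON list of records (assumed layout).
        with open(qa_file, "r") as f:
            self.records = json.load(f)
        self.image_root = image_root

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        # CT volumes are shipped as preprocessed .npy arrays
        # (see the dataset section below).
        volume = np.load(f"{self.image_root}/{rec['image_path']}")
        image = torch.from_numpy(volume).float()
        return {
            "image": image,
            "question": rec["question"],
            "answer": rec["answer"],
        }
```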
You can evaluate RadFM on our 3D-RAD benchmark by running:
cd 3D-RAD/RadFM/src
python eval_3DRAD.py \
--file_path={your test file_path} \
--output_path={your saved output_path}
You can evaluate M3D on our 3D-RAD benchmark by running:
cd 3D-RAD/M3D
python Bench/eval/eval_3DRAD.py \
--model_name_or_path={your model_name} \
--vqa_data_test_path={your test file_path} \
--output_dir={your saved output_dir}
Scaling with Varying Training Set Sizes.
To further investigate the impact of dataset scale on model performance, we randomly sampled 1%, 10%, and 100% of the training data for each task and fine-tuned M3D accordingly.
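A simple sketch of this subsampling step is given below, assuming each task's training annotations are stored as a JSON list of QA records; the file names are placeholders.

```python
# Sketch of per-task subsampling at 1%, 10%, and 100% of the training data.
import json
import random

random.seed(42)


def sample_fraction(train_file, out_file, fraction):
    """Randomly keep `fraction` of the QA records for one task."""
    with open(train_file, "r") as f:
        records = json.load(f)
    k = max(1, int(len(records) * fraction))
    subset = records if fraction >= 1.0 else random.sample(records, k)
    with open(out_file, "w") as f:
        json.dump(subset, f, indent=2)


for frac in (0.01, 0.10, 1.00):
    sample_fraction("task1_train.json", f"task1_train_{int(frac * 100)}pct.json", frac)
```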
3D-RAD Dataset
The 3DRAD directory contains the QA data without 3D images.
You can find the full dataset with 3D images in 3D-RAD_Dataset (for efficient model input, the original CT images were preprocessed and converted to .npy format).
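As a quick sanity check, a preprocessed volume can be inspected directly with NumPy; the file name below is a placeholder.

```python
# Load one preprocessed CT volume stored as a .npy array (placeholder file name).
import numpy as np

volume = np.load("example_ct_volume.npy")
print(volume.shape, volume.dtype)  # e.g. a (depth, height, width) array
```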
Data Source
The original CT scans in our dataset are derived from CT-RATE, which is released under a CC-BY-NC-SA license. We fully comply with the license terms by using the data for non-commercial academic research and providing proper attribution.






