|
# Model Overview |
|
VISTA3D model finetuning/evaluation/inference pipeline. VISTA3D is trained on over 20 partial datasets with a more complicated pipeline. To avoid confusion, this bundle only provides finetuning/continual learning APIs for users to finetune on their own datasets. To reproduce the paper results, please refer to https://github.com/Project-MONAI/VISTA/tree/main/vista3d
|
|
|
# Installation Guide |
|
``` |
|
pip install "monai[fire]" |
|
python -m monai.bundle download "vista3d" --bundle_dir "bundles/" |
|
``` |
|
Please refer to the [MONAI model zoo](https://github.com/Project-MONAI/model-zoo) for more details.
|
# Inference
|
The bundle only provides single-GPU inference. Users can modify the settings in the inference [config](../configs/inference.json).
|
## Single image inference to segment everything (automatic) |
|
The output will be saved to `output_dir/spleen_03/spleen_03_{output_postfix}{output_ext}`. |
|
``` |
|
python -m monai.bundle run --config_file configs/inference.json --input_dict "{'image':'spleen_03.nii.gz'}"
|
``` |
|
## Single image inference to segment specific class (automatic) |
|
The full list of automatic segmentation class indices can be found [here](../configs/metadata.json).
|
``` |
|
python -m monai.bundle run --config_file configs/inference.json --input_dict "{'image':'spleen_03.nii.gz','label_prompt':[3]}"
|
``` |
|
|
|
## Batch inference for segmenting everything (automatic) |
|
``` |
|
python -m monai.bundle run --config_file="['configs/inference.json', 'configs/batch_inference.json']" --input_dir="/data/Task09_Spleen/imagesTr" --output_dir="./eval_task09" |
|
``` |
|
`configs/batch_inference.json` by default runs the segment-everything workflow (classes defined by `everything_labels`) on all `*.nii.gz` files in `input_dir`. This default can be overridden by changing the input folder `input_dir`, the input image filename suffix `input_suffix`, or by directly setting the list of filenames `input_list`, as shown below.
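For example, the following run points the batch workflow at a different folder and filename pattern (the path and suffix below are illustrative):

```
python -m monai.bundle run --config_file="['configs/inference.json', 'configs/batch_inference.json']" --input_dir="/data/Task09_Spleen/imagesTs" --input_suffix="*_ct.nii.gz" --output_dir="./eval_task09"
```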
|
|
|
``` |
|
Note: if you are using a finetuned checkpoint and the finetuning `label_mapping` mapped classes to the global indices 2, 20, or 21, remove the `subclass` dict from `inference.json`, since the values defined in `subclass` will trigger the wrong subclass segmentation.
|
``` |
|
|
|
## Configuration details and interactive segmentation |
|
|
|
For inference, the VISTA3D bundle requires at least one prompt for segmentation. It supports a label prompt, which is the index of the class for automatic segmentation. It also supports point-click prompts for binary interactive segmentation. Users can provide both prompts at the same time. Please refer to [inference.md](inference.md).
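For example, a label prompt and a point click can be combined in a single `input_dict` (the `points`/`point_labels` keys and the coordinates below are illustrative; see inference.md for the exact interface):

```
python -m monai.bundle run --config_file configs/inference.json --input_dict "{'image':'spleen_03.nii.gz','label_prompt':[3],'points':[[138,245,18]],'point_labels':[1]}"
```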
|
|
|
## Execute inference with the TensorRT model
|
|
|
``` |
|
python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']" |
|
``` |
|
For more details, please refer to [this](inference.md). |
|
|
|
|
|
# Continual learning / Finetuning |
|
|
|
## Step 1: Generate the data split JSON file
|
Users need to provide a JSON data split for continual learning (`configs/msd_task09_spleen_folds.json` from the [MSD](http://medicaldecathlon.com/) is provided as an example). The data split should follow the format below ('testing' labels are optional):
|
```json |
|
{ |
|
"training": [ |
|
{"image": "img0001.nii.gz", "label": "label0001.nii.gz", "fold": 0}, |
|
{"image": "img0002.nii.gz", "label": "label0002.nii.gz", "fold": 2}, |
|
... |
|
], |
|
"testing": [ |
|
{"image": "img0003.nii.gz", "label": "label0003.nii.gz"}, |
|
{"image": "img0004.nii.gz", "label": "label0004.nii.gz"}, |
|
... |
|
] |
|
} |
|
``` |
|
Example code for 5 fold cross-validation generation can be found [here](data.md) |
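A minimal sketch of such a generator (this is not the exact script from data.md; the MSD-style `imagesTr`/`labelsTr` layout and matching label filenames are assumptions) could look like:

```python
import json
from pathlib import Path

# Hypothetical dataset root; should match `dataset_dir` in configs/train_continual.json.
dataset_dir = Path("/data/Task09_Spleen")
images = sorted((dataset_dir / "imagesTr").glob("*.nii.gz"))

datalist = {"training": [], "testing": []}
for i, image in enumerate(images):
    datalist["training"].append({
        "image": f"imagesTr/{image.name}",  # paths are relative to dataset_dir
        "label": f"labelsTr/{image.name}",  # assumes the label shares the image filename
        "fold": i % 5,                      # round-robin 5-fold assignment
    })

with open("my_datalist_5folds.json", "w") as f:
    json.dump(datalist, f, indent=4)
```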
|
``` |
|
Note: the paths in the data split are not absolute paths to the image and label files. The actual image file will be `os.path.join(dataset_dir, data["training"][item]["image"])`, where `dataset_dir` is defined in `configs/train_continual.json`. Also, 5-fold cross-validation is not required: `fold=0` is defined in `train.json`, which means any data item with `fold == 0` will be used for validation and the other folds will be used for training. So if you only have a train/val split, manually set `"fold": 0` for the validation items in the datalist and any number other than 0 for the training items, as in the example below.
|
``` |
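For instance, in a plain train/val split (no cross-validation), `img0001` below is used for validation and the other items for training:

```json
"training": [
    {"image": "img0001.nii.gz", "label": "label0001.nii.gz", "fold": 0},
    {"image": "img0002.nii.gz", "label": "label0002.nii.gz", "fold": 1},
    {"image": "img0003.nii.gz", "label": "label0003.nii.gz", "fold": 1}
]
```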
|
## Step 2: Change hyperparameters
|
For continual learning, users can change `configs/train_continual.json`. More advanced users can change configurations in `configs/train.json`. Most hyperparameters are straightforward and self-explanatory from their names. The following keys in `configs/train_continual.json` must be changed manually.
|
#### 1. `label_mappings` |
|
```
"label_mappings": {
    "default": [
        [index_1_in_user_data, mapped_index_1],  # e.g. [1, 1]
        [index_2_in_user_data, mapped_index_2],  # e.g. [2, 2]
        ...,
        [index_N_in_user_data, mapped_index_N]   # e.g. [N, N]
    ]
},
```
|
`index_1_in_user_data`, ..., `index_N_in_user_data` are the class index values in the ground truth that the user wants to segment. `mapped_index_1`, ..., `mapped_index_N` are the mapped index values that the bundle will output. You can make the two sides identical for finetuning, but we suggest finding semantically relevant mappings from our unified [global label index](../configs/metadata.json). For example, "Spleen" is represented by 1 in the MSD Task09 ground-truth labels, but by 3 in `docs/labels.json`. By defining the label mapping `[[1, 3]]`, VISTA3D can segment "Spleen" using its pretrained weights out of the box, which speeds up finetuning convergence. If you cannot find a relevant semantic label for your class, use any value < `num_classes` defined in `configs/train_continual.json`. For more details about label mapping, please read [finetune.md](finetune.md).
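For the MSD Task09 Spleen example above, the corresponding entry would be:

```json
"label_mappings": {
    "default": [[1, 3]]
}
```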
|
|
|
#### 2. `data_list_file_path` and `dataset_dir` |
|
Change `data_list_file_path` to the absolute path of your data split JSON. Change `dataset_dir` to the root folder that the relative paths in the data split are joined with.
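For example (the paths below are illustrative):

```json
"data_list_file_path": "/workspace/configs/msd_task09_spleen_folds.json",
"dataset_dir": "/data/Task09_Spleen"
```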
|
|
|
#### 3. Optional hyperparameters

Optional hyperparameters and further details are described in [finetune.md](finetune.md). Hyperparameter tuning is important and varies from task to task.
|
|
|
## Step 3: Run finetuning
|
The hyperparameters in `configs/train_continual.json` will overwrite the ones in `configs/train.json`. In general, configs later in the `--config_file` list override earlier ones when they share the same key.
|
|
|
Single-GPU: |
|
```bash |
|
python -m monai.bundle run \ |
|
--config_file="['configs/train.json','configs/train_continual.json']" |
|
``` |
|
|
|
Multi-GPU: |
|
```bash |
|
torchrun --nnodes=1 --nproc_per_node=8 -m monai.bundle run \ |
|
--config_file="['configs/train.json','configs/train_continual.json','configs/multi_gpu_train.json']" |
|
``` |
|
|
|
#### MLflow Visualization
|
|
|
MLflow is enabled by default (defined in `train.json` via `use_mlflow`), and its data is stored in the `mlruns/` folder under the bundle's root directory. To launch the MLflow UI and track your experiment data, follow these steps:
|
|
|
1. Open a terminal and navigate to the root directory of your bundle where the `mlruns/` folder is located. |
|
|
|
2. Execute the following command to start the MLflow server. This will make the MLflow UI accessible. |
|
|
|
```bash
|
mlflow ui |
|
``` |
|
|
|
# Evaluation |
|
Evaluation can be used to calculate Dice scores for the pretrained model or a finetuned model. Change `ckpt_path` to the checkpoint you wish to evaluate. The Dice score is calculated in the original image spacing using `Invertd`, while the Dice score during finetuning is calculated in the resampled space.
|
|
|
``` |
|
NOTE: evaluation does not support point prompts; `validate#evaluator#hyper_kwargs#val_head` is always set to `auto`.
|
``` |
|
|
|
Single-GPU: |
|
``` |
|
python -m monai.bundle run \ |
|
--config_file="['configs/train.json','configs/train_continual.json','configs/evaluate.json']" |
|
``` |
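If you prefer not to edit the config, `ckpt_path` can also be overridden directly on the command line (assuming it is a top-level config key, as referenced above; the checkpoint filename below is illustrative):

```
python -m monai.bundle run \
    --config_file="['configs/train.json','configs/train_continual.json','configs/evaluate.json']" \
    --ckpt_path="models/model_finetuned.pt"
```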
|
|
|
Multi-GPU: |
|
``` |
|
torchrun --nnodes=1 --nproc_per_node=8 -m monai.bundle run \ |
|
--config_file="['configs/train.json','configs/train_continual.json','configs/evaluate.json','configs/mgpu_evaluate.json']" |
|
``` |
|
#### Other explanatory items |
|
The `label_mapping` in `evaluate.json` does not include `0` because the postprocessing step performs argmax (`VistaPostTransformd`), and a `0` prediction would negatively impact performance. In continual learning, however, `0` is included for validation because no argmax is performed and validation is done channel-wise (`include_background=False`). Additionally, `Relabeld` in `postprocessing` is required to map `label` and `pred` back to sequential indices such as `0, 1, 2, 3, 4` for Dice calculation, since they are not in one-hot format. Evaluation does not support `point` prompts, but finetuning does, because finetuning does not perform argmax.
|
|
|
|
|
# FAQ |
|
## Troubleshooting Out-of-Memory Issues
|
- Changing `patch_size` to a smaller value such as `"patch_size": [96, 96, 96]` reduces the training/inference memory footprint.

- Changing `train_dataset_cache_rate` and `val_dataset_cache_rate` to a smaller value like `0.1` can resolve out-of-CPU-memory issues when finetuning on a huge dataset.

- Set `"postprocessing#transforms#0#_disabled_": false` to move the postprocessing to the CPU and reduce the GPU memory footprint.
|
|
|
## Multi-channel input |
|
- Change `input_channels` in `train.json` to your desired number of channels.

- The `image` entry in the data split JSON can be a single multi-channel image or a list of single-channel images. The listed images must have the same spatial shape and be aligned/registered, for example:
|
```json
{
    "image": ["modality1.nii.gz", "modality2.nii.gz", "modality3.nii.gz"],
    "label": "label.nii.gz"
},
```
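With three single-channel modalities as above, the corresponding setting in `train.json` would be:

```json
"input_channels": 3
```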
|
## Wrong inference results from finetuned checkpoint |
|
- Make sure you removed the `subclass` dict from `inference.json` if you ever mapped local indices to `[2, 20, 21]`.
|
- Make sure `0` is not included in your inference prompt for automatic segmentation. |
|
|
|
|
|
# References |
|
- Antonelli, M., Reinke, A., Bakas, S. et al. The Medical Segmentation Decathlon. Nat Commun 13, 4128 (2022). https://doi.org/10.1038/s41467-022-30695-9 |
|
|
|
- VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography. arXiv (2024). https://arxiv.org/abs/2406.05285
|
|
|
|
|
# License |
|
|
|
## Code License |
|
|
|
This project includes code licensed under the Apache License 2.0. |
|
You may obtain a copy of the License at |
|
|
|
http://www.apache.org/licenses/LICENSE-2.0 |
|
|
|
## Model Weights License |
|
|
|
The model weights included in this project are licensed under the NCLS v1 License. |
|
|
|
Both licenses' full texts have been combined into a single `LICENSE` file. Please refer to this `LICENSE` file for more details about the terms and conditions of both licenses. |
|
|