Reproducing Report Generation Results
#3, opened by RaphaelStock
Dear COLIPRI Team,
I am currently working on reproducing the report generation results on the CT-RATE dataset using COLIPRI. To ensure my implementation aligns with your methodology, I have a few questions regarding the embedding pipeline and preprocessing steps.
1. Embedding Generation Pipeline
Is the code snippet below the exact procedure used to generate the embeddings that are subsequently fed into the vision projector and LLM?
import torch

preprocessed_images = processor.process_images(image)
preprocessed_images[0]
images_batch = processor.to_images_batch(preprocessed_images)

with torch.no_grad():
    patch_embeddings = model.encode_image(images_batch)
patch_embeddings.shape

with torch.no_grad():
    pooled_embeddings = model.encode_image(images_batch, pool=True, project=True)
pooled_embeddings.shape
2. Preprocessing and Cropping
The current processor handles volumes of shape 192 × 192 × 192. For CT scans with larger spatial dimensions, does the processor default to a center crop, or does it resize or resample the volume instead?
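To make the question concrete, this is roughly what I mean by a center crop. It is my own minimal NumPy sketch, not COLIPRI code: the helper name `center_crop_3d` and the example volume shape are assumptions for illustration only, and I do not know whether the processor behaves this way.

```python
import numpy as np


def center_crop_3d(volume: np.ndarray, target=(192, 192, 192)) -> np.ndarray:
    """Center-crop the last three axes of `volume` to `target`.

    Hypothetical helper for illustration; axes that are already smaller
    than the target are left unchanged (no padding is applied).
    """
    # Keep any leading axes (e.g. batch or channel) untouched.
    slices = [slice(None)] * (volume.ndim - 3)
    for size, want in zip(volume.shape[-3:], target):
        start = max((size - want) // 2, 0)
        slices.append(slice(start, start + min(size, want)))
    return volume[tuple(slices)]


# Example: a dummy CT volume larger than 192 along every axis.
ct = np.zeros((200, 256, 256), dtype=np.int16)
cropped = center_crop_3d(ct)
print(cropped.shape)  # (192, 192, 192)
```

If the processor resamples or pads instead of cropping, the resulting field of view would differ from this sketch, which is why I would like to confirm the actual behavior.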
3. Training Techniques & Augmentations
Beyond the methodology described in Appendix C.4, were there additional training techniques or augmentations employed? Specifically:
- Were there any explicit cropping strategies used to isolate the lungs?
- Were image augmentations applied during the report generation fine-tuning?
Thank you for your time.
Best regards,
Raphael