Reproducing Report Generation Results

by RaphaelStock - opened 5 days ago

Dear COLIPRI Team,

I am currently working on reproducing the report generation results on the CT-RATE dataset using COLIPRI. To ensure my implementation aligns with your methodology, I have a few questions regarding the embedding pipeline and preprocessing steps.

1. Embedding Generation Pipeline

Is the code snippet below the exact procedure used to generate the embeddings that are subsequently fed into the vision projector and LLM?

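import torch

# Assumes `processor`, `model`, and `image` are already loaded.
# Preprocess the input scan.
preprocessed_images = processor.process_images(image)
print(preprocessed_images[0])  # inspect the first preprocessed volume

# Collate the preprocessed volumes into a batch.
images_batch = processor.to_images_batch(preprocessed_images)

# Patch-level embeddings (no pooling, no projection).
with torch.no_grad():
    patch_embeddings = model.encode_image(images_batch)
print(patch_embeddings.shape)

# Pooled and projected embeddings.
with torch.no_grad():
    pooled_embeddings = model.encode_image(images_batch, pool=True, project=True)
print(pooled_embeddings.shape)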

2. Preprocessing and Cropping

The current processor handles images of shape 192×192×192. For CT scans with larger spatial dimensions, does the processor default to a center crop?
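To make concrete what I mean by a center crop, here is a minimal sketch of the behavior I am assuming. The helper `center_crop_slices` is my own hypothetical function, not part of the COLIPRI processor, and the 192-voxel target per axis is taken from the shape mentioned above. Axes that are already smaller than the target are left untouched here (they would need padding, which this sketch does not do).

```python
from typing import Sequence, Tuple


def center_crop_slices(
    shape: Sequence[int],
    target: Sequence[int] = (192, 192, 192),
) -> Tuple[slice, ...]:
    """Return one slice per axis selecting the central `target` region.

    Hypothetical helper for illustration only; it is an assumption about
    what a center crop would do, not the processor's actual logic.
    """
    if len(shape) != len(target):
        raise ValueError("shape and target must have the same number of axes")

    slices = []
    for size, wanted in zip(shape, target):
        if size <= wanted:
            # Axis is not larger than the target: keep its full extent.
            slices.append(slice(0, size))
        else:
            # Drop an equal margin on both sides (extra voxel goes to the end).
            start = (size - wanted) // 2
            slices.append(slice(start, start + wanted))
    return tuple(slices)
```

The returned tuple can be used to index a 3D array or tensor directly, for example `volume[center_crop_slices(volume.shape)]` for a volume with exactly three spatial axes. If the processor instead resizes, pads, or uses sliding windows, I would like to match that in my reproduction.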

3. Training Techniques & Augmentations

Beyond the methodology described in Appendix C.4, were there additional training techniques or augmentations employed? Specifically:

Were any explicit cropping strategies used to isolate the lungs?
Were image augmentations applied during the report generation fine-tuning?

Thank you for your time.

Best regards,

Raphael
