Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution
Abstract
Replacing generic VAEs with domain-specific autoencoders in latent diffusion models significantly improves medical image super-resolution quality, with reconstruction fidelity being predictable from autoencoder performance alone.
Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on reconstruction quality. In a controlled experiment holding all other pipeline components fixed, replacing the generic Stable Diffusion VAE with MedVAE, a domain-specific autoencoder pretrained on more than 1.6 million medical images, yields +2.91 to +3.29 dB PSNR improvement across knee MRI, brain MRI, and chest X-ray (n = 1,820; Cohen's d = 1.37 to 1.86, all p < 10^{-20}, Wilcoxon signed-rank). Wavelet decomposition localises the advantage to the finest spatial frequency bands encoding anatomically relevant fine structure. Ablations across inference schedules, prediction targets, and generative architectures confirm the gap is stable within plus or minus 0.15 dB, while hallucination rates remain comparable between methods (Cohen's h < 0.02 across all datasets), establishing that reconstruction fidelity and generative hallucination are governed by independent pipeline components. These results provide a practical screening criterion: autoencoder reconstruction quality, measurable without diffusion training, predicts downstream SR performance (R^2 = 0.67), suggesting that domain-specific VAE selection should precede diffusion architecture search. Code and trained model weights are publicly available at https://github.com/sebasmos/latent-sr.
Community
Efficient VAE swap at inference - plug-and-play VAE replacement without retraining the diffusion model at all (zero-shot adaptation)
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MedVAR: Towards Scalable and Efficient Medical Image Generation via Next-scale Autoregressive Prediction (2026)
- Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study (2026)
- Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI (2026)
- RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing (2026)
- MeDUET: Disentangled Unified Pretraining for 3D Medical Image Synthesis and Analysis (2026)
- Brain MR Image Synthesis with Multi-contrast Self-attention GAN (2026)
- Comparative Analysis of 3D Convolutional and 2.5D Slice-Conditioned U-Net Architectures for MRI Super-Resolution via Elucidated Diffusion Models (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Get this paper in your agent:
hf papers read 2604.12152 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper