Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Abstract
Brain-IT uses a Brain Interaction Transformer to reconstruct images from fMRI data with high fidelity, surpassing current methods and requiring less training data.
Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present "Brain-IT", a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally similar brain voxels. These functional clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters and subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i) high-level semantic features, which steer the diffusion model toward the correct semantic content of the image; and (ii) low-level structural features, which help to initialize the diffusion process with the correct coarse layout of the image. BIT's design enables a direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that faithfully match the seen images, surpassing current state-of-the-art approaches both visually and by standard objective metrics. Moreover, with only 1 hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.
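To make the described information flow concrete, the sketch below shows one plausible reading of the BIT design: voxel responses pooled into shared functional-cluster tokens, a cross-attention step in which learned per-patch queries gather information from those clusters, and two output heads producing high-level semantic and low-level structural features per image patch. All names, dimensions, and the single-layer attention are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical sizes: number of shared functional clusters, image patches,
# and the model's hidden width (all assumptions, not from the paper).
n_clusters, n_patches, d = 64, 16, 32

# Voxel responses pooled into shared functional clusters (one token each).
cluster_tokens = rng.standard_normal((n_clusters, d))

# Learned patch queries, one per localized image patch, shared across subjects.
patch_queries = rng.standard_normal((n_patches, d))

# Cross-attention: each patch query gathers information directly from
# brain-voxel clusters (the "direct flow" the abstract describes).
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
attn = softmax((patch_queries @ Wq) @ (cluster_tokens @ Wk).T / np.sqrt(d))
patch_feats = attn @ (cluster_tokens @ Wv)          # (n_patches, d)

# Two heads: high-level semantic features (e.g., a CLIP-like 512-d space)
# to steer the diffusion model, and low-level structural features
# (e.g., a VAE-latent-like 4-d space) to initialize the coarse layout.
W_sem = rng.standard_normal((d, 512)) / np.sqrt(d)
W_struct = rng.standard_normal((d, 4)) / np.sqrt(d)
semantic_feats = patch_feats @ W_sem                # (n_patches, 512)
structural_feats = patch_feats @ W_struct           # (n_patches, 4)
```

In this reading, sharing the cluster definitions and all weights across subjects is what lets a new subject be fitted with little data: only the voxel-to-cluster pooling is subject-specific.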
Community
Presents Brain-IT, a Brain-Interaction Transformer that achieves state-of-the-art fMRI-to-image reconstructions, reliably preserving the underlying visual content.
Project page: https://amitzalcher.github.io/Brain-IT/
Wow, this is insanely impressive; it captures the semantics so well. Next step is to try diffusion models with stronger semantic priors?
That's a promising direction: using a diffusion model with stronger priors, paired with effective conditioning signals that can be robustly predicted from fMRI.
Similar papers recommended by the Semantic Scholar API (via Librarian Bot):
- BrainCognizer: Brain Decoding with Human Visual Cognition Simulation for fMRI-to-Image Reconstruction (2025)
- Towards Interpretable Visual Decoding with Attention to Brain Representations (2025)
- Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction (2025)
- BrainMCLIP: Brain Image Decoding with Multi-Layer feature Fusion of CLIP (2025)
- NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes (2025)
- VoxelFormer: Parameter-Efficient Multi-Subject Visual Decoding from fMRI (2025)
- Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI (2025)
Models citing this paper: 0
Datasets citing this paper: 1
Spaces citing this paper: 0