Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?
Abstract
TABLeT uses a 2D natural-image autoencoder to tokenize fMRI volumes into compact continuous tokens, enabling efficient long-sequence spatiotemporal modeling with a simple Transformer encoder while achieving superior performance and computational efficiency.
Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of the four-dimensional signals. Prior voxel-based models, despite their excellent performance and interpretability, are constrained by prohibitive memory demands and can therefore capture only limited temporal windows. To address this, we propose TABLeT (Two-dimensionally Autoencoded Brain Latent Transformer), a novel approach that tokenizes fMRI volumes using a pre-trained 2D natural-image autoencoder. Each 3D fMRI volume is compressed into a compact set of continuous tokens, enabling long-sequence modeling with a simple Transformer encoder under limited VRAM. Across large-scale benchmarks including the UK Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets, TABLeT outperforms existing models on multiple tasks, while demonstrating substantial gains in computational and memory efficiency over the state-of-the-art voxel-based method given the same input. Furthermore, we develop a self-supervised masked token modeling approach to pre-train TABLeT, which improves the model's performance on various downstream tasks. Our findings suggest a promising approach for scalable and interpretable spatiotemporal modeling of brain activity. Our code is available at https://github.com/beotborry/TABLeT.
Community
We introduce TABLeT, a simple and scalable framework for long-range voxel-wise 4D fMRI dynamics modeling. TABLeT uses a pre-trained 2D natural-image autoencoder to tokenize each 3D fMRI volume into just 27 continuous tokens, enabling much longer temporal context with a lightweight Transformer. Across UKB, HCP, and ADHD-200, it achieves competitive or better performance than prior methods while substantially improving memory and compute efficiency.
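To make the tokenization idea concrete, here is a minimal NumPy sketch of the pipeline shape: each 3D volume's axial slices are tiled into a 2D mosaic and passed through a 2D encoder, yielding 27 continuous tokens per volume that a Transformer can then consume as a sequence. All tensor sizes are illustrative, and the random linear projection is only a stand-in for the paper's pre-trained 2D natural-image autoencoder.

```python
import numpy as np

# Illustrative shapes; the real model operates on full-resolution fMRI.
T, D, H, W = 8, 36, 16, 16      # time steps, depth, height, width (toy sizes)
n_tokens, token_dim = 27, 16    # 27 continuous tokens per volume (from the paper)
G = 6                           # mosaic grid: 6 x 6 axial slices, so D must equal G*G

rng = np.random.default_rng(0)
fmri = rng.standard_normal((T, D, H, W))  # stand-in 4D fMRI signal

def tile_volume(vol):
    """Arrange a 3D volume's axial slices into a single 2D mosaic image."""
    rows = [np.concatenate(list(vol[r * G:(r + 1) * G]), axis=1) for r in range(G)]
    return np.concatenate(rows, axis=0)   # shape (G*H, G*W)

# Stand-in for the frozen 2D autoencoder encoder: one fixed linear projection
# shared across time, mapping each mosaic to n_tokens continuous tokens.
W_enc = rng.standard_normal((G * H * G * W, n_tokens * token_dim)) / np.sqrt(G * H * G * W)

def encode_2d(img):
    """Project a 2D mosaic down to a compact set of continuous tokens."""
    return (img.ravel() @ W_enc).reshape(n_tokens, token_dim)

# Token sequence for the downstream Transformer: T volumes x 27 tokens each.
tokens = np.stack([encode_2d(tile_volume(fmri[t])) for t in range(T)])
print(tokens.shape)  # (8, 27, 16)
```

The point of the sketch is the compression ratio: a whole volume of D·H·W voxels becomes 27 tokens, so a sequence of T volumes costs the Transformer only T × 27 tokens of attention, which is what enables the long temporal context the paper describes.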