arxiv:2604.03619

Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?

Published on Apr 4
Submitted by Peter Yongho Kim on Apr 8

Abstract

AI-generated summary

TABLeT uses a 2D natural-image autoencoder to tokenize fMRI volumes into compact continuous tokens, enabling efficient long-sequence spatiotemporal modeling with a simple Transformer encoder while achieving strong performance and computational efficiency.

Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of the four-dimensional signal. Prior voxel-based models, despite their strong performance and interpretability, are constrained by prohibitive memory demands and can therefore capture only limited temporal windows. To address this, we propose TABLeT (Two-dimensionally Autoencoded Brain Latent Transformer), a novel approach that tokenizes fMRI volumes using a pre-trained 2D natural-image autoencoder. Each 3D fMRI volume is compressed into a compact set of continuous tokens, enabling long-sequence modeling with a simple Transformer encoder under limited VRAM. Across large-scale benchmarks including the UK Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets, TABLeT outperforms existing models on multiple tasks while delivering substantial gains in computational and memory efficiency over the state-of-the-art voxel-based method given the same input. Furthermore, we develop a self-supervised masked token modeling approach to pre-train TABLeT, which improves its performance on various downstream tasks. Our findings suggest a promising direction for scalable and interpretable spatiotemporal modeling of brain activity. Our code is available at https://github.com/beotborry/TABLeT.
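The paper's exact tokenization pipeline is not detailed on this page, but the general idea — encode 2D slices of a 3D volume with a pretrained image autoencoder's encoder, then reduce them to a small set of continuous tokens — can be sketched as follows. Everything here is an illustrative assumption: the linear "encoder", the volume dimensions, the latent size, and the slice-pooling strategy are stand-ins, not the authors' implementation; only the 27-token count comes from the page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
H, W, D = 64, 64, 48      # one fMRI volume: H x W axial slices, D slices deep
LATENT_DIM = 32           # per-slice latent size (assumption)
N_TOKENS = 27             # compact token count mentioned by the authors

# Hypothetical stand-in for a pretrained 2D image encoder:
# a fixed linear map from a flattened slice to a latent vector.
W_enc = rng.standard_normal((H * W, LATENT_DIM)) / np.sqrt(H * W)

def encode_slice(slice_2d: np.ndarray) -> np.ndarray:
    """Encode one 2D slice into a latent vector (linear stand-in)."""
    return slice_2d.reshape(-1) @ W_enc

def tokenize_volume(volume: np.ndarray, n_tokens: int = N_TOKENS) -> np.ndarray:
    """Encode every axial slice, then average-pool groups of slice
    latents down to a fixed number of continuous tokens."""
    latents = np.stack([encode_slice(volume[:, :, d]) for d in range(volume.shape[2])])
    groups = np.array_split(latents, n_tokens, axis=0)  # D slices -> n_tokens groups
    return np.stack([g.mean(axis=0) for g in groups])   # (n_tokens, LATENT_DIM)

volume = rng.standard_normal((H, W, D))
tokens = tokenize_volume(volume)
print(tokens.shape)  # (27, 32)
```

With each timepoint reduced to 27 tokens of a few dozen dimensions, a Transformer over T timepoints sees 27·T tokens rather than H·W·D·T voxels, which is what makes long temporal windows affordable.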

Community

Paper author · Paper submitter

We introduce TABLeT, a simple and scalable framework for long-range voxel-wise 4D fMRI dynamics modeling. TABLeT uses a pre-trained 2D natural-image autoencoder to tokenize each 3D fMRI volume into just 27 continuous tokens, enabling much longer temporal context with a lightweight Transformer. Across UKB, HCP, and ADHD-200, it achieves competitive or better performance than prior methods while substantially improving memory and compute efficiency.
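The masked token modeling pre-training mentioned in the abstract is not specified here, but the standard recipe — hide a random subset of token positions and train the model to reconstruct them — can be sketched on a token sequence of the shape described above. The mask ratio, sequence length, and zero-vector mask embedding are all assumptions for illustration; in practice the mask embedding would be learned.

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, D = 100, 27, 32           # timepoints, tokens per volume, token dim (illustrative)
tokens = rng.standard_normal((T, N, D))
mask_token = np.zeros(D)        # learned [MASK] embedding in practice; zeros here
MASK_RATIO = 0.5                # fraction of positions to hide (assumption)

# Flatten (time, token) positions and choose a random subset to mask.
flat = tokens.reshape(T * N, D).copy()
n_mask = int(MASK_RATIO * T * N)
masked_idx = rng.choice(T * N, size=n_mask, replace=False)
flat[masked_idx] = mask_token

masked_seq = flat.reshape(T, N, D)              # input to the Transformer
targets = tokens.reshape(T * N, D)[masked_idx]  # reconstruction targets

print(masked_seq.shape, targets.shape)  # (100, 27, 32) (1350, 32)
```

A model pre-trained this way (predict `targets` from `masked_seq`, e.g. with an MSE loss on the masked positions) can then be fine-tuned on the downstream classification tasks the abstract reports.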


Get this paper in your agent:

hf papers read 2604.03619
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
