| ---
|
| language:
|
| - en
|
| license: cc-by-nc-sa-4.0
|
| tags:
|
| - neuroimaging
|
| - medical-imaging
|
| - brain-age
|
| - MRI
|
| - tensorflow
|
| - foundation-model
|
| - 3d-cnn
|
| library_name: tensorflow
|
| pipeline_tag: image-feature-extraction
|
| datasets:
|
| - LDM100k
|
| metrics:
|
| - mae
|
| ---
|
|
|
| # NeuroFM
|
|
|
| **Foundation model for individualized brain health estimation from T1w MRI.**
|
|
|
| NeuroFM predicts four brain health markers from a single skull-stripped T1w MRI scan: brain age, total brain volume (GM+WM), lateral ventricle volume, and biological sex. Despite containing fewer than 11 million parameters - orders of magnitude smaller than typical foundation models - NeuroFM achieves state-of-the-art performance across all four targets.
|
|
|
📄 [Paper (medRxiv)](https://doi.org/10.64898/2026.03.27.26349489) | 🖥️ [Website](https://rocknroll87q.github.io/NeuroFM/) | 💻 [GitHub](https://github.com/rockNroll87q/NeuroFM) | 🐳 [Docker](https://hub.docker.com/r/rocknroll87q/neurofm)
|
|
|
| ---
|
|
|
| ## Model variants
|
|
|
| | Variant | Parameters | Latent dim | File |
|
| |---------|-----------|------------|------|
|
| | `neurofm-s` | 484k | 161 | `neurofm-s.h5` |
|
| | `neurofm-m` | 6.5M | 256 | `neurofm-m.h5` |
|
| | `neurofm-l` | 10.8M | 512 | `neurofm-l.h5` |
|
|
|
| SavedModel archives (for finetuning) are also available as `neurofm-{s,m,l}_savedmodel.tar.gz`.
|
|
|
| ---
|
|
|
| ## Quickstart
|
|
|
| ```bash
|
| pip install git+https://github.com/rockNroll87q/NeuroFM.git
|
| ```
|
|
|
| ```python
|
| from neurofm import NeuroFM
|
|
|
| model = NeuroFM(variant="neurofm-s", device="auto")
|
| results = model.predict("subject_T1w.nii.gz")
|
|
|
| # results["brain_health"] → np.ndarray, shape (4,)
|
| # [brain_age, sex, ventricle_volume, brain_volume]
|
| ```
|
|
|
| Weights are downloaded automatically from this repository on first use and
|
| cached to `~/.cache/NeuroFM/`.
|
|
|
| ---
|
|
|
| ## Intended use
|
|
|
| NeuroFM is intended for research use in computational neuroimaging. Predicted
|
| outputs are continuous estimates derived from population-level training data
|
| and are not validated for clinical decision-making.
|
|
|
| **Suitable for:**
|
| - Large-scale cohort studies requiring brain age or volumetric estimates
|
| - Downstream ML tasks using latent brain representations (classification,
|
| regression, clustering)
|
| - Benchmarking against other neuroimaging foundation models
|
|
|
| **Not suitable for:**
|
| - Clinical diagnosis or individual patient assessment
|
| - Populations substantially outside the training age range (40–90 years)
|
| - Data that has not been skull-stripped
|
|
|
| ---
|
|
|
| ## Inputs
|
|
|
| | Parameter | Requirement |
|
| |-----------|-------------|
|
| | Modality | T1-weighted MRI |
|
| | Format | NIfTI (`.nii`, `.nii.gz`) |
|
| | Preprocessing | Skull-stripped |
|
| | Resolution | 1mm isotropic (resampled automatically if needed) |
|
| | Orientation | LIA (reoriented automatically if needed) |
|
|
|
| ---
|
|
|
| ## Outputs
|
|
|
| | Output | Description | Unit | Range |
|
| |--------|-------------|------|-------|
|
| | `brain_age` | Predicted brain age | years | 40–90 |
|
| | `sex` | Predicted biological sex | - | 0.0 (male) – 1.0 (female) |
|
| | `ventricle_volume` | Lateral ventricle volume | mm³ | 0 – 180×10³ |
|
| | `brain_volume` | Total brain volume (GM+WM) | mm³ | 1×10⁶ – 1.9×10⁶ |
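For convenience, the `(4,)` vector can be unpacked into named fields. A minimal sketch (the vector below is an illustrative stand-in, not real model output, and the 0.5 decision threshold for sex is an assumption):

```python
import numpy as np

# Illustrative stand-in for results["brain_health"], in the documented order:
# [brain_age, sex, ventricle_volume, brain_volume]
brain_health = np.array([67.3, 0.91, 24_500.0, 1_210_000.0])

def unpack_brain_health(vec):
    """Map the (4,) prediction vector to named outputs."""
    age, sex_score, ventricles, brain = vec
    return {
        "brain_age": float(age),                          # years
        "sex": "female" if sex_score >= 0.5 else "male",  # assumed 0.5 threshold
        "ventricle_volume": float(ventricles),            # mm^3
        "brain_volume": float(brain),                     # mm^3
    }

print(unpack_brain_health(brain_health))
```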
|
|
|
| ### Latent features
|
|
|
| Requesting `outputs=["latent"]` extracts a D-dimensional embedding from the
|
| `multihead_output` layer. D varies by variant (161 / 256 / 512). These
|
| embeddings capture a compact representation of brain health and are well-suited
|
| for downstream supervised or unsupervised tasks.
|
|
|
| ---
|
|
|
| ## Architecture
|
|
|
| NeuroFM is a 3D convolutional encoder with a multi-head prediction architecture.
|
| It is **not a transformer** - the FM designation refers to the foundation model
|
| training paradigm (large-scale pretraining on diverse cohorts), not the
|
| architecture family.
|
|
|
| The encoder uses bottleneck residual blocks with strided convolutions for
|
| downsampling, followed by global average pooling, fully-connected neck layers,
|
| and separate prediction heads for each output variable. The multi-head design
|
| allows simultaneous prediction of heterogeneous targets (regression and
|
| classification) from a shared latent representation.
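The multi-head readout can be illustrated in plain NumPy (channel count and weights below are made up for illustration; the real encoder and trained weights live in the package):

```python
import numpy as np

rng = np.random.default_rng(0)

def gap3d(feature_map):
    """Global average pooling over the three spatial axes of an (X, Y, Z, C) map."""
    return feature_map.mean(axis=(0, 1, 2))

# Pretend encoder output: an 8x8x8 feature map with 64 channels (sizes illustrative)
fmap = rng.normal(size=(8, 8, 8, 64))
z = gap3d(fmap)  # shared latent representation, shape (64,)

# One linear regression head per continuous target, all reading the same latent
heads = {name: rng.normal(size=64) for name in
         ("brain_age", "ventricle_volume", "brain_volume")}
preds = {name: float(w @ z) for name, w in heads.items()}

# Classification head for sex: linear + sigmoid, giving a score in (0, 1)
w_sex = rng.normal(size=64)
preds["sex"] = float(1.0 / (1.0 + np.exp(-(w_sex @ z))))

print(sorted(preds))
```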
|
|
|
| ---
|
|
|
| ## Training
|
|
|
| ### Data
|
|
|
| | Dataset | N | Age range | Notes |
|
| |---------|---|-----------|-------|
|
| LDM100k | 100,000 | 40–85 | Synthetic AI-generated T1-weighted dataset |
|
|
|
**Note:** The LDM100k dataset (Pinaya et al., 2022; Tudosiu et al., 2024) consists of artificially generated MRI volumes. The generative network was trained on UK Biobank data, so the synthetic images likely reflect the demographic composition of that source population.
|
|
|
| ### Procedure
|
|
|
| - **Framework:** TensorFlow 2.13
|
| - **Input:** Skull-stripped T1w MRI, 1mm isotropic, LIA orientation, z-score normalised
|
- **Loss:** Mean squared error (MSE) for regression targets; cross-entropy for sex
|
| - **Optimiser:** Adam
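The z-score normalisation step amounts to the following (plain NumPy on a random stand-in volume; the NeuroFM package applies its own preprocessing internally):

```python
import numpy as np

rng = np.random.default_rng(0)
vol = rng.uniform(0, 255, size=(32, 32, 32))  # stand-in for a T1w volume

def zscore(volume):
    """Normalise a volume to zero mean and unit standard deviation."""
    return (volume - volume.mean()) / volume.std()

norm = zscore(vol)
print(round(norm.mean(), 6), round(norm.std(), 6))
```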
|
|
|
| ---
|
|
|
| ## Evaluation (LDM100k held-out set)
|
|
|
| | Metric | neurofm-s | neurofm-m | neurofm-l |
|
| |--------|-----------|-----------|-----------|
|
| | Brain age MAE (years) | 3.42 | 3.22 | 3.43 |
|
| | Brain volume MAE (mm³) | 30168 | 34158 | 35686 |
|
| | Ventricle volume MAE (mm³) | 6326 | 6189 | 7485 |
|
| | Sex F1 score | 0.99 | 0.99 | 0.99 |
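For reference, the two metrics follow their standard definitions. A toy NumPy implementation on made-up arrays:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def f1(y_true, y_pred):
    """Binary F1 score: harmonic mean of precision and recall."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return float(2 * tp / (2 * tp + fp + fn))

print(mae([60, 70, 80], [63, 69, 84]))  # ≈ 2.667
print(f1([1, 1, 0, 0], [1, 0, 0, 0]))   # ≈ 0.667
```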
|
|
|
| ---
|
|
|
| ## Limitations and bias
|
|
|
| - Scanner and acquisition protocol diversity in the training data is limited
|
| to T1-weighted 3-Tesla emulation from the LDM100k dataset. Generalisation to
|
| substantially different acquisition parameters (e.g. ultra-high field,
|
| non-standard contrasts) is not guaranteed.
|
- The LDM100k training data is derived from the UK Biobank population and likely

inherits its systematic demographic biases.
|
| - Brain age estimates reflect the population-level relationship between MRI
|
| appearance and chronological age in the training data. They should not be
|
| interpreted as a direct measure of neurological health in individuals.
|
| - Sex classification reflects biological sex as recorded in dataset metadata
|
| and is a binary prediction. It does not reflect gender identity.
|
|
|
| ---
|
|
|
| ## File format notes
|
|
|
| `.h5` files contain model weights only and require the NeuroFM package to
|
| reconstruct the architecture before loading. `.tar.gz` archives contain the
|
| full TensorFlow SavedModel (graph + weights) and are intended for finetuning
|
| or custom loading without the NeuroFM package.
|
|
|
| The NeuroFM package handles `.h5` loading automatically. For SavedModel usage:
|
|
|
| ```bash
|
| tar -xzf neurofm-l_savedmodel.tar.gz
|
| ```
|
|
|
| ```python
|
import tensorflow as tf
from neurofm.model import get_custom_objects  # registers NeuroFM's custom objects

model = tf.keras.models.load_model(
    "neurofm-l_savedmodel/",
    custom_objects=get_custom_objects(),
)
|
| ```
|
|
|
| ---
|
|
|
| ## Citation
|
|
|
| If you use NeuroFM in your research, please cite:
|
|
|
| ```bibtex
|
| @article {DibbleNeuroFM2026,
|
| author = {Dibble, Austin and Dalby, Connor and Sevegnani, Michele and Fracasso, Alessio and Lyall, Donald M and Harvey, Monika and Svanera, Michele},
|
| title = {NeuroFM: Toward Precision Neuroimaging with Foundation Models for Individualized Brain Health Estimation},
|
| elocation-id = {2026.03.27.26349489},
|
| year = {2026},
|
| doi = {10.64898/2026.03.27.26349489},
|
| publisher = {Cold Spring Harbor Laboratory Press},
|
| abstract = {Precision neuroimaging aims to deliver individualized assessments of brain health, yet a single structural MRI does not yield a multidimensional, quantitative summary of an individual{\textquoteright}s current health or future risk. Existing approaches optimize task-specific objectives, yielding representations entangled with cohort- or disease-specific signals rather than capturing biologically grounded patterns of anatomical variation. Here, we introduce NeuroFM, a foundation model trained exclusively on 100,000 healthy synthetic volumes to predict morphometric and demographic targets. Without exposure to diagnostic labels, NeuroFM organizes brain MRIs into population-level patterns that encode meaningful brain health differences. These representations transfer across five neuroscience domains without adaptation and support simple linear readouts for clinical, cognitive, developmental, socio-behavioural, and image quality control. Evaluated on 136,361 real volumes spanning multiple cohorts, NeuroFM generalizes across domains and enables individual-level brain health profiling, estimating future dementia risk years before diagnosis. Together, these findings establish a disease-naive foundation model paradigm for precision neuroimaging. Code available at: https://rocknroll87q.github.io/NeuroFM/},
|
| URL = {https://www.medrxiv.org/content/early/2026/03/31/2026.03.27.26349489},
|
| eprint = {https://www.medrxiv.org/content/early/2026/03/31/2026.03.27.26349489.full.pdf},
|
| journal = {medRxiv}
|
| }
|
| ```
|
|
|
| ---
|
|
|
| ## License
|
|
|
| NeuroFM code and model weights are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright (c) 2026 Austin Dibble.
|
|
|
| **You are free to**:
|
|
|
| - Use, share, and adapt NeuroFM for non-commercial research and academic purposes
|
| - Extract and use latent representations for downstream research tasks
|
|
|
| **Under the following terms**:
|
|
|
| - Attribution - cite the accompanying paper and link to this repository
|
| - NonCommercial - do not use NeuroFM or its outputs for commercial purposes
|
| - ShareAlike - if you adapt or build upon NeuroFM, distribute your contributions under the same license
|
|
|
| For commercial licensing enquiries, please contact `michele.svanera@glasgow.ac.uk`.
|
|
|
| ---
|
|
|
| ## Acknowledgements
|
|
|
| We acknowledge the MVLS Advanced Research System (MARS) at the University of Glasgow for providing high-performance computing resources and technical support.
|
|
|
| Some data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: [link](http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf). For up-to-date information, see adni.loni.usc.edu.
|
|
|
| Some of the data used in the preparation of this article were obtained from the Neuroimaging in Frontotemporal Dementia (NIFD) dataset, part of the Frontotemporal Lobar Degeneration Neuroimaging Initiative (FTLDNI). Data collection and sharing for this project was funded by the Frontotemporal Lobar Degeneration Neuroimaging Initiative (National Institutes of Health Grant R01 AG032306). The study is coordinated through the University of California, San Francisco, Memory and Aging Center. FTLDNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. For up-to-date information on participation and protocol, see [link](http://memory.ucsf.edu/research/studies/nifd).
|
|
|
| Data were provided [in part] by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.
|
|
|
| Data were provided [in part] by OASIS-3: Longitudinal Multimodal Neuroimaging: (Principal Investigators: T. Benzinger, D. Marcus, J. Morris); NIH P30 AG066444, P50 AG00561, P30 NS09857781, P01 AG026276, P01 AG003991, R01 AG043434, UL1 TR000448, R01 EB009352. AV-45 doses were provided by Avid Radiopharmaceuticals, a wholly owned subsidiary of Eli Lilly.
|
|
|
| Data were [in part] obtained from the IXI dataset ([link](https://brain-development.org/ixi-dataset/)).
|
|
|
| Data were [in part] provided by the 1000 Functional Connectomes Project (FCP). For access and usage information, see [link](https://fcon_1000.projects.nitrc.org).
|
|
|
| This research has been conducted using the UK Biobank Resource under Application 17689.
|
|
|
| Austin Dibble was supported by a PhD grant from the Scottish Graduate School of Social Science, Doctoral Training Partnership (SGSSS-DTP), on behalf of the Economic and Social Research Council (ESRC, grant number: ES/P000681/1).
|
|
|
Connor Dalby was supported by a PhD grant from the Medical Research Council (MRC) as part of the Precision Medicine Doctoral Training Programme.
|
|
|
Alessio Fracasso was supported by a grant from the Biotechnology and Biological Sciences Research Council (BBSRC, grant number: BB/S006605/1) and the Bial Foundation (Bial Foundation Grants Programme; grant id: A-29315, number: 203/2020, grant edition: G-15516).