MnemoDyn / README.md
vhluong's picture
Update README.md
dc8f588 verified
# MnemoDyn: Learning Resting State Dynamics from 40K fMRI Sequences
[[Paper]]() [[Poster]]() [[Slide]]()
### Sourav Pal, Viet Luong, Hoseok Lee, Tingting Dan, Guorong Wu, Richard Davidson, Won Hwa Kim, Vikas Singh
![MnemoDyn architecture](asset/braine-1.png)
MnemoDyn is an operator-learning foundation model for resting-state fMRI, combining multi-resolution wavelet dynamics with CDE-style temporal modeling.
## Update
MnemoDyn is now published on Hugging Face: https://huggingface.co/vhluong/MnemoDyn
You can also publish your own trained checkpoint directly from this repo.
## Tutorial
A usage walkthrough is available as a Google Colab notebook:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IeWYPmwZAj5zA_khQHmgKOXjF8DJXVNo?usp=sharing)
## At A Glance
- Pretraining backbones: `coe/light/model/main.py`, `coe/light/model/main_masked_autoencode.py`, `coe/light/model/main_masked_autoencode_jepa.py`, `coe/light/model/main_denoise.py`, `coe/light/model/orion.py`
- Core model modules: `coe/light/model/conv1d_optimize.py`, `coe/light/model/dense_layer.py`, `coe/light/model/ema.py`, `coe/light/model/normalizer.py`
- Downstream tasks: HBN, ADHD200, ADNI, ABIDE, NKIR, UK Biobank, HCP Aging under `coe/light/*.py`
- Launch scripts: `coe/light/script/*.sh`
## Repository Layout
```text
.
β”œβ”€β”€ highdim_req.txt
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ coe/
β”‚ β”œβ”€β”€ parcellation/
β”‚ └── light/
β”‚ β”œβ”€β”€ model/
β”‚ β”œβ”€β”€ script/
β”‚ β”œβ”€β”€ *_dataset.py
β”‚ └── *classification*.py, *regress*.py
└── README.md
```
## Environment Setup
Python 3.10+ is recommended.
### Option A (recommended): uv
```bash
uv venv
source .venv/bin/activate
uv sync
```
### Option B: pip
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r highdim_req.txt
```
Ensure your PyTorch build matches your CUDA stack.
<!-- ## Data Processing Flow
MnemoDyn expects parcellated rs-fMRI time series data (`*.dtseries.nii`) as input.
If you are starting from volumetric NIfTI files (e.g., from fMRIPrep), you must run them through our **Preprocessing Pipeline** (described above) before training. This ensures proper alignment and time-step continuity.
To use custom datasets:
1. Preprocess your NIfTI files through `coe.preprocess.pipeline`.
2. Ensure you have dataset metadata CSV/TSV files (labels, demographics, IDs).
3. Update the hardcoded dataset paths (e.g., `/mnt/sourav/HBN_dtseries/`) in the downstream training launch scripts (`coe/light/script/*.sh`) to point to your new output directories. -->
## Preprocessing Pipeline (NIfTI to Parcellated CIFTI)
We provide a unified, Python-based CLI pipeline to automate mapping volumetric NIfTI images to fs_LR surfaces and parcellating the resulting dense time series. The pipeline dynamically extracts the Repetition Time (TR) from your NIfTI files to ensure downstream models learn accurate temporal dynamics.
### Requirements
- Connectome Workbench (`wb_command`) installed and on your system PATH.
- `nibabel` and `tqdm` Python packages.
### Usage
Run the pipeline from the repository root:
```bash
python -m coe.preprocess.pipeline \
--input-dir /path/to/niftis \
--output-dir /path/to/output_dir \
--atlas /path/to/atlas.dlabel.nii \
--pattern "*_task-rest_space-MNI305_preproc.nii.gz"
```
The script will automatically orchestrate `wb_command` for left/right mapping and resampling, output an intermediate `.dtseries.nii`, and finally parcellate it using the provided atlas, injecting the correct native TR throughout.
## Quick Start
### 1) Inspect pretraining CLIs
```bash
cd coe/light/model
python main.py --help
python main_masked_autoencode.py --help
python main_masked_autoencode_jepa.py --help
python main_denoise.py --help
```
### 2) Pretraining
```bash
bash orion.sh
```
### 3) Run downstream examples
```bash
cd coe/light
bash script/hbn_classification.sh
bash script/adhd_200_diagnose.sh
```
<!-- ## Common Script Entry Points
From `coe/light`:
- `bash script/abide_classifcation_normal.sh`
- `bash script/adhd_200_diagnose.sh`
- `bash script/adhd_200_sex_classification.sh`
- `bash script/adni_classification_amyloid.sh`
- `bash script/adni_classification_sex.sh`
- `bash script/hbn_classification.sh`
- `bash script/hbn_regression.sh`
- `bash script/hcp_aging_450.sh`
- `bash script/hcp_aging_classification.sh`
- `bash script/hcp_aging_regress_flanker.sh`
- `bash script/hcp_aging_regress_neuroticism.sh`
- `bash script/nkir_classification.sh`
- `bash script/ukbiobank_age_regression.sh`
- `bash script/ukbiobank_sex_classification.sh` -->
## Typical Workflow
1. Pretrain a foundation checkpoint (`coe/light/model/main*.py`).
2. Save Lightning checkpoints under a versioned results directory.
3. Fine-tune a downstream head using a task script in `coe/light/`.
4. Track outputs and metrics under `Result/<ExperimentName>/...`.
<!-- ## Publish to Hugging Face
Install Hub client:
```bash
pip install huggingface_hub
```
Log in once:
```bash
huggingface-cli login
```
Publish a training run folder (auto-picks best checkpoint by lowest `val_mae` in filename):
```bash
python -m coe.light.model.publish_to_hf \
--repo-id <your-hf-username>/<model-name> \
--version-dir /path/to/version_17
```
Or publish an explicit checkpoint:
```bash
python -m coe.light.model.publish_to_hf \
--repo-id <your-hf-username>/<model-name> \
--checkpoint /path/to/model.ckpt \
--hparams /path/to/hparams.yaml \
--metrics /path/to/metrics.csv
```
Load it back:
```python
from huggingface_hub import hf_hub_download
from coe.light.model.main import LitORionModelOptimized
ckpt = hf_hub_download(repo_id="<your-hf-username>/<model-name>", filename="model.ckpt")
model = LitORionModelOptimized.load_from_checkpoint(ckpt, map_location="cpu")
model.eval()
``` -->
## Notes and Caveats
- This is a research codebase and is still being consolidated.
- Some scripts may require branch-specific import/path adjustments.
- Normalization and dataset utilities are partially duplicated across modules.
- Reproducibility depends on matching preprocessing, atlas/parcellation, and dataset splits.
## Citation
If this work helps your research, please cite:
```bibtex
@inproceedings{
pal2026mnemodyn,
title={MnemoDyn: Learning Resting State Dynamics from $40$K {FMRI} sequences},
author={Sourav Pal and Viet Luong and Hoseok Lee and Tingting Dan and Guorong Wu and Richard Davidson and Won Hwa Kim and Vikas Singh},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=zexMILcQOV}
}
```
---
license: mit
---